Wednesday, December 30, 2009

Counting word frequencies

My code segments in my comment on http://blogs.sourceallies.com/2009/12/word-counts-example-in-ruby-and-scala/comment-page-1/#comment-255 were eaten, so here they are.




This is a favorite question of mine as well, but I expect the developer to be pragmatic and give me a scripting solution that can be written and run in under five minutes.

# cat the individual files into all.txt
# and do the rest with an awk one-liner

awk '{for(i=1;i<=NF;i++) c[tolower($i)]++} END {for(w in c) print w, c[w]}' all.txt
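Here is a minimal end-to-end sketch of the cat-then-awk approach, run against two hypothetical sample files (`part1.txt`, `part2.txt`) created in a temporary directory. Note that the loop has to index the current field with `$i`. Using `$1` would just count the first word of each line NF times. `tolower` makes "The" and "the" share one counter. The order of `for (w in c)` is unspecified in awk, so the raw output comes out in no particular order.

```shell
#!/bin/sh
set -e

# Work in a scratch directory so the sample files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input files standing in for the real documents.
printf 'The cat saw the dog\n' > part1.txt
printf 'the Dog ran\n' > part2.txt

# Concatenate the individual files into all.txt.
cat part1.txt part2.txt > all.txt

# Count every whitespace-separated field ($i), lower-cased.
awk '{for(i=1;i<=NF;i++) c[tolower($i)]++} END {for(w in c) print w, c[w]}' all.txt > counts.txt

# One "word count" line per distinct word, in arbitrary order.
cat counts.txt
```

With this input, `counts.txt` holds five lines, including `the 3` and `dog 2`.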

# the output can also be sorted both alphabetically and by frequency in the same pipeline

... | sort | tee alpha_sorted.txt | sort -k 2 -nr > freq_sorted.txt
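This sketch puts both steps together. It assumes the `...` above is the awk counter applied to `all.txt` and feeds it a small made-up sample. The plain `sort` orders the `word count` lines alphabetically, `tee` saves that copy, and `sort -k 2 -nr` re-sorts the same stream numerically in descending order on the count column.

```shell
#!/bin/sh
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample text: banana x3, cherry x2, apple x1.
printf 'banana cherry apple\nBanana banana cherry\n' > all.txt

# Count words, then write an alphabetical copy and a frequency-sorted copy.
awk '{for(i=1;i<=NF;i++) c[tolower($i)]++} END {for(w in c) print w, c[w]}' all.txt \
  | sort | tee alpha_sorted.txt | sort -k 2 -nr > freq_sorted.txt

cat alpha_sorted.txt   # apple 1, banana 3, cherry 2
cat freq_sorted.txt    # banana 3, cherry 2, apple 1
```

If two words have the same count, `sort` breaks the tie by comparing whole lines. To make that explicit, `-k 2,2nr -k 1,1` would work instead.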
