December 31, 2012 · Linux ·

Awk: Classic use case

I often do database updates based on tab- or comma-separated files. When a long-running job comes to an end and I need to find out why certain rows failed, I can easily get a list of successful “keys” by running a SQL query. The failed rows are in the input file – but how do I quickly spit out just the rows that were not loaded, so I can examine them?

AWK is a powerful command line utility found on Unix-like platforms (or Cygwin for Windows) that is a godsend for text-based processing. A task like this is a one-liner for this nifty utility:

awk -F"," 'NR==FNR{a[$1]; next} !($1 in a) {print}' sql_keys input.csv
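To see the two-pass idiom in action, here is a minimal sketch with made-up sample data (the file names and keys are invented for illustration). `NR==FNR` is only true while awk reads the first file, so that pass just records each key in array `a`; on the second file, any row whose first field was never recorded gets printed. Using the `in` operator for the membership test also behaves correctly for keys like `0` or an empty string, which a plain `!a[$1]` truthiness check would misclassify:

```shell
# Hypothetical sample data: one loaded key per line, and the original CSV.
printf '1\n3\n' > sql_keys
printf '1,alice\n2,bob\n3,carol\n4,dave\n' > input.csv

# Pass 1 (sql_keys): NR==FNR, so record each key and skip the action block.
# Pass 2 (input.csv): print rows whose first field is not a recorded key.
awk -F"," 'NR==FNR{a[$1]; next} !($1 in a)' sql_keys input.csv
# prints:
# 2,bob
# 4,dave
```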

There are some gotchas with awk, but once you get familiar with its idioms, your productivity level gets very high. I love more expressive languages like Python, but a task like this begs a one-line solution. At the very least, pop this in a shell script and reuse it whenever you need to — no need to remember the awk syntactical details, just run it when you need it.
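Wrapping the one-liner in a script might look something like this sketch (the script name, argument order, and optional separator flag are all my own invention, not anything standard):

```shell
#!/bin/sh
# not_loaded.sh -- hypothetical wrapper around the awk one-liner.
# Usage: not_loaded.sh loaded_keys data_file [separator]
# Prints rows of data_file whose first field is absent from loaded_keys.
keys_file=$1
data_file=$2
sep=${3:-,}    # default to comma-separated input

awk -F"$sep" 'NR==FNR{a[$1]; next} !($1 in a)' "$keys_file" "$data_file"
```

Then `./not_loaded.sh sql_keys input.csv` spits out the failed rows, and a third argument switches the separator (e.g. `'\t'` for tab-separated files).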

Where to look for more? The coolest awk tips are on this site: AWK Tips and Pitfalls, and there's always O'Reilly's "Effective awk Programming".

Another reason to use awk: ain’t nothing as cool as “old school” :)
