December 13, 2009

Book review:

sed & awk, 2nd Edition, by Dale Dougherty and Arnold Robbins, O’Reilly, 1997

sed & awk Pocket Reference, 2nd Edition, by Arnold Robbins, O’Reilly, 2002

Reviewed by Peter G. Anderson, Pittsford, NY   1 July 2009

Years ago I became the book review editor for IEEE Software Magazine, and the first edition of O’Reilly’s sed & awk was the first assignment I gave myself.  The book is about two Unix “power tools” (actually three: grep features prominently too), and I had become a true Unix aficionado. Learning more deeply for that review was a good assignment for me. It was a great book then, and the revised edition, along with its companion pocket reference, still is. The main difference is coverage of the POSIX standard for these tools, which is handled nicely along with other variations and releases; the typography and layout also seem much nicer than before.

The idea underlying the notion of Unix power tools is that so much of what we deal with is ASCII files with some systematic layout, namely our programs and our data.  And because we can get quite proficient with these power tools, we also set up our other files (calendars, address books, recipes, documents) so we can apply the tools to them as well.

The Unix view of a structured file entails hiding all the underlying physical and logical file structure machinery under the hood of the operating system, and letting users deal with records, which are lines delimited by new-line characters, and fields, which are any material delimited by sequences of spaces and tabs (aka white space). Programs (tools and power tools) are often profitably considered to be filters: they read from standard input (usually text from a file, the keyboard, or the output of another program just before it in a pipeline), then they write to standard output (again, usually text to the screen, to another file, or to be used as input to yet another program).
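As a rough illustration of the filter idea, here is a small pipeline in which each stage reads standard input and writes standard output. The word list is made up for the example, and sort, uniq, and head are simply other standard Unix filters, not tools the books focus on.

```shell
# Hypothetical sample data: one word per line (made up for illustration).
words=$(printf '%s\n' pear apple pear fig apple pear)

# Each stage is a filter: sort groups identical lines together,
# uniq -c collapses each group into "count word",
# and sort -rn puts the largest count first.
counts=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

# The most frequent word is the second white-space-delimited field
# of the first line of the result.
top=$(printf '%s\n' "$counts" | head -n 1 | awk '{ print $2 }')
echo "most frequent: $top"
```

No stage knows anything about the others; each one sees only lines of text arriving and leaving.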

Sed and awk are the quintessential tools for constructing such filters. A third tool, grep, belongs in this list.  The book gives good coverage of grep, but it didn’t make it into the title.

A concept central to these three tools is regular expressions, the technical term for the patterns used in text matching.  The book gives a nice treatment of this subject, and it is worth going back over it and thoroughly mastering it. The tool grep does nothing more than match regular expressions in its input. (The funny name comes from the phrase “globally search for a regular expression and print it,” which is expressed in the notation of a few Unix editors as g/re/p.  The “g” and “p” are part of the editors’ notations, and the “re” is short for any regular expression.)  To be specific, the invocation
grep 'dog' my_file
will print to the screen (standard output) every line in my_file that contains the substring ‘dog’.  And the invocation
grep 'd[aeiou]g' my_file
will print every line that has one or more of the strings
‘dag’, ‘deg’, ‘dig’, ‘dog’, or ‘dug’.
The repertoire of regular expressions is quite a bit broader than these two examples. Study the book, and keep the pocket reference handy. (The syntax and capabilities of regular expression matching are a little inconsistent.  The various versions and releases of these tools have their own variations, which the books take care to list.  And the various shells’ ideas of patterns for file name matching (aka wild cards) are quite different from the regular expressions of grep, sed, and awk.)
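To hint at that broader repertoire, the sketch below tries two anchors and two common grep options. The sample file and its contents are made up for illustration; ^, $, -i, and -c are standard, but the books remain the place to check the details for your version.

```shell
# Hypothetical sample file; the contents are made up for illustration.
sample=$(mktemp)
printf '%s\n' 'dog days' 'hot dog' 'dig a hole' 'dodge' 'DOG' > "$sample"

# ^ anchors the match to the start of the line.
starts=$(grep '^dog' "$sample")

# $ anchors the match to the end of the line.
ends=$(grep 'dog$' "$sample")

# -i ignores case; -c prints a count of matching lines instead of the lines.
count=$(grep -ic 'd[aeiou]g' "$sample")

printf 'starts with dog: %s\n' "$starts"
printf 'ends with dog: %s\n' "$ends"
printf 'matching lines, any case: %s\n' "$count"

rm -f "$sample"
```

Note that ‘dodge’ never matches: it contains d, o, and g, but not a d, a vowel, and a g in a row.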

Sed and awk take the grep idea much further.  They allow you to specify actions on lines that match your regular expressions. Sed (for stream editor) is the simpler of the two.  A typical sed invocation is
sed 's/dog/cat/' < my_file > my_new_file
which copies my_file to my_new_file, changing the first ‘dog’ on each line to ‘cat’ (add the g flag, as in s/dog/cat/g, to change every occurrence).
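One detail worth seeing in action is that sed’s s command changes only the first match on a line unless it is given the g flag. A minimal sketch with made-up input follows; the last command also shows an address, which limits a command to the lines that match a pattern.

```shell
# Made-up input line containing the pattern twice.
line='dog eat dog'

# Without a flag, s replaces only the first match on each line.
first=$(printf '%s\n' "$line" | sed 's/dog/cat/')

# The g flag replaces every match on the line.
all=$(printf '%s\n' "$line" | sed 's/dog/cat/g')

# An address (/^#/) selects lines; d deletes the selected lines.
kept=$(printf '%s\n' '# a comment' 'dog' | sed '/^#/d')

printf 'first only: %s\n' "$first"
printf 'global:     %s\n' "$all"
printf 'kept:       %s\n' "$kept"
```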

Awk gives you the power to apply real computer-program-style actions to the lines that match.  The awk invocation
awk '/dog/ { x = x + 1; if ( x % 2 == 0 ) { print } }' my_file
prints every second line that has ‘dog’ (the second match, the fourth, and so on).
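The sketch below runs that every-second-line program on a few made-up lines, then shows awk working with fields, the white-space-delimited pieces of each line. The input data and the variable name sum are hypothetical, chosen only for illustration.

```shell
# The every-second-match program, fed made-up input on standard input.
every_second=$(printf '%s\n' 'dog 1' 'cat' 'dog 2' 'dog 3' 'dog 4' |
    awk '/dog/ { x = x + 1; if ( x % 2 == 0 ) { print } }')
printf '%s\n' "$every_second"

# Hypothetical data: a name and an amount on each line.
data=$(printf '%s\n' 'alice 10' 'bob 5' 'alice 7')

# $1 and $2 are the first and second fields of the current line.
# The pattern selects lines whose first field is exactly "alice";
# the END block runs once, after the last line has been read.
total=$(printf '%s\n' "$data" |
    awk '$1 == "alice" { sum += $2 } END { print sum }')
echo "alice total: $total"
```

Variables such as x and sum need no declaration; awk treats an unset variable as zero in arithmetic.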

I have given very few details of these three Unix tools here.  They all have lots of options and variations.  A wonderful thing about these Unix tools is that you don’t need to know them in full detail to be able to use them productively.  Read these books for clear, extensive guidance, with all the rules and lots of good, realistic examples.  You can learn at whatever rate you wish, and become more and more productive accordingly.

These power tools and this pair of books are an excellent way to begin building command line proficiency.  Don’t stop with these, though.  The shell (e.g., bash) is another power tool, and there are another couple of hundred tools, little and big, built into your Unix system (including Linux and the BSD Unix under the hood of the Mac) that belong in every competent user’s repertoire.
