The A-Z of Programming Languages: AWK

Submitted by Ward D 2008-05-27 General Development 12 Comments

AWK is one of the most common UNIX tools to process text-based data in either files or datastreams. Written by Alfred Aho, Peter Weinberger, and Brian Kernighan, AWK “extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions.” ComputerWorld interviewed Alfred Aho.As is usually the case, the programming language grew out of a need. “As a researcher at Bell Labs in the early 1970s, I found myself keeping track of budgets, and keeping track of editorial correspondence,” Aho explains, “I was also teaching at a nearby university at the time, so I had to keep track of student grades as well.” He wanted a simple programming language that could deal with these tasks.

Out of this grew AWK, a language based on the principle of pattern-action processing. It was built to do simple data processing: the ordinary data processing that we routinely did on a day-to-day basis.. We just wanted to have a very simple scripting language that would allow us, and people who weren’t very computer savvy, to be able to write throw-away programs for routine data processing.

The fact that even I could write basic AWK programs is testimony to its ease of use, and despite its age, it still proves its usefulness today. The interview is an interesting read.

About The Author

Thom Holwerda

Follow me on Mastodon @thomholwerda@exquisite.social

12 Comments

2008-05-27 3:01 pm
thavith_osn
When I first used a Unix box back in 1990 I looked for a language to help me understand what this OS could do. The first thing I stumbled across was AWK. I loved it. I soon discovered a lot of other tools of course, but used AWK quite a bit for my work. Believe it or not, cc wasn’t installed on this box back then (probably came on a tape or floppy, but I didn’t think to look – sadly, besides, it was a machine for a customer), so scripting languages where the only way to go.
Must have another look at it again some day, I notice it is on OS X, I know Linux has it there too…
2008-05-27 3:08 pm
xiaokj
Good reading.
However, what I think of awk is that it is a has-been kept alive by other packages dependant on it. I have yet to hear of new uses of awk. Instead, people use grep when scripting and efficiency is needed, or use perl when needing power.
awk ushered in perl, so should it be considered a stepping stone? I would vote yes, and I sincerely hope that, if such, we should leave it to toil away behind the scenes of history. Developers, simply put, have too many things to juggle with these days, IMHO.

2008-05-27 5:03 pm
whartung
I dunno if it’s a stepping stone or not. I think today most folks step up to Perl simply because more folks know Perl.
But I discovered AWK early, and it’s still my hammer of choice, dropping in to Perl only for true edge cases that AWK simply doesn’t work well.
I find it easier to work with than Perl, and for most of my work, the extra power and utility of Perl simply isn’t necessary. I typically just need to do basic processing, and AWK is perfect for that.
It’s not unheard of for me to pipe a couple of small AWK programs together with other shell utilities. Again, Perl could probably handle the entire task, but a combination of my familiarity with the classic Unix utilities, Perl’s complexity, and AWKs simplicity all conspire to make me use the shell and AWK before I even think of using Perl. Typically, if I need Perl, I need to crack open a book/website. AWK and the shell I have, effectively, memorized.
I strongly suggest AWK to anyone. It’s clever pattern -> action paradigm is quite simple, and really powerful.
And, as Aho said, it’s just great for throw away programs. I have /tmp/x.awk almost permanently living on my disk, constantly being rewritten.
One of my common idioms is to edit a data file (in vi, of course), then :!vi /tmp/x.awk, create/edit a script, then ZZ back to my file, 1G!Gawk -f /tmp/x.awk to run the script on the buffer. If I don’t like the changes, hit u to undo and edit the awk script again. Rinse and repeat.
It’s a great tool.

2008-05-27 3:33 pm
taos
So simple, yet so powerful.
Have you noticed that the program structure of DTrace in Solaris is also awk-like?
From dtrace_usenix.pdf :
” Each probe clause has the form:
probe-descriptions
/predicate/
{
action-statements
}
…
D uses a program structure similar to awk(1). ”
It’s just natural to write a program this way:
Match? Fire! Next.

2008-05-27 5:52 pm
Doc Pain
Oh how I love these articles. =^_^=
So simple, yet so powerful.
I can only agree to that. Even today – why “even”? – awk is one of the tools I use. For example, I just wrote a simple awk script to convert a csv (comma seperated values) list file into a HTML and a LaTeX fragment. No big deal. Why? First of all, awk comes with my OS, I don’t need to install anything (I’m using FreeBSD), no fat dependencies, and a great man page. And if you are familiar with C and know how to formulate regular expressions (as you need them), awk is a fast helper.
There are other great tools, little tools that simply do their job, just to mention a few: sed, grep, cut.
Have you noticed that the program structure of DTrace in Solaris is also awk-like?
Yes! 🙂
It’s just natural to write a program this way:
Match? Fire! Next.
By “match” you can attach actions to regex patterns of other conditions (e. g. line counters).
!/^#/ && (length != 0 || dings > 50) {
_____gsub($1 $2 bla bla bla);
_____pups = sprintf(“zeux %dies %das %jenes”, uhu, kram);
_____printf(“bla”, pups, furz);
_____dings++;
}
# And now for something completely different.

2008-05-27 10:11 pm
whartung
There are other great tools, little tools that simply do their job, just to mention a few: sed, grep, cut.
AWK makes me hate cut.
Why oh why oh why can’t cut compress white space just like AWK does. By default, AWK separates fields based on one or more white space.
1 2 <– There’s supposed to be several spaces here, but HTML eats them
is the same as
1 2
AWK treats the spanning white space as a single delimiter.
But oh no, not cut. Nope. If you use ” ” as a delimiter in cut, you’ll get a field for every single space.
*sigh*
Even today, modern cut can’t do that — even as an option. So, I use AWK.
Just a pet nit…
Edited 2008-05-27 22:15 UTC

2008-05-28 12:23 am
malkia
First, thanks for the nice AWK tips there. I should learn more AWK.
I’m still using only cut, sed, tr, etc…. I’m working mostly on Windows (which I do not like much) but that’s my job – so cygwin on the help.
As for your example for cut and white spaces, here is how it can be solved:
cut –help | sed -r “s/[ ]+/ /g” | cut “-d ” -f 2-
I guess sed can be used to replace two or more spaces to one, and then preprocess…
2008-05-28 1:13 pm
news7os
You can “squeeze” spaces with tr (this might be GNUism, i’m not sure), which is what I do:
echo “blah blah” | tr -s ‘ ‘ | cut -f2 -d’ ‘
Nice article. AWK! AWK!

2008-05-29 8:49 pm
Doc Pain
You can “squeeze” spaces with tr (this might be GNUism, i’m not sure), which is what I do:
echo “blah blah” | tr -s ‘ ‘ | cut -f2 -d’ ‘
Then it would be BSDism, too, because it works in FreeBSD, as I have just checked.
Nice article. AWK! AWK!
Nice sound, at least if Americans and Englishmen pronounce it correctly. 🙂 In Germany, awk is pronounced “ar way kar” letter-wise – shorter than “ay doubleyou kay” of course… “This is Ay Doubleyou Kay Radio 100.4 MHz, you’re listening to Doctor Frasier Crane…” (see Focus Shift n – 1)… 🙂
It’s just sad HTML eats up all our pretty spaces so we can’t demonstrate how nicely it works. It would be great to have <pre>…</pre> enabled here…

2008-05-29 8:43 pm
Doc Pain
AWK makes me hate cut.
As you pointed out correctly, there are cases when cut isn’t the best tool. But that’s tht nature of a tool – use it for what’s it good at, and don’t use it when it creates more problems than simply using another tool.
Cases where cut is a good tool are, where
1. you just want one of n fields,
2. the field delimiter isn’t a space or a tab and
3. when you don’t need to care for multiple spaces or tabs.
I remember a case where all three cases were met: I needed a stupid script that would extract all the nicknames from my X-Chat log files, so I did – and don’t try this at home, kids – the following stupidity:
cat ${LOGFILES} | grep “<” | grep “>” | grep -v “CTCP” | cut -d ‘<‘ -f 2 | cut -d ‘>’ -f 1 | sort | uniq -d | xargs echo > nicklist.txt
After I entered it and saw that it worked, I thought that I’d have better used awk… 🙂

2008-05-31 5:37 am
fernandotcl
Use the “-F” flag to specify the separator in awk.
In my experience, awk can always replace cut with ease, but the opposite isn’t always true. Still, cut is lighter on resources and also more readable (the operation you’re performing seems more explicit to me).
2008-05-31 7:20 pm
Doc Pain
Regarding the field separator:
Use the “-F” flag to specify the separator in awk.
That’s correct. Another option is to set FS in your awk script. The solutions
awk -F “:” ‘{ print $2, $4; }’ zeux.csv
or
awk ‘BEGIN { FS = “:”; } { print $2, $4; }’ zeux.csv
should give the same results. FS can be declared as a regular expression that makes “space or more spaces and / or tab or tabs” the field separator.
In my experience, awk can always replace cut with ease, but the opposite isn’t always true. Still, cut is lighter on resources and also more readable (the operation you’re performing seems more explicit to me).
Following the trend of “light weight solutions” and the philosophy to choose the best tool for each task, cut sometimes simply is the best solution. For example, in situations like
cat /etc/passwd | cut -d “:” -f 1
I wouldn’t use awk.