If you have ever been interested in the awk and sed Unix utilities, then you probably know about the awk1line.txt and sed1line.txt files that are floating around the Internet. Each file contains around 80 idiomatic sed and awk one-liners for performing various text modification tasks.
Making my way through them was not easy and I decided to write two articles explaining every one-liner in these files. It took me several months to finish them, but now I am proud that I did it.
Here is the Awk One-Liners Explained article: Part One, Part Two, Part Three, Part Four (bonus).
And here is the Sed One-Liners Explained article: Part One, Part Two, Part Three.
My future plans are to publish a free ebook with all the one-liners. If you are interested, please visit my site after a few months. I’ll publish it there.
My dad is always singing the praises of Sed and Awk but that’s not the only reason I don’t like them.
They encourage these one-liners, which just means removing the formatting from code. They use regular expressions, something I consider should be avoided at all costs on account of their impenetrable syntax. My general feeling is that in the time it takes to figure out how to do anything with these tools, you could have just written a Python script to do it.
I’m sure they were great tools in their day but I really think Python trumps them, replacing all their functionality and throwing in maintainability to boot.
All that said, these are excellent, useful and well-written articles. If I ever find I’m forced to use these things, I shall be eternally grateful that something like this exists.
Depends on the load you have to lift. If I wanted to perform a quick bucket-sort, then I’d use PHP, VBScript or any other non-compact scripting language. But if I had to make a syntax processor (like I have: http://camendesign.com/code/remarkable ), then there’s no way I’d do it without regex. I’d have to practically reimplement a hard-coded regex engine in the process of handling the byte-by-byte matching for all the use-cases.
Sure the likes of sed and awk are hard to use, but there’s wizards out there who can, and those of us who can’t — it’s not ours to say that one tool is better than the other, when in the right hands.
Reminds me of this T-Shirt – http://www.thinkgeek.com/tshirts-apparel/unisex/frustrations/374d/?… – “Go away or I will replace you with a very small shell script.”
Edited 2009-02-19 13:08 UTC
I’ve never considered awk hard to use. You can learn the basics in about thirty pages. Have you ever sat down and read “Effective GAWK Programming”?
http://www.gnu.org/software/gawk/manual/gawk.html
sed is not hard either. Python is great too, but for little quick jobs awk is great. awk can be just as maintainable if you write it cleanly (and I’ve seen python code that is a mess, because you can write messy Python too.) But for a one liner, who cares if it is maintainable? You use it once then throw it out!
This manual has 360 pages (PDF version). I would like to learn the basics of awk and have already bought a cheap German reference by O’Reilly.
Do you know a shorter tutorial?
Yep: http://www.grymoire.com/Unix/Awk.html
First off, Python is not available everywhere, but awk and sed are (almost). Maybe perl is better than python, but anyway, that is not the point.
awk and sed scripts are not there to be maintained. They are not developer tools. They are administrator tools and they are there to make it easy for you to edit or search files quickly. sed and awk are used for one-shot commands in 95% of the cases. Once you have the result, there is no need to maintain the command at all.
How many lines do you have to write in python, just to open a file and read it? In sed or awk, that’s 0. The file is open and parsed. There is no way you can make it faster in python.
Edited 2009-02-19 14:22 UTC
One. cont = open("file.txt").read().
Or
for line open("file.txt"): do_stuff_with(line)
If that means significant extra work for you, you need a harder problem domain 😉
That should be:
for line in open("file.txt"): do_stuff_with(line)
Agreed.
That should be:
It’s neither. It should be:
<pre>
with open("file.txt") as fd:
    for line in fd:
        do_stuff_with(line)
</pre>
Edited 2009-02-20 23:27 UTC
Such pedantic diligence (the ‘with’ statement) is probably not appropriate in a thread that mentions awk/sed. The with statement is only needed when you are worried about closing file handles in some future version of Python that may not do reference counting (which could mean that the file handle stays open until the next garbage collection cycle).
Not really something you need to worry about if you are mainly targeting your normal python installation. Furthermore, a linear script can just close() the filehandle without caring about exceptions (because it would just exit the process and close everything anyway).
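For anyone following the ‘with’ debate above, here is a minimal sketch of what the statement buys you: the handle is closed as soon as the block exits, without relying on reference counting. The helper name count_lines and the temporary demo file are made up for illustration.

```python
import os
import tempfile

def count_lines(path):
    # The with block closes the file on exit, even if the body raises.
    with open(path) as fd:
        total = sum(1 for _ in fd)
    # The handle is already closed here; no garbage collector involved.
    assert fd.closed
    return total

# Hypothetical demo file, created only so the sketch is runnable.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")

print(count_lines(tmp.name))
os.remove(tmp.name)
```

For a throwaway linear script the plain for loop over open() does the same job; the difference only matters when the handle has to be released at a known point.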
Chinese (traditional/simplified) looks pretty incomprehensible to me, but evidently 1,000,000,000+ people manage to get by.
Without a doubt it’s unwelcoming to those who don’t understand it, but for those who do…
That’s why I wrote the articles, to explain them.
It is unfortunate that only the first paragraph shows on the Hacker News website.
Click on http://www.osnews.com/story/21004/Awk_and_Sed_One-Liners_Explained… to find links to my articles!
No, there are not a billion people who can read Chinese, contrary to popular belief, and there is a reason for that. Guess why they had to ‘invent’ simplified on top of traditional? The same applies to all other fancy writing systems.
However, the point is that different tools serve different purposes, and you may get along with a tool without knowing it inside out. I wish I knew any one of these to some extent, but absent an actual need, I cannot justify putting the required time into it. You will know better…
Are you serious?!
Regular expressions are awesome, and definitely a core technology, even when using Python.
You are either a troll or completely clueless. soz.
http://xkcd.com/208/
and of course:
http://xkcd.com/353/
and for what it’s worth; Python > Perl
🙂
Regular expressions are very powerful but they cover a limited range of problems between the trivial (the title is the example given to trim whitespace) and the very complex.
These old UNIX tools were terrific in their day when they were the only way of doing things. But I think the fact that they continue to get so much air time has more to do with the fact that it’s fun to play with them. They’re like crossword clues.
Don’t underestimate what you can do with Python. It’s trivial to read in a text file and split it into an array using whatever separator you want. Modern scripting languages have tremendously powerful string handling functions and they do all this using real words. REs are there if you need them but you very rarely do.
My work rate isn’t limited by the speed at which I type, it’s limited by the speed at which I think (that is, severely limited). I think better if I’m not having to translate everything via these arcane hieroglyphics.
I wish people were better at distinguishing between a genuinely held (and valid) opinion and a troll. It is only my opinion.
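To make the claim above about reading a file and splitting it on any separator concrete, here is a rough sketch. The function name split_records and the sample data are assumptions for the example, and io.StringIO stands in for an open file.

```python
import io

def split_records(stream, sep=None):
    # sep=None splits on runs of whitespace, much like awk's default
    # field splitting; any other string is used as a literal separator.
    for line in stream:
        yield line.rstrip("\n").split(sep)

# Made-up colon-separated sample; a real script would pass open("file.txt").
sample = io.StringIO("root:x:0:0\nalice:x:1000:1000\n")
for fields in split_records(sample, sep=":"):
    print(fields[0], fields[2])
```

No regular expression is involved; str.split does all the work.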
I think it’s because most people still think that bringing Python into the equation for simple formatting and extracting information from output is overkill. If I’m just writing a simple filter to be used between the output of one program and the input of another program on my own machine then I’m just going to use the shell, sed, and awk. If I want to distribute some kind of application that performs all the functions of my script I’m probably going to want to develop it in a language like Python or Perl.
Depends on how you think (hammer & nails, anyone?). I always think of a regexp solution to any given problem (that I solve in Python) first – typically, I slurp in a string, run re.findall on it, then do a for loop over the resulting tuples.
Regular expressions have the advantage of being insanely fast, and very easy to work with. I agree that for the problems that are trivial enough to solve with awk/sed, regexes may be overkill – you can just do s.replace(), s.split() and s.join().
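The slurp, re.findall, loop pattern described above might look roughly like this. The log format and variable names are invented for the example; the last lines show the plain string methods for the simpler cases.

```python
import re

# Made-up log text; the format is an assumption for this sketch.
text = """\
2009-02-19 13:08 login alice
2009-02-19 14:22 login bob
2009-02-20 23:27 logout alice
"""

# With several groups in the pattern, re.findall returns a list of tuples.
pattern = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}) (\w+) (\w+)$", re.MULTILINE)

events = pattern.findall(text)
for date, time, action, user in events:
    print(user, action, date, time)

# For trivial cases the string methods are enough.
csv_line = "a, b, c"
fields = [f.strip() for f in csv_line.split(",")]
joined = "|".join(fields)
```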
If you want to convince me python is better than sed/awk for these kinds of tasks you should write all the one-liners in python. Then I will listen to you.
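As a small start on that challenge, here is a hedged sketch of three common one-liner tasks in Python. The function names are made up, and the sed/awk commands in the comments are rough equivalents rather than exact translations.

```python
def number_lines(lines):
    # Roughly: awk '{ print NR "\t" $0 }'
    return ["%d\t%s" % (n, line) for n, line in enumerate(lines, 1)]

def double_space(lines):
    # Roughly: sed G  (append an empty line after every line)
    out = []
    for line in lines:
        out.extend([line, ""])
    return out

def strip_whitespace(lines):
    # Roughly: sed 's/^[ \t]*//;s/[ \t]*$//'
    return [line.strip(" \t") for line in lines]

# Made-up input, split into lines without their trailing newlines.
sample = "  alpha\nbeta  \n".splitlines()
for line in number_lines(strip_whitespace(sample)):
    print(line)
```

Each one is longer than the one-liner it imitates, which is more or less the point both sides of this thread are making.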
I wouldn’t really say that AWK and SED encourage these one liners, rather I would say they enable one liners.
When you work with these tools to the comfort level that you can just spit out these one liners, that’s where this facility becomes a powerful command line tool rather than just a scripting language.
I don’t think twice about just pounding out long pipelines in the shell, or short scripts. Similarly, if it can fit on one line, I’ll consider doing the same with AWK or SED.
These are one off events that never last beyond perhaps the shell history.
You obviously subscribe to the school of thought that says someone who uses regular expressions to solve a problem now has two problems.
True to a degree – they tend to be overused, or made overly complicated by people who think that because they can read it, so will the next person who comes along. But they’re also an extremely powerful tool, for which there really isn’t any practical alternative – e.g. if you want to validate that a string matches a pattern, you can use a simple regex to do it, or you can write your own parser. And writing your own parser is almost never the right answer.
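A minimal sketch of the "simple regex" option for validation, assuming the pattern in question is an ISO-style date such as 2009-02-19. The names DATE_RE and looks_like_date are hypothetical.

```python
import re

# Assumed format for the example: four digits, two digits, two digits.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")

def looks_like_date(s):
    # fullmatch requires the whole string to match, not just a prefix.
    return DATE_RE.fullmatch(s) is not None

print(looks_like_date("2009-02-19"))
print(looks_like_date("2009-2-19"))
```

This checks the shape of the string only; it would accept 2009-99-99, so a real date check would still need something like datetime on top.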
Just liked. It would be nice to have a PDF version with all concatenated.
Thank you.
Yep. I will publish two free ebooks, one on Awk one-liners and one on Sed one-liners.
Ps. you can subscribe to my posts on my blog so that you don’t miss them.
If you use GNOME just File->Print->Print to file.
Excellent.
fretinator ~ $ sed “W+F rU $@y!n6”
Very nice! But I’m going to wait for the PDF file that’ll include all the tutorials (not that I don’t like your blog or anything, hehe).
Great contribution, thank you!
Just finished printing the two text files. Although sed and awk already belong to my usual daily tools, I like to learn something new every day. Many thanks!
If you have to explain one-liners, then they aren’t funny.
It’s been said before but you don’t always have Python or Perl available and knowing Awk and Sed can be a life saver. They are also great tools for formatting output for input into other programs. An entire scripting language is overkill for that kind of work.
Many thanks for this – particularly for the cheat sheet which is very convenient, but the one liners are excellent and well commented too. We can always learn.
The nice thing about awk is that it’s lightweight, reasonably fast, terse, and intuitive once you know it and are used to it. What it’s good for, it’s great for. I think of it a bit like an old pruning knife with a well sharpened, slightly worn blade and a handle polished with use, an Opinel for instance. It’s safe, it’s cheap, it’s sharp, it just fits in the hand, and you hardly think about it any more. And you would be very sorry indeed if you ever lost it.