“In this article a simple usage of regular expressions is described. Its intention is to bring users to try the most powerful search and replace paradigm available and hopefully start using it. This however can not replace good tutorials available on the sites that are also mentioned in this article. The article is written reproducing actual steps I took to complete my task, to show the specifics and possible problems.”
Not bad at all in two pages, and very encouraging to people to try. And his travails are fairly typical.
Struggling with this stuff, I have found the most accessible guide to be The Awk Programming Language, by Aho, Kernighan and Weinberger. Very nice, succinct and easy-to-follow examples. If you are not a natural, it takes work, but it’s well worth doing.
It’s in fact around 5 pages when clipped and pasted into WP. What’s nice is how he does a few things and has them go wrong, and then presents a little systematic account of the structure of regexes. It’s something you could give someone who is tiptoeing into it but finding the basic idea a bit opaque, like I certainly did at the start. I wish someone had given me this then; it would have saved some of the time I lost plunging in at the deep end, until I got to Aho et al.
I like the debian rutebook 🙂
Regular Expressions are something every programming language and every OS should have. I would push for some form of standardisation though. Those \w things aren’t standard and many implementations don’t include them. I’d also push for two different kinds of regex engine: one DFA-based, and another that supports backreferences (hybrid or whatnot), with automatic detection of which one a pattern needs. Lazy and greedy quantifiers would be nice too.
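For anyone wondering what greedy, lazy and backreference mean in practice, here is a minimal sketch in Python, assuming its standard re module (a backtracking engine, so it supports all three). The sample strings are made up for illustration.

```python
import re

text = '<b>one</b> and <b>two</b>'

# Greedy: .* grabs as much as it can, so the match runs to the LAST </b>.
greedy = re.search(r'<b>.*</b>', text).group(0)

# Lazy: .*? grabs as little as it can, so the match stops at the FIRST </b>.
lazy = re.search(r'<b>.*?</b>', text).group(0)

# Backreference: \1 must repeat whatever group 1 matched (a doubled word here).
doubled = re.search(r'\b(\w+) \1\b', 'this is is a test').group(1)

print(greedy)   # the whole string
print(lazy)     # only the first <b>...</b> pair
print(doubled)  # the word that appears twice in a row
```

The backreference is the feature that a pure DFA cannot handle, which is why an engine would have to detect it and switch strategy.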
> Regular Expressions are something every programming language and every OS should have.
Agreed. For me, it’s something I can never predict when I will need, but every now and then I’m really thankful I know the basics so I can do something quickly.
> I would push for some form of standardisation though.
In a sense I agree because little pisses me off more than an editor with a brain-dead regex syntax that stops me from doing things I know should be simple.
On the other hand, as long as an implementation provides a reasonable way to do all the common things (and a decent reference if they’re not intuitive) I don’t think standardization’s that big a deal. It’s like programming: you learn the common concepts once or twice and then you can pick up new languages really quickly.
> I don’t think standardization’s that big a deal. It’s like programming: you learn the common concepts once or twice and then you can pick up new languages really quickly.
Nah, the problem with REs isn’t the different syntax. Most of the time it’s the missing features that annoy me.
What features are in question? (This is a serious question, because I really thought it was pretty obvious what an RE search is and what it isn’t – for programmers, that is; average users can’t deal with REs anyway.)
Most of the stuff on this page: http://www.regular-expressions.info/refadv.html
REs are nice, but I never could remember these strange constructions =]
http://www.codeproject.com/cpp/notepadre/notepadre.png
Imagine an average user typing this without a visual regexp editor.
Even though it might be helpful, regexps are not something an average user needs to remember or even know about.
Secondly, even if your basic user did know about them, a regexp for matching an HTML doctype declaration is probably not among the most used searches.
I disagree. I reckon almost everybody can benefit from the ability to use regex now and again. As a very low-level example, suppose you made a faux-tabular data layout in your favourite word-processor. [You know the type; just use tabs to align the data in columns.]
You’ve just finished entering the data, then you realise (or your manager decides, as is often the case) the columns should be in a different order. Curses!
Now you can either commit to a hefty and time-consuming bout of line-by-line cutting and pasting, or save the chunk of data into a textfile and run a one-line sed or awk expression over it.
Well, that’s how I got turned on to regex, anyway. I stress, it was a *lot* of data.
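The column shuffle described above can be sketched roughly like this. It is in Python rather than sed or awk, and it assumes exactly three tab-separated columns per line; the function name and sample rows are made up for illustration.

```python
import re

# Three tab-separated fields, each captured in its own group.
THREE_COLUMNS = re.compile(r'^([^\t]*)\t([^\t]*)\t([^\t]*)$')

def reorder_columns(line):
    # Emit the groups in the order 3, 2, 1; lines that do not
    # have exactly three columns are returned unchanged.
    return THREE_COLUMNS.sub(r'\3\t\2\t\1', line)

rows = ['Smith\t42\tLeeds', 'Jones\t37\tCardiff']
for row in rows:
    print(reorder_columns(row))
```

Changing the column order is then just a matter of editing the replacement string, which is the whole attraction compared with cutting and pasting line by line.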
Yes, this is very similar to what happened to me. Client has a database of several thousand entries, and the supplier wants several hundred pounds to supply an export facility. You find a way to get the entries out into a sort of semi-structured text format, but the problem is how to get that into CSV. A few lines of awk will do it, or going through a few passes with a regex-enabled text editor, but what else comes close in terms of speed and immediate testability?
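To give a flavour of that kind of job, here is a rough sketch in Python rather than awk. The dump format is an assumption (one "Key: value" pair per line, records separated by blank lines), and the records and the function name are invented for illustration; a real export would need its own pattern.

```python
import csv
import io
import re

# Hypothetical semi-structured dump, invented for this example.
raw = """Name: Ann Lee
Town: York
Phone: 01904 555 0101

Name: Bob Ray
Town: Bath
Phone: 01225 555 0199
"""

# One "Key: value" pair per line; MULTILINE makes ^ and $ work line by line.
FIELD = re.compile(r'^(\w+):[ \t]*(.*)$', re.MULTILINE)

def records_to_csv(text, columns):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(columns)
    # Blank lines separate one record from the next.
    for block in re.split(r'\n\s*\n', text.strip()):
        fields = dict(FIELD.findall(block))
        # Missing keys become empty cells instead of raising an error.
        writer.writerow([fields.get(c, '') for c in columns])
    return out.getvalue()

result = records_to_csv(raw, ['Name', 'Town', 'Phone'])
print(result)
```

The csv module takes care of quoting, so values containing commas still come out as valid CSV.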
Another case: a colleague who is heading a collaborative research project gets input from many different independent freelance researchers in all flavors of Word, most of which he either cannot read or can only partly read, and due to formatting, tables and so on, we can’t find any version of Word that will properly read them all. A few passes through with regular expressions, and you look like a miracle worker, and more important, they can get on with it.
Until you have it available, it’s hard to imagine how useful it is. It is an uphill struggle, but it’s really worth putting the time and effort in, because the first time you need it, it will all be repaid in a flash. And this is from only a smattering of knowledge of it. Now if you were a real regex guru…
One day!
> Now you can either commit to a hefty and time-consuming
> bout of line-by-line cutting and pasting, or save the
> chunk of data into a textfile and run a one-line sed or
> awk expression over it.
I think you overlooked the term “average user”
Nice article. Can always do a
man perlrequick – quick reference
man perlretut – tutorial
man perlre – the works
for a handy online reference, even if you’re not using Perl, since most apps use PCRE. Heck, man pcre too.
Been a pro developer for 6 years now and I still struggle with regex syntax (usually have to go back to the docs/google/whatever). But they are extremely powerful in what you can do with them. One of the big debates about regex in GPPLs is performance. I’m not sure about other languages, but the BCL in .NET has a handy flag you can use with your regex instantiations called RegexOptions.Compiled, which speeds up matching quite a bit (but of course should be used sparingly, since compiling each pattern costs extra startup time and memory).
Two invaluable resources for .NET devs using regex:
– http://regexadvice.com/
– The Regulator (GUI for building regexes): http://regex.osherove.com/
Has anyone else tried this?
As part of the ongoing struggle, I bought it, and it works a bit like a tutorial system. Very nice and detailed manual, and what you do is to paste your text or a sample of it into one window, then try out your expression in another window, and see from the highlighting what it picks out.
A lot easier than running the expression and comparing input and output files. It remains a struggle though, but this may be just down to native aptitude.
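If you don’t have a tester to hand, a poor man’s version of the same idea can be sketched in a few lines of Python. This is not how that tool works, just an illustration; the helper name and sample text are made up. It wraps every match in square brackets so you can see what an expression picks out.

```python
import re

def highlight(pattern, text):
    # Wrap each match in square brackets so it stands out in plain text.
    return re.sub(pattern, lambda m: '[' + m.group(0) + ']', text)

sample = 'Order 66 shipped on 2006-03-14 to box 12.'

# A loose pattern picks out every run of digits...
print(highlight(r'\d+', sample))

# ...while a tighter one picks out only the date.
print(highlight(r'\d{4}-\d{2}-\d{2}', sample))
```

Trying a loose pattern first and then tightening it is much the same workflow as the two-window approach, just without the colours.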
http://jregexptester.sourceforge.net/
…you now have 2 problems