C/C++ parsers such as libxml2 and Xerces C++ have entered the scene, and so have their Perl extensions. Perl XML folks have developed Perl SAX, a Perlish counterpart of the Java SAX interface. CPAN currently contains several parsing modules. This article compares the performance of five free PerlSAX 2 parsers available from CPAN. The older XML::Parser is also included to serve as a baseline.
While the author doesn’t draw a conclusion about the performance of the respective parsers, it’s quite clear that XML::SAX::Expat provides the poorest performance: it came last in all the tests, often by a large margin.
Stay away from XML::SAX::Expat unless you really need to.
What is the real difference between these modules besides speed? Not having used them, I don’t exactly know. Maybe there are reasons for using each one, or do their features overlap? A programmer sometimes needs to take more than pure speed into consideration; maybe XML::SAX::Expat offers some special features that the others do not.
There was no mention in the article as to whether any of the parsers used were validating parsers. Are they? If they were then this benchmark is total arse, right??
Anyway, this kind of test is useless. When running SAX-based
parsing you’re just doing a massive number of callbacks from
the parser to the application layer. When you are
crossing language boundaries, the cost of crossing the
layers dominates massively over the cost of parsing proper.
SAX is NOT a good API for getting speed in a high-level
language for this very reason. This also explains why all
the native parsers show the same performance, except the ones
which are really not performing fast.
The test is useless: the guy tested the speed of going from
C to Perl and back. The fact that he didn’t notice that
obvious fact is not a good sign of technical analysis, nor is
the fact that the vertical axis doesn’t even give an order
of magnitude for the scale; we can guess it is a time
value but nothing more. He also changed the SAX API interface
in his own module; since that’s the #1 speed factor in such
a context, it’s obvious one can get better results.
Sounds like a not very pretty way to try to advertise
his module; of course he’s biased…
Daniel
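The callback-volume argument above can be made concrete with a small sketch. It is written in Python rather than Perl, on the assumption that the shape of the problem is the same: Python's standard `xml.sax` module drives the C expat library and calls back into interpreted code for every event. The `CountingHandler` class and the `make_document` and `count_callbacks` helpers are hypothetical names invented for this illustration, and the synthetic document is not the data used in the article.

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts how many times the parser calls back into Python."""

    def __init__(self):
        super().__init__()
        self.counts = {"start": 0, "end": 0, "chars": 0}

    def startElement(self, name, attrs):
        self.counts["start"] += 1

    def endElement(self, name):
        self.counts["end"] += 1

    def characters(self, content):
        # The parser may deliver one text node in several chunks,
        # so this is a lower bound on text nodes, not an exact count.
        self.counts["chars"] += 1


def make_document(n_items):
    """Build a synthetic document: <root> wrapping n_items <item> elements."""
    items = "".join("<item id='%d'>value</item>" % i for i in range(n_items))
    return ("<root>" + items + "</root>").encode("utf-8")


def count_callbacks(xml_bytes):
    """Parse xml_bytes and return the per-event callback counts."""
    handler = CountingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


if __name__ == "__main__":
    counts = count_callbacks(make_document(1000))
    print(counts, "total:", sum(counts.values()))
```

Even this tiny document triggers at least three boundary crossings per element, which is why the handler side, not the tokenizer, can end up dominating the measured time.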
I hope I’m not punished too much for plugging YAML whenever an XML article pops up, but more people need to know about it. I practically never use XML anymore these days, but I am using YAML in many places (especially config files). YAML feels so natural for scripting languages; it feels almost like you’re just typing a data structure literal in your own language!
http://www.yaml.org/
And Syck (A C-based YAML parser, emitter, loader; with bindings to Perl, Python, Ruby, PHP, and possibly others) is _fast_.
If you’re programming in Perl, speed is low on your list of priorities.
It’s one of the faster interpreted languages out there…
That’s like saying it’s one of the fastest 660cc automobiles. Relatively, that’s super! Absolutely, it means nothing.
That’s like saying it’s one of the fastest 660cc automobiles. Relatively, that’s super! Absolutely, it means nothing.
Yes it does, it means a lot, if you want to compare apples to apples, and you want to get the best out of your chosen toolset.
Of course compiled languages are faster than interpreted languages. If you want to get max performance out of systems level programming (or even GUI apps), go with C or C++.
If you want the benefits of an interpreted language (rapid app development, fewer bugs and memory problems, cross-platform capability, etc.), choose a language that is fastest or best suits your needs.
From what I’ve read, Perl is one of the, if not the, fastest interpreted languages out there. And it’s not a big memory/cpu cycle hog like the JVM or CLR. And it is super flexible and useful.
If you want the benefits of an interpreted language (rapid app development, fewer bugs and memory problems, cross-platform capability, etc.), choose a language that is fastest or best suits your needs.
That doesn’t make much sense. If you can cope with interpreted language speeds (which can even be faster than a typical C++ approach for many apps, mind you), it doesn’t make a big difference which language you choose – they are all roughly the same speed (Perl is the fastest of the three, Ruby is the slowest). Choosing Perl because it’s slightly faster than, say, Python when Python would have significant speed of development / maintenance advantages is probably bad engineering.
Typically Perl is the worst choice among the scripting languages, only rivalled by sh.