This article discusses a small-scale benchmark test run on nine modern computer languages or variants: Java 1.3.1, Java 1.4.2, C compiled with gcc 3.3.1, Python 2.3.2, Python compiled with Psyco 1.1.1, and the four languages supported by Microsoft’s Visual Studio .NET 2003 development environment: Visual Basic, Visual C#, Visual C++, and Visual J#. The benchmark tests arithmetic and trigonometric functions using a variety of data types, and also tests simple file I/O. All tests took place on a Pentium 4-based computer running Windows XP. Update: a Delphi version of the benchmark appears in the comments below.
Why benchmark?
Five questions motivated me to design and run these benchmarks. First, I was curious about how the performance of Java 1.4.2 (the latest official version from Sun) compares to that of Microsoft’s relatively new .NET 2003 suite of languages. Both Java and the .NET languages are “semi-compiled” (or, looking at the flip side of the coin, “semi-interpreted”). By this I mean that source code is compiled into intermediate-level code and then run by a combination interpreter/just-in-time compiler. With Java, the intermediate language is called bytecode and the interpreter/compiler is called a Java Virtual Machine (JVM). Source code in the .NET world is compiled into the Microsoft Intermediate Language (MSIL) and is run on the .NET Common Language Runtime (CLR) engine.
The .NET languages benefit from many of the same features that have made Java so popular, including automatic resource management/garbage collection and type safety. They also add interesting new features and conveniences such as cross-language debugging, easy GUI design, and virtually idiot-proof application deployment. But what is the performance penalty of these new features? By adding layers of complexity to its programming model, has Microsoft given up its speed advantage over Java?
Microsoft makes it especially easy to compare the overhead of the Java and .NET frameworks by including J# in the .NET suite. This language is syntactically identical to Java (although it implements only version 1.1.4 of the Java spec, which is by now quite out of date), so any differences in speed between Java and J# should be attributable purely to differences between the Sun and Microsoft runtime overhead.
Second, I wanted to assess Microsoft’s claim that the same routine coded in any of the .NET languages is compiled into identical MSIL code which will ultimately run at the same speed. This led me to keep the benchmark very simple, so that I could make sure the routines in each of the .NET languages really were functionally identical. Would all four languages really run at the same speed?
Third, I was curious to see how much slower Java or the .NET languages are than a fully compiled language like C, especially when the C program is unburdened by the runtime overhead of the CLR. I first tried to eliminate the CLR from the Visual C++ benchmark by turning off the language’s “managed” features with the #pragma unmanaged
directive, but I was surprised to see that this didn’t lead to any performance gains. After that strategy failed, I recompiled the Visual C++ program with GNU’s gcc C compiler in order to give C every opportunity to shine in its native, unmanaged, CLR-free form.
Fourth, I wanted to find out how semi-compiled languages compare to fully interpreted languages like Python, Perl or PHP. It is often said that as hardware continues to get faster and cheaper we will reach a point where the extra speed of compiled languages will be largely unnecessary. But if there is still an order-of-magnitude difference between the performance of a routine coded in C and the same algorithm coded in Python, we would be wise to keep our C skills up to date. To test this, I wrote another version of the benchmark in Python. I then re-ran the Python benchmark with the Psyco just-in-time compiler to see if we could combine Python’s spectacular readability and rapid development with the speed of a compiled language. Greedy perhaps, but worth a try.
Finally, I thought it would be interesting to see how Sun’s latest Java release compares to earlier versions. Sun has made strong claims about performance improvements in the 1.4.2 version of its compiler and JVM relative to the earlier 1.3.1 release, and I wanted to see if the performance lived up to the hype. So I added Java 1.3.1 to the benchmark roster.
Designing good, helpful benchmarks is fiendishly difficult. This fact led me to keep the scope of this benchmark quite limited. I tested only math operations (32-bit integer arithmetic, 64-bit integer arithmetic, 64-bit floating point arithmetic, and 64-bit trigonometry), and file I/O with sequential access. The tests were not comprehensive by any stretch of the imagination; I didn’t test string manipulation, graphics, object creation and management (for object oriented languages), complex data structures, network access, database access, or any of the countless other things that go on in any non-trivial program. But I did test some basic building blocks that form the foundation of many programs, and these tests should give a rough idea of how efficiently various languages can perform some of their most fundamental operations.
Here’s what happens in each part of the benchmark:
32-bit integer math: using a 32-bit integer loop counter and 32-bit integer operands, alternate among the four arithmetic functions while working through a loop from one to one billion (a Java sketch of this loop appears below, after this list). That is, calculate the following (while discarding any remainders):
1 – 1 + 2 * 3 / 4 – 5 + 6 * 7 / 8 – … – 999,999,997 + 999,999,998 * 999,999,999 / 1,000,000,000
64-bit integer math: same algorithm as above, but use a 64-bit integer loop counter and operands. Start at ten billion and end at eleven billion so the compiler doesn’t knock the data types down to 32-bit.
64-bit floating point math: same as for 64-bit integer math, but use a 64-bit floating point loop counter and operands. Don’t discard remainders.
64-bit floating point trigonometry: using a 64-bit floating point loop counter, calculate sine, cosine, tangent, logarithm (base 10) and square root of all values from one to ten million. I chose 64-bit values for all languages because some languages required them, but if a compiler was able to convert the values to 32 bits, I let it go ahead and perform that optimization.
I/O: Write one million 80-character lines to a text file, then read the lines back into memory.
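To make the arithmetic components concrete, here is a minimal Java sketch of the 32-bit integer loop, an illustration in the spirit of the published code rather than the exact source (which is available on my website):
public class IntBench {
    public static void main(String[] args) {
        int intResult = 1;
        for (int i = 1; i < 1000000000; ) {
            intResult -= i++;   // subtract
            intResult += i++;   // add
            intResult *= i++;   // multiply
            intResult /= i++;   // integer division silently discards the remainder
        }
        // Printing the result keeps the compiler from optimizing the loop away
        System.out.println(intResult);
    }
}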
At the end of each benchmark component I printed a value that was generated by the code. This was to ensure that compilers didn’t completely optimize away portions of the benchmarks after seeing that the code was not actually used for anything (a phenomenon I discovered when early versions of the benchmark returned bafflingly optimistic results in Java 1.4.2 and Visual C++). But I wanted to let the compilers optimize as much as possible while still ensuring that every line of code ran. The optimization settings I settled on were as follows:
Java 1.3.1: compiled with javac -g:none -O
to exclude debugging information and turn on optimization, ran with java -hotspot
to activate the just-in-time compiler within the JVM.
Java 1.4.2: compiled with javac -g:none
to exclude debugging information, ran with java -server
to use the slower-starting but faster-running server configuration of the JVM.
C: compiled with gcc -march=pentium4 -msse2 -mfpmath=sse -O3 -s -mno-cygwin
to optimize for my CPU, enable SSE2 extensions for as many math operations as possible, and link to Windows libraries instead of Cygwin libraries.
Python with and without Psyco: no optimization used. The python -O
interpreter flag optimizes Python for fast loading rather than fast performance, so was not used.
Visual Basic: used “release” configuration, turned on “optimized,” turned off “integer overflow checks” within Visual Studio.
Visual C#: used “release” configuration, turned on “optimize code” within Visual Studio.
Visual C++: used “release” configuration, turned on “whole program optimization,” set “optimization” to “maximize speed,” turned on “global optimizations,” turned on “enable intrinsic functions,” set “favor size or speed” to “favor fast code,” set “omit frame pointers” to “yes,” set “optimize for processor” to “Pentium 4 and above,” set “buffer security check” to “no,” set “enable enhanced instruction set” to “SIMD2,” and set “optimize for Windows98” to “no” within Visual Studio.
Visual J#: used “release” configuration, turned on “optimize code,” turned off “generate debugging information” within Visual Studio.
All benchmark code can be found at my website. The Java benchmarks were created with the Eclipse IDE, but were compiled and run from the command line. I used identical source code for the Java 1.3.1, Java 1.4.2, and Visual J# benchmarks. The Visual C++ and gcc C benchmarks used nearly identical source code. The C program was written with TextPad, compiled using gcc within the Cygwin bash shell emulation layer for Windows, and run from the Windows command line after quitting Cygwin. I programmed the Python benchmark with TextPad and ran it from the command line. Adding Psyco’s just-in-time compilation to Python was simple: I downloaded Psyco from Sourceforge and added import psyco
and psyco.full()
to the top of the Python source code. The four Microsoft benchmarks were programmed and compiled within Microsoft Visual Studio .NET 2003, though I ran each program’s .exe
file from the command line.
It should be noted that the Java log()
function computes natural logarithms (using e as a base), whereas the other languages compute logarithms using base 10. I only discovered this after running the benchmarks, and I assume it had little or no effect on the results, but it does seem strange that Java has no built-in base 10 log function.
Before running each set of benchmarks I defragged the hard disk, rebooted, and shut down unnecessary background services. I ran each benchmark at least three times and used the best score from each component, assuming that slower scores were the result of unrelated background processes getting in the way of the CPU and/or hard disk. Start-up time for each benchmark was not included in the performance results. The benchmarks were run on the following hardware:
Type: Dell Latitude C640 Notebook
CPU: Pentium 4-M 2GHz
RAM: 768MB
Hard Disk: IBM Travelstar 20GB/4500RPM
Video: Radeon Mobility 7500/32MB
OS: Windows XP Pro SP 1
File System: NTFS
Results
Here are the benchmark results presented in both table and graph form. The Python and Python/Psyco results are excluded from the graph since the large numbers throw off the graph’s scale and render the other results illegible. All scores are given in seconds; lower is better.
Language     | int math | long math | double math | trig | I/O  | TOTAL
Visual C++   |      9.6 |      18.8 |         6.4 |  3.5 | 10.5 |   48.8
Visual C#    |      9.7 |      23.9 |        17.7 |  4.1 |  9.9 |   65.3
gcc C        |      9.8 |      28.8 |         9.5 | 14.9 | 10.0 |   73.0
Visual Basic |      9.8 |      23.7 |        17.7 |  4.1 | 30.7 |   85.9
Visual J#    |      9.6 |      23.9 |        17.5 |  4.2 | 35.1 |   90.4
Java 1.3.1   |     14.5 |      29.6 |        19.0 | 22.1 | 12.3 |   97.6
Java 1.4.2   |      9.3 |      20.2 |         6.5 | 57.1 | 10.1 |  103.1
Python/Psyco |     29.7 |     615.4 |       100.4 | 13.1 | 10.5 |  769.1
Python       |    322.4 |     891.9 |       405.7 | 47.1 | 11.9 | 1679.0
Analysis
Let’s review the results by returning to the five questions that motivated these benchmarks. First, Java (at least, in the 1.4.2 version) performed very well on most benchmark components when compared to the .NET 2003 languages. If we exclude the trigonometry component, Java performed virtually identically to Visual C++, the fastest of Microsoft’s languages. Unfortunately, the trigonometry performance of Java 1.4.2 can only be described as dismal. It was bafflingly bad–worse even than fully interpreted Python! This was especially puzzling given the much faster trigonometry performance of Java 1.3.1, and suggests that there may be more efficient ways to code the benchmark in Java. Perhaps someone with more experience with 1.4.2 can suggest a higher-speed workaround.
Java performed especially well (when discounting the strange trigonometry performance) compared to Microsoft’s syntactically equivalent Visual J#. This discrepancy may be due to the additional overhead of the CLR engine (as compared to the overhead of the JVM), or may have something to do with Visual J# implementing only version 1.1.4 of the Java spec.
Second, Microsoft’s claim that all four .NET 2003 languages compile into identical MSIL code seemed mostly true for the math routines. The integer math component produced virtually identical scores in all four languages. The long math, double math, and trig scores were identical in Visual C#, Visual Basic, and Visual J#, but the C++ compiler somehow produced impressively faster code for these benchmark components. Perhaps C++ is able to make better use of the Pentium 4’s SSE2 SIMD extensions for arithmetic and trigonometry, but this is pure speculation on my part. The I/O scores fell into two clusters, with Visual Basic and Visual J# apparently using much less efficient I/O routines than Visual C# or Visual C++. This is a clear case where functionally identical source code does not compile into identical MSIL code.
Third, Java 1.4.2 performed as well as or better than the fully compiled gcc C benchmark, after discounting the odd trigonometry performance. I found this to be the most surprising result of these tests, since it only seems logical that running bytecode within a JVM would introduce some sort of performance penalty relative to native machine code. But for reasons unclear to me, this seems not to be true for these tests.
Fourth, fully interpreted Python was, as expected, much slower than any of the fully compiled or semi-compiled languages–sometimes by a factor of over 60. It should be noted that Python’s I/O performance was in the same league as the fastest languages in this group, and was faster than Visual Basic and Visual J#. The Psyco compiler worked wonders with Python, reducing the time required for the math and trig components to between 10% and 70% of that required for Python without Psyco. This was an astonishing increase, especially considering how easy it is to include Psyco in a Python project.
Fifth, Java 1.4.2 was much faster than Java 1.3.1 in the arithmetic components, but as already mentioned, it lagged way behind the older version on the trigonometry component. Again, I can’t help but think that there may be a different, more efficient way to call trigonometric functions in 1.4.2. Another possibility is that 1.4.2 may be trading accuracy for speed relative to 1.3.1, with new routines that are slower but more correct.
What lessons can we take away from all of this? I was surprised to see the four .NET 2003 languages clustered so closely on many of the benchmark components, and I was astonished to see how well Java 1.4.2 did (discounting the trigonometry score). It would be foolish to offer blanket recommendations about which languages to use in which situations, but it seems clear that performance is no longer a compelling reason to choose C over Java (or perhaps even over Visual J#, Visual C#, or Visual Basic)–especially given the extreme advantages in readability, maintainability, and speed of development that those languages have over C. Even if C did still enjoy its traditional performance advantage, there are very few cases (I’m hard pressed to come up with a single example from my work) where performance should be the sole criterion when picking a programming language. I would even argue that for very complex systems that are designed to be in use for many years, maintainability ought to trump all other considerations (but that’s an issue to take up in another article).
Expanding the Benchmark
The most obvious way to make this benchmark more useful is to expand it beyond basic arithmetic, trigonometry, and file I/O. I could also extend the range of languages or variants tested. For example, testing Visual Basic 6 (the last of the pre-.NET versions of VB) would give us an idea how much (if any) of a performance hit the CLR adds to VB. There are other JVMs available to be tested, including the open-source Kaffe and the JVM included with IBM’s SDK (which seems to be stuck at version 1.3 of the Java spec). BEA has an interesting JVM called JRockit which promises performance improvements in certain situations, but unfortunately only works on Windows. GNU’s gcj front-end to gcc allows Java source code to be compiled all the way to executable machine code, but I don’t know how compatible or complete the package is. There are a number of other C compilers available that could be tested (including the highly regarded Intel C compiler), as well as a host of other popular interpreted languages like Perl, PHP, or Ruby. So there’s plenty of room for further investigation.
I am by no means an expert in benchmarking; I launched this project largely as a learning experience and welcome suggestions on how to improve these benchmarks. Just remember the limited ambitions of my tests: I am not trying to test all aspects of a system–just a small subset of the fundamental operations on which all programs are built.
About the author:
Christopher W. Cowell-Shah works in Palo Alto as a consultant for the Accenture Technology Labs (the research & development wing of Accenture). He has an A.B. in computer science from Harvard and a Ph.D. in philosophy from Berkeley. Chris is especially interested in issues in artificial intelligence, human/computer interaction and security. His website is www.cowell-shah.com.
There is an updated version, not maintained by Doug, of Doug’s computer language shootout.
Relaxen und watchen die numbers
http://dada.perl.it/shootout/craps.html
http://dada.perl.it/shootout/craps2craps.html
A rather well-known benchmarking paper is the Kernighan and Van Wyk micro-benchmarks ( http://www.ccs.neu.edu/home/will/Twobit/KVW/kvwbenchmarks.html ).
It helped create The Great Computer Language Shootout ( http://www.bagley.org/~doug/shootout ).
bengt
I may be able to help you a touch on understanding why the JDK beats C++ in some cases. It comes down to one of two issues:
(1) Object allocation.
Object allocation is highly tuned in Java because the goal is to encourage solid object-oriented programming, which means a LOT of object creation. C allocation of memory and C++ object allocation tend to be abysmal. (Although to be fair, MSFT’s C on MSFT’s OS is about the best at it I’ve seen, coming close to Java performance for some tests.)
(2) Optimization
Quite simply, there is more information available at run-time on exactly how code is going to be used than you have through static analysis at compile time. Hotspot gets its name from the fact that it sports a very sophisticated profiler built into the run-time that analyzes actual code use and comes up with best-case optimizations. Some of these optimizations would be impossible at compile time because, although they are based on likely code behavior, they could be dangerous if an optimizer guessed wrong. Being a run-time optimizer, however, Hotspot can pursue these optimizations and then back them out if it sees the critical condition occur.
A key example is what we call “aggressive inlining”. Hotspot will in-line any method call for which there is only one possible target from the current call site.
This is done by tracking what the currently loaded class hierarchy is. If the hierarchy changes (through late binding) then those in-lines are backed out.
This means that Java can effectively get rid of all v-table calls except those where the call site actually is polymorphic.
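To make this concrete, here is a toy Java sketch (my own illustration, not code from the benchmark) of the kind of call site Hotspot can devirtualize and inline:
abstract class Shape {
    abstract double area();
}
class Circle extends Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}
public class InlineDemo {
    public static void main(String[] args) {
        Shape s = new Circle(2.0);
        double total = 0.0;
        for (int i = 0; i < 100000000; i++) {
            // While Circle is the only loaded subclass of Shape, Hotspot can
            // devirtualize and inline this call; if a second subclass is ever
            // loaded, the optimization is backed out, as described above.
            total += s.area();
        }
        System.out.println(total);
    }
}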
I expected plenty of comments and criticism, and you folks didn’t disappoint! I do appreciate all of the suggestions on how to improve the benchmark. I’d like to respond to the questions or criticisms that either arose most often or that seem most significant. Some of the comments point to real methodological flaws, while others seem to come from a lack of understanding about what I was trying to achieve (which is probably my fault for not being clearer). Many of the complaints could have been avoided if I had included more detailed explanations of my testing procedures and their justifications, but I didn’t want the article to get too long or too dry. So in no particular order, here we go…
Why didn’t you include my favorite language? It’s [faster|newer|cooler] than any of the languages you picked.
I had to limit the number of languages somehow, so I put together what I hoped was a representative selection of the major languages in use today. Also, I had limited time and limited skills. Sure I could have added Perl or Fortran or whatever, but then I would have had to learn enough Perl or Fortran to code the benchmark. Before starting this project, the only language I knew well enough to code even this simple benchmark in was Java/J#. Besides, if anyone is really interested in seeing how AppleScript, Standard ML of New Jersey, or Visual C# running on Mono compare, I invite you to adapt my code and run it yourself. Porting over the benchmark should be trivial if you already know the language, and I’d love to see more results (particularly if you use Lisp, Forth, Perl, or Ruby).
Why not use other C compilers? There are a ton of them out there.
See above.
Why didn’t you test on AMD hardware, or on a Solaris box?
The only machine I had ready access to was my trusty P4 laptop.
The GCC C code is going to run faster in a POSIX environment, linked to glibc instead of Windows libraries. Why didn’t you run it on Linux?
Lack of time, lack of space on my hard drive to dual-boot even a minimal Linux distro. I did run the gcc code within Cygwin, linked to the Cygwin libraries (I assume Cygwin uses glibc, but don’t know for sure). I didn’t post those results since they were nearly identical to the results of the gcc code linked to Windows libraries, but in retrospect I should have included them in my report.
You didn’t really test a fully interpreted language. Python gets compiled down to bytecode by the Python interpreter, so it doesn’t count. Why not include Perl or PHP?
Good point. I didn’t realize that any compilation was going on at all with Python until I read about it here. So yes, it would be instructive to see Perl results (assuming it really is fully interpreted–there seems to be some debate here on that point). But I don’t know Perl and am trying my best never to learn it.
All .NET languages should perform the same. Why did you benchmark all four of them?
Because I wanted to see if Microsoft is telling the truth when they say that functionally identical code gets compiled into identical MSIL code. It turns out that, for the most part, it does.
You can’t be a serious .NET programmer if you don’t even know how to start an unmanaged Visual C++ project!
You’re right. I’m not. But now I know how to do it, thanks. I considered using Visual C++ 6 instead, but ultimately decided to just stick with whatever languages Microsoft’s throwing their weight behind now, and that’s the .NET languages.
It’s unfair to test Java 1.4.2 with the -server flag, but Java 1.3.1 with the -client flag. Everyone knows that the -server version of the JVM runs bytecode faster than the -client version (at the expense of slightly longer startup time).
I was astonished to see that the JVM included with the 1.3.1 SDK doesn’t have a -server version. The only flag available for setting the JVM version is -hotspot, which is the default JVM for 1.3.1 anyway. Install a 1.3.1 SDK, type “java -help” and see for yourself. Maybe they had the -server option in earlier versions of 1.3.1–I used 1.3.1_09.
Why is it surprising to see Java perform well? The bytecode is compiled by the JVM before (or as) it runs, after all.
It’s surprising only because everyone thinks Java is slow. This is probably because early versions of Java really were slow, but I think we’re now witnessing a case of perception lagging behind reality.
Java 1.4.2 is slow on the trig component because it’s using the StrictMath package, which trades speed for accuracy.
Well, maybe. I called the Math package, which (as stated in the Javadoc) may or may not in turn call the StrictMath package. So I don’t really know what’s going on behind the scenes. I did randomly compare results out to eight decimal places or so and got the same trig results for all languages.
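For the curious, here is a minimal sketch (illustrative code, not part of the benchmark) that spot-checks whether the two packages differ:
public class TrigCheck {
    public static void main(String[] args) {
        double x = 1000000.5;
        // Math.sin may use a faster platform-specific implementation, while
        // StrictMath.sin always uses the portable fdlibm algorithms; if the two
        // printed values differ in the last digits, the JVM took the faster path.
        System.out.println(Math.sin(x));
        System.out.println(StrictMath.sin(x));
    }
}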
You’re not being fair to VB–you’re using its native I/O functions instead of using the standard CLR I/O classes.
You’re probably right, but what I did was… hang on. This requires a detour for a second. I’ll come back to this after the next comment.
You said the only language you knew before writing these benchmarks was Java. Then what right do you have to call these real benchmarks? There are probably all sorts of optimizations that you didn’t know about and didn’t use–real programmers understand their languages better and know how to squeeze performance out of them. No one codes production-quality code after spending a single afternoon learning a language!
I beg to differ. For better or worse, tons of people code exactly like that. In my industry (IT consulting), virtually everyone does! It’s absolutely routine to be given a programming task, a language, and an unrealistic deadline. You’re expected to learn what you can from on-line help, whatever tutorials you can scrounge up on the net, and O’Reilly books, and cobble together code that works. In an ideal world we’d have loads of time to optimize everything and profile the heck out of code before releasing it, but on an actual project that’s very rare. At least, that’s been my experience. So I treated these benchmarks the same way: pick up a book, learn how to do what you need to do, spend a little time making sure you’re being smart about stuff (use buffered I/O streams in Java, for example), but don’t expect it to be 100% optimized. Then move on to the next language. My results won’t duplicate results derived from laboratory conditions, but they should be close to real world (or at least, IT consulting world) results. This is a point I should have made much, much clearer in the article, and I’m sorry for the confusion I caused by not making it more explicit.
You never answered the VB I/O question!
Right. I learned how to do VB I/O by using the built-in help in Visual Studio .NET 2003. I did what it told me to do. If it told me to use native VB I/O classes, that’s what I did. If I had spent a lot more time poking around I might have been able to figure out how to use more efficient CLR classes, but that route was non-obvious and I had no way of knowing whether its code would have been faster without actually trying it. Again: I was trying to replicate real-world, time-constrained, scenarios with programmers who know the basics but are by no means experts. Having said all that, I appreciate the advice about speeding up VB I/O. Some day I may re-code with that change in mind.
continued in next post…
…continued from previous post
These results are not indicative of anything. Real programs do more than just math and I/O. What about string manipulation? What about object creation? etc.
The short answer: of course you’re right. But most programs do some math and some I/O, so these results will be at least somewhat relevant to virtually any program. Besides, I made liberal use of the phrase “limited scope” and even titled the article “Math and File I/O” so no one could claim false advertising!

The longer answer is more interesting, but probably also more controversial. I think it’s fair to say that there are two camps when it comes to benchmarking: the “big, full-scale application benchmark” camp and the “tiny building block benchmark” camp. The arguments used by members of each camp go like this. Big is more accurate in that it tests more of the language and tests complex interactions between the various parts of the language. That’s why only large applications like the J2EE Pet Store (later copied by Microsoft to demonstrate .NET performance) are helpful. But wait, says the other camp. Small is more accurate because it tests common components that all programs share. Big is useless because it covers performance for your program, not mine. Mine may use very different parts of the language than yours, hence show very different results. Performance results gleaned from a database-heavy application like Amazon’s on-line catalogue can tell us nothing about what language to use when coding a CPU-intensive Seti@Home client. No no, the big camp retorts, small is useless because it doesn’t really do much, and what it does do reduces to near-identical calls to the OS or basic CPU operations. Small doesn’t let differences between various languages show through, because the aspects that are unique to each language are not tested.

My own take on the issue is this: all of these points are true, and they suggest that the only worthwhile benchmarking is lots of different benchmarks, written on different scales, testing different things. Throw together enough different sorts of benchmarks and you’ll end up with something useful. The benchmark I presented here falls within the “small benchmark” camp simply because small benchmarks are a whole lot quicker and easier to write than big benchmarks. But I’ve presented just one (or two, if you split up math and I/O) benchmark. These results are not useless by any means, but they become a whole lot more useful when they are combined with other benchmarks with different scopes, testing different aspects of languages (such as string manipulation, object creation, collections, graphics, and a gazillion others). And while my project can certainly be criticized for being “too small,” keep in mind that different languages do produce different results under this benchmark, so it is showing some sort of difference between the languages. In other words, I don’t think it’s too small to be at least a little helpful.
The compile time required for JIT compilers (like a JVM) approaches zero when it’s amortized over the time that a typical server app (for example) runs. Shouldn’t you exclude it from your test?
Good point; I hadn’t thought of that. Next time I will probably exclude it by calling each function once before starting the timer.
Java should perform about the same as C++, and an unmanaged C program should perform better than a managed .NET program. Why run benchmarks when we all know how they’ll turn out?
Because theory isn’t always borne out in reality.
The sorting criterion (using the total instead of a geometric average) is unusual, and favors languages that optimize slow operations.
I did not know about the geometric mean technique, but am very interested in hearing more about it. I had no idea how best to weight the various components of the benchmark, so figured the easiest thing to do was to weight them equally and just add them all up. Some may complain that since the trig component is relatively small, it should be given less weight in the final tally. But I would respond that it’s not small for all languages. The trig component for Java 1.4.2 is longer than all of that language’s other components combined. But the real answer to the problem of sorting and analyzing the results is simple: if people want to massage the raw data differently (maybe you never use trig in your programs, so want to exclude the trig results entirely), go for it! And be sure to tell us what you come up with.
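For reference, the geometric mean multiplies the n component scores together and takes the nth root, which keeps any single slow component from dominating the total. A minimal Java sketch, using the Visual C++ row from the results table as sample data:
public class GeoMean {
    public static void main(String[] args) {
        double[] seconds = {9.6, 18.8, 6.4, 3.5, 10.5};  // Visual C++ component scores
        double logSum = 0.0;
        for (int i = 0; i < seconds.length; i++) {
            logSum += Math.log(seconds[i]);  // summing logs avoids overflowing the product
        }
        double geoMean = Math.exp(logSum / seconds.length);  // nth root of the product
        System.out.println(geoMean);  // about 8.4, versus an arithmetic mean of about 9.8
    }
}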
You should use more than 3 runs, and you should provide the mean and median of all scores.
I actually did more like 15 to 20 runs of each benchmark, with at least 3 under tightly controlled conditions. I was a little surprised to find that there were virtually no differences in results regardless of how many other processes were running or how many resources were free. I guess all the other processes were running as very low priority threads and didn’t interfere much. I deliberately included only the best scores rather than the median because I didn’t want results skewed by a virus scanner firing off in the background, or some Windows file system cache getting dumped to disk just as the I/O component started. I figured the best case scenario was most useful and most fair.
Why didn’t you use a high-speed math package for Python, such as numpy or numeric?
I didn’t know about numpy or numeric. I probably should have used a high-speed math package, assuming it would be something that a new Python programmer could find out about easily and learn quickly.
Shouldn’t stuff like this be peer reviewed before being posted?
This ain’t Nature or Communications of the ACM–I figure the 100+ comments I received do constitute a peer review! Nevertheless, I like your idea of a two-part submission, with methodological critique after part 1 and results presented in part 2. I’ll remember that for next time.
Your compile-time optimization is inconsistent. E.g., why omit frame pointers with Visual C++ but not gcc C?
Because Visual Studio had an easy checkbox to turn on that optimization, whereas the 3 minutes I spent scanning the gcc man page revealed -O3 but not -fomit-frame-pointers. Similarly, I compiled Java with -g:none to strip debugging code but didn’t mess with memory settings for the JVM. Someone who programs professionally in C/C++ (or knows more about Java than I do) could have hand-tuned the optimizations more successfully, I’m sure.
Your C++ program is really just C! What gives?
I don’t know C++. I taught myself just enough C (from an O’Reilly book) to code the benchmark. So yes, the C++ benchmark is running pure C code. From my rudimentary knowledge of C vs. C++, I assumed that there were no important extensions to C that would produce significantly different performance over straight C for low-level operations like this, so I stuck to straight C. I called it a “Visual C++” benchmark because it was compiled by Microsoft’s C++ compiler. And if C++ really is a superset of C (please correct me if that’s not the case–I could be very wrong), then a C program is also a C++ program.
Your trig results are meaningless because you don’t check the accuracy of the results. You could be trading accuracy for speed.
Mea culpa–I did sample the trig test results to compare accuracy across languages; they’re all equally accurate (at least, to 8 decimal places or so). I forgot to explain that in the article.
Again, thanks for all of the comments. I’ve learned a lot from your suggestions, and future benchmarks I may run will certainly benefit from the collective experience of all of the posters.
— Chris Cowell-Shah
Python is slow in these tests because it has to look up the variable each time.
For example: i = i + 1. In Python, i is a name stored in a hash that points to a pointer. In the C code it’s just a memory location on the stack. Of course, the slowdown does affect real life, but maybe not as much as it affects this benchmark.
The long test is unfair for Python because Python has true big number support (limited only by how much memory you have). In C and Java a long is only around 64 bits; in those languages there are special libraries for really large numbers. An apples-to-oranges situation.
Python does OK in the trig test because all those functions are implemented in C. It still suffers from the variable name look-up problem, though.
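The “special libraries” mentioned above include java.math.BigInteger on the Java side; a minimal sketch (my illustration) of the arbitrary-precision arithmetic that Python longs provide automatically:
import java.math.BigInteger;

public class BigNumberDemo {
    public static void main(String[] args) {
        // Start at the edge of the 64-bit range; BigInteger never overflows,
        // it simply allocates more memory as values grow.
        BigInteger x = BigInteger.valueOf(Long.MAX_VALUE);
        x = x.multiply(x).add(BigInteger.ONE);
        System.out.println(x);
    }
}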
“I don’t know C++. I taught myself just enough C (from an O’Reilly book) to code the benchmark. So yes, the C++ benchmark is running pure C code. From my rudimentary knowledge of C vs. C++, I assumed that there were no important extensions to C that would produce significantly different performance over straight C for low-level operations like this, so I stuck to straight C. I called it a “Visual C++” benchmark because it was compiled by Microsoft’s C++ compiler. And if C++ really is a superset of C (please correct me if that’s not the case–I could be very wrong), then a C program is also a C++ program.”
Yes, but that is a poor excuse. The C# benchmark uses a class, and the C++ class would have been pretty much the same, bar the main function being external to the class.
You need to look at the “iostream” C++ standard library header (you may find it as “iostream.h” under some environments). Look at the cout instance variable, and note that it completely replaces printf, as iostreams replaces stdio.
for example:
printf("my value %d", d);
vs.
cout << "my value " << d;
There is obviously some different code going on here, and there *will* be a different result, if only fractional.
There will be other things you could have done too.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dn…
snip —
You may not disclose the results of any benchmark test of the .NET Framework component of the OS Components to any third party without Microsoft’s prior written approval. Microsoft retains all right, title and interest in and to the OS Components. All rights not expressly granted are reserved by Microsoft.
snip —
Sick but true…
When doing numerical work and you want a list of _modern_ computer languages, then the list is incomplete without Fortran95. Available on a wide range of platforms, Fortran is still very widely used in scientific and technical applications, and Fortran2003 will be fully object-oriented.
If I had time I’d be happy to try these benchmarks on a few modern F90 compilers, but I do think they are a bit naive. Just one example: memory utilisation is an important factor in many problems: you need a few tests using large arrays or matrices.
> Last of all, I’d like to draw the author’s attention to the
> .Net framework EULAs… It is in fact a violation of the
> EULA to produce benchmarks of this sort of .Net against
> other platforms. Which is why they haven’t been done all
> over the place by now
Such a requirement would be illegal in Europe, and more so in the US; it’s against the right to free speech.
EULAs are not the law, thus this benchmark is not illegal. On the other hand, it’s very likely that that clause in the EULA is against the law, and as such invalid.
Sorting would be interesting too.
DG
Why isn’t the Intel C compiler benchmarked along with the others? I’m sure it would give the best result.
Hello,
Although I see where you are coming from with your comments, I am not quite sure they are correct.
Your benchmark is designed to measure intensive Math and IO. Thus, it is designed to be useful for a programmer who (gasp) is doing intensive Math and IO.
In such a program, the programmer would normally take the initiative to make his program as fast as possible. If he saw that the IO in his VB program was 3x as slow as the raw IO speed achieved by C, he very likely would have profiled his program. Running the VB program in the CLR profiler makes it pretty clear what is up.
Your argument that “Again: I was trying to replicate real-world, time-constrained scenarios with programmers who know the basics but are by no means experts. Having said all that, I appreciate the advice about speeding up VB I/O. Some day I may re-code with that change in mind.” is pretty much invalid then. If you want to simulate the performance for newbie programmers in each language, then that is what you should title your article.
Remember, the VS.net tutorials are not designed for writing high performance apps. They are designed to get you off the ground when designing something.
I suppose this might be the right place to mention that C/C++ has libraries available like Blitz++ that can make scientific and mathematical operations faster than in Fortran.
#include <time.h>
#include <iostream>
using namespace std;
template <class T> T arithmetic(const T min, const T max, double& rtime);
int main()
{
    const int int_min = 1;
    const int int_max = 1000000000; // 1B
    const double double_min = 10000000000.0; // 10B
    const double double_max = 11000000000.0; // 11B
    const long long ll_min = 10000000000LL; // 10B
    const long long ll_max = 11000000000LL; // 11B
    cout << "start c++ benchmark" << endl;
    double time;
    arithmetic(int_min, int_max, time);
    cout << "int arithmetic elapsed time: " << time << " ms. min=" << int_min
         << ",max=" << int_max << endl;
    arithmetic(double_min, double_max, time);
    cout << "double arithmetic elapsed time: " << time << " ms. min=" << double_min
         << ",max=" << double_max << endl;
    arithmetic(ll_min, ll_max, time);
    cout << "long long arithmetic elapsed time: " << time << " ms. min=" << ll_min
         << ",max=" << ll_max << endl;
    cout << "stop c++ benchmark" << endl;
    return 0;
}
// RAII timer: starts in the constructor, stops in the destructor
class auto_clock
{
    clock_t t0;
    double& r;
public:
    auto_clock(double& rtime)
        : r(rtime)
    {
        t0 = clock();
        if (t0 < 0) throw;
    }
    ~auto_clock()
    {
        clock_t t1 = clock();
        if (t1 < 0) throw;
        r = (t1 - t0) * 1000.0 / CLOCKS_PER_SEC;
    }
};
template <class T> T arithmetic(const T min, const T max, double& rtime)
{
    auto_clock ac(rtime);
    T r = 1;
    T i = min;
    while (i < max) {
        r -= i++;
        r += i++;
        r *= i++;
        r /= i++;
    }
    return r;
}
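For anyone who wants to try the snippet above, something like the following should build and run it with gcc (the file name is assumed, and the flags are a plausible minimum rather than the article’s full optimization set):
g++ -O3 -o cppbench cppbench.cpp
./cppbench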
I ran these “benchmarks” on my Linux, PIII/600MHz, 320MB RAM box:
gcc 3.3.2 versus Java 1.4.2 (Sun). The results are nearly the same, gcc versus Java:
int: gcc 21.0s : java 21.9s
double: 23.6s : 33s
long: 71s : 69.9s
trig: 14s : 393s !!!!!!
I/O: 2.5s : 40.7s !!!!!
and what about memory allocation (creating/deleting objects)!!!!
someone mentioned this ….
Java actually knows more about how the code is used, which in theory should let it reach better performance than c/c++
bla bla bla ….
this is incredible … :-))))
“someone mentioned this ….
Java actually knows more about how the code is used, which in theory should let it reach better performance than c/c++
bla bla bla …. ”
This is crap…..The JITer “knows” only about the current method. That method is the only thing that can be “enhanced”. On the other hand when it’s compiled the C/C++ compiler “has access” to the whole thing and can do a better job on “tweaking/enhancing” the speed…
“what about memory allocation ( creating/deleting) objects !!!! ”
On this category the winner is CLR.
I’ve sent the author a Perl version that I hacked up during lunch break. I hope he’ll post the results along with the others. I “ported” it from the C version, keeping the author’s original style of writing (all math in Perl/Python is FP anyway).
I once did a test of Perl vs. Python vs. C (as reference language) by using a small is_prime function, and then setting it off on a given range of primes. I didn’t keep the results, but if I remember correctly, C would finish quite fast, Perl would take its sweet time, and Python was just dog slow (all tests without optimisation, and no precompilation for interpreted languages). I never did a Java port of it (the target was a Unix platform, and Java is not my first choice there), but it would be interesting to see any comparisons. The algorithm is (C syntax):
int is_prime(unsigned long x)
{
if(x <= 3)
{
if(x <= 1)
return(0); // 0 and 1 aren't "real" primes
else
return(1);
}
else if ( x % 2 == 0)
return(0);
else
{
long ctr = 3;
while(1)
{
if((x % ctr) == 0)
return(0);
else if ((ctr * ctr) > x)
return(1);
ctr+=2;
}
}
}
Why did VB do so badly in I/O when they are all .NET languages? I mean, they were pretty equal up until the I/O part. Any chance of getting the code published for each?
It’s not. C++ just looks like C because it was felt that it would make it more familiar to programmers. Originally Stroustrup’s work focused on adding syntactic sugar to support OO programming with C. At this point it was a strict superset, and Stroustrup used ordinary C compilers along with a pre-processor.
Fast forward a few years and the language is much more complicated and has acquired the name “C++” despite objections. It has also escaped AT&T and is now the “next big thing”. Unfortunately standardisation, and compiler quality still leave a lot to be desired.
An unfortunate side effect, especially outside Unix, was that people took C++ to be “C, only better” and began to insist that good C programs be re-written as bad C++ programs. Since C was in fact standardised, portable and had a working ABI, while C++ had not (some would say still has not today) these things, the cost was uncountable.
There’s a section in Stroustrup’s book “The C++ Programming Language” which confirms that in fact C++ is not a superset and was not intended to be. Since ISO C9X is actually newer than the C++ split, it has features which aren’t present in C++ at all. Some programs are valid C and valid C++ but mean different things in each language, a recipe for disaster.
I would have liked to see benchmarking of ‘real-life’ actions like instantiation, method calls with and without params, string operations, regexps, hashtables…
(didn’t read the whole article though)
[ copied from my slashdot posting on this article. ]
We weren’t quite ready to release it, but we’ve been working on a language performance comparison test of our own. It is available at:
http://scutigena.sourceforge.net/
It’s designed as a framework that ought to run cross-platform, so you can run it yourself. We haven’t added it yet, but I think we really want to divide the tests into two categories. “Get it done” – and each language implements it the best way for that language, and “Basic features comparison” – where each language has to show off features like lists, hash tables, how fast function calls are, and so forth.
It’s an ongoing project, so new participants are welcome! I would appreciate it if comments went to the appropriate SF mailing lists instead of here, so that I can better keep track of them.
I have benchmarked fast string search. Results here : http://www.arstdesign.com/articles/fastsearch.html
The VB IO component of the benchmark uses the backwards compatibility routines to do IO — routines which were never intended for performance. The correct way to IO in VB is to use the StreamReader and StreamWriter classes in the .NET frameworks.
Your comments about not having time to learn a language are fine if that is the quality of code that your customers at Accenture want. However, when you take it upon yourself to publish a BENCHMARK, you have an obligation to do a thorough job, and this is totally suboptimal work. If I were your boss and had assigned you a benchmark and told you to publish it with Accenture’s name on it, I would have told you that you DID NOT MEET expectations.
Having spent a number of years in Silicon Valley after a few years in what was then Big 6 consulting, I can say that your document fails for BOTH communities.
Memory footprint of the benchmark on different languages should be interesting too!
Google -> The Great Programming Language shootout… for more details pertaining to system functions and tests…
Function calls
Method calls
Recursion
Recursion (no tail optimisation)
Recursion with several locals (no tail optimisation)
Recursion with large local array (no tail optimisation)
Deeply nested (different) function calls
Heap allocate
Heap re-allocate
Heap de-allocate
Heap fetch, store, move
Heap allocate with random de-allocations for small sizes
Heap allocate with random de-allocations for large sizes
Heap allocate with random de-allocations for mixed sizes
Heap allocate and copy
Local fetch, store, increment
Global fetch, store, increment
Member fetch, store, increment (with or without getters/setters)
Array fetch, store, increment
Array fetch, store, increment (2 dimensional)
Pointer fetch, store, increment
Pointer-to-pointer fetch, store, increment
Loop and compare (while)
Static numbered loop (for)
Graphic composition
Bitmap flip
Bitmap blit
Bitmap flip while window moving
Linked list generation
Linked list sort
Linked list search
Linked list random insert
Linked list random insert (with automatic sort)
Linked list random deconstruct
Linked list destroy
XML parse
XML tree sequential traverse
XML tree random search
Socket send packets throughput over controlled LAN
Socket receive throughput over controlled LAN
Socket two-way throughput over controlled LAN
Socket response/latency over controlled LAN
Socket response/latency under heavy load over controlled LAN
Signal/event response idle
Signal/event response in busy loop
Signal/event response under heavy memory access
Signal/event response under heavy I/O
Signal/event response under heavy calculation
Thread spawn latency
Fetch, store, move, copy, increment mutexed memory with many threads sharing
(All the random stuff should be pre-generated so that it’s the same every test).
Important Note: Pentium trig functions are not IEEE compliant. GCC’s trig functions are. If you need accuracy you cannot use the built-in Intel trig functions. The MS compilers in this test used the built-in trig functions. You can do this with GCC too; however, you have to specify it in the code. I seriously doubt this was done.
Bottom line: the trig functions in this test are not computing the same thing. The MS results will give 5-6 digits of precision. GCC’s will be correct to 12+. *HUGE* difference.
It’s not clear from the text that you used the compile options to remove integer overflow checks and enable optimizations which are already done by default in C#. This can account for (sometimes) significant differences in the compiled code.
I ran the benchmark with both Sun 1.4.2 and IBM 1.4.1. The IBM VM used 38.2 secs, while Sun used 121.5 secs, a considerable difference (yes, I used the server VM).
In particular, IBM was 8 times faster with the trigonometric computations, and almost twice as fast with longs.
“Such a requirement would be illegal in Europe, and more so in the US; it’s against the right to free speech.”
No it isn’t. I don’t know how many times I have seen people make statements like this without really knowing what “free speech” is all about… But it is likely that those requirements would be unenforcable in most countries (for other reasons).
The gcc results look so bad because there is something
wrong with the math libraries in MinGW and cygwin.
Being extremely surprised by the fact that the trig
run times with gcc are almost 4 times longer than
with .NET, I redid the trig test with each operation
tested individually. Here the results on a dual boot
2 GHz P4 laptop (WinXP and SuSE Linux 9) using
-O3 -ffast-math -march=pentium4 -mfpmath=sse -msse2
as optimization options in both cases:
WinXP and the cygwin version of GCC 3.3.1:
sin: 1.03 seconds
cos: 1.02 seconds
tan: 10.33 seconds
log: 1.92 seconds
sqrt: 0.20 seconds
all 5 in the same loop: 14.36 seconds
WinXP and MinGW: results essentially identical
SuSE 9 and GCC 3.3.1:
sin: 1.02 seconds
cos: 0.99 seconds
tan: 1.16 seconds
log: 0.57 seconds
sqrt: 0.21 seconds
all 5 in the same loop: 3.59 seconds
Clearly, there is something wrong with the tan and
log functions on cygwin and MinGW.
So, the whole test on Linux:
integer arithmetic: 9.6
long integer: 24.5
double: 8.4
trig: 3.6
I/O: 1
total: 47.1
Someone was interested in the Intel compiler results,
here they are:
integer: 9.0
long integer: 39.9
double: 7.0
trig: 4.4
I/O: 1.1
total: 61.4
=> if you have to use 64 bit integers in your
program, don’t use the Intel compiler.
You’d better learn VB.NET better before issuing this kind of benchmark…
Thumbs down… 🙁
A nit about the tests:
The Java benchmark is using the Reader/Writer classes rather than their InputStream/OutputStream counterparts. Reader and Writer perform Unicode conversions, which is apparently having a major impact. Switching to the InputStream/OutputStream classes with the 1.4.2 JVM (-server option) on Linux gave the IO portion of the benchmark a 24% speed boost.
I would also recommend that for this benchmark, rather than calling FileWriter.write(yourString), you do something more akin to:
byte[] b = yourString.getBytes();
for (int i = 0; i < 1000000; i++) {   // one million lines, as in the benchmark
    yourFileOutputStream.write(b);
}
I believe that such a change would make the java IO portion of the benchmark more fair.
– Marty
I ran the benchmark with jdk 1.5 alpha compiler
Box is an Athlon XP 1700, Windows 2k 512M memory
Times:
int math 9796
double math 14406
long math 19735
trig 53890
i/o 6266
total benchmark 104094
Time in milliseconds, and I ran with -server option compiled with debug info off
Christopher raised the question of why Java only provides a method to calculate the natural log and not one to calculate the log base 10. The only reason I can think of is that calculating log base 10 from the natural log is easily done using a routine of the form:
public double log10 (double number) {
return Math.log(number) / Math.log(10);
}
This routine is based on the standard mathematical formula log base a (x) = log base b (x) / log base b (a). It is applied here in the form log10(x) = ln(x) / ln(10).
While this doesn’t justify not putting it in there, it’s possible that the minimalistic but complete provision of the natural log method is what was desired.
Just my $0.02.
Hi,
I converted the code to Delphi.
Code:
program Benchmark;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  MMSystem,
  Math;

var
  startTime: Longint;
  stopTime: Longint;
  elapsedTime: Longint;
  intMax: integer;
  doubleMin: double;
  doubleMax: double;
  longMin: Int64;
  longMax: Int64;
  trigMax: double;
  ioMax: integer;
  intArithmeticTime: double;
  doubleArithmeticTime: double;
  longCountTime: double;
  trigTime: double;
  ioTime: double;
  totalTime: double;

function intArithmetic(intMax: integer): Longint;
var
  intResult: integer;
  i: integer;
begin
  startTime := timeGetTime;
  intResult := 1;
  i := 1;
  while (i < intMax) do
  begin
    intResult := intResult - i;
    inc(i);
    intResult := intResult + i;
    inc(i);
    intResult := intResult * i;
    inc(i);
    intResult := intResult div i;
    inc(i);
  end;
  stopTime := timeGetTime;
  elapsedTime := stopTime - startTime;
  WriteLn('Int arithmetic elapsed time: ' + inttostr(elapsedTime) + 'ms with max of ' + inttostr(intMax));
  WriteLn(' i: ' + inttostr(i) + ' intResult: ' + inttostr(intResult));
  result := elapsedTime;
end;

function doubleArithmetic(doubleMin, doubleMax: double): Longint;
var
  doubleResult: double;
  i: double;
begin
  startTime := timeGetTime;
  doubleResult := doubleMin;
  i := doubleMin;
  while (i < doubleMax) do
  begin
    doubleResult := doubleResult - i;
    i := i + 1;
    doubleResult := doubleResult + i;
    i := i + 1;
    doubleResult := doubleResult * i;
    i := i + 1;
    doubleResult := doubleResult / i;
    i := i + 1;
  end;
  stopTime := timeGetTime;
  elapsedTime := stopTime - startTime;
  WriteLn('Double arithmetic elapsed time: ' + inttostr(elapsedTime) + ' ms with min of ' + floattostr(doubleMin) + ', max of ' + floattostr(doubleMax));
  WriteLn(' i: ' + floattostr(i) + ' doubleResult: ' + floattostr(doubleResult));
  result := elapsedTime;
end;

function longArithmetic(longMin, longMax: Int64): Longint;
var
  longResult: Int64;
  i: Int64;
begin
  startTime := timeGetTime;
  longResult := longMin;
  i := longMin;
  while (i < longMax) do
  begin
    longResult := longResult - i;
    inc(i);
    longResult := longResult + i;
    inc(i);
    longResult := longResult * i;
    inc(i);
    longResult := longResult div i;
    inc(i);
  end;
  stopTime := timeGetTime;
  elapsedTime := stopTime - startTime;
  WriteLn('Long arithmetic elapsed time: ' + inttostr(elapsedTime) + ' ms with min of ' + inttostr(longMin) + ', max of ' + inttostr(longMax));
  WriteLn(' i: ' + inttostr(i));
  WriteLn(' longResult: ' + inttostr(longResult));
  result := elapsedTime;
end;

function trig(trigMax: double): Longint;
var
  sine: double;
  cosine: double;
  tangent: double;
  logarithm: double;
  squareRoot: double;
  i: double;
begin
  startTime := timeGetTime;
  sine := 0.0;
  cosine := 0.0;
  tangent := 0.0;
  logarithm := 0.0;
  squareRoot := 0.0;
  i := 0.1;
  while (i < trigMax) do
  begin
    sine := Sin(i);
    cosine := Cos(i);
    tangent := Tan(i);
    logarithm := Log10(i);
    squareRoot := sqrt(i);
    i := i + 1;
  end;
  stopTime := timeGetTime;
  elapsedTime := stopTime - startTime;
  WriteLn('Trig elapsed time: ' + inttostr(elapsedTime) + ' ms with max of ' + floattostr(trigMax));
  WriteLn(' i: ' + floattostr(i));
  WriteLn(' sine: ' + floattostr(sine));
  WriteLn(' cosine: ' + floattostr(cosine));
  WriteLn(' tangent: ' + floattostr(tangent));
  WriteLn(' logarithm: ' + floattostr(logarithm));
  WriteLn(' squareRoot: ' + floattostr(squareRoot));
  result := elapsedTime;
end;

function io(ioMax: integer): Longint;
var
  textLine: string;
  i: integer;
  myLine: string;
  F: TextFile;
begin
  startTime := timeGetTime;
  textLine := 'abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh';
  i := 0;
  myLine := '';
  assignfile(F, 'TestDelphi.txt');
  rewrite(F);
  while (i < ioMax) do
  begin
    WriteLn(F, textLine);
    inc(i);
  end;
  CloseFile(F);
  stopTime := timeGetTime;
  elapsedTime := stopTime - startTime;
  WriteLn('IO elapsed time: ' + inttostr(elapsedTime) + ' ms with max of ' + inttostr(ioMax));
  WriteLn(' i: ' + inttostr(i));
  WriteLn(' myLine: ' + myLine);
  result := elapsedTime;
end;

begin
  intMax := 1000000000;
  doubleMin := 10000000000;
  doubleMax := 11000000000;
  longMin := 10000000000;
  longMax := 11000000000;
  trigMax := 10000000;
  ioMax := 1000000;
  WriteLn('Start Delphi benchmark');
  intArithmeticTime := intArithmetic(intMax);
  doubleArithmeticTime := doubleArithmetic(doubleMin, doubleMax);
  longCountTime := longArithmetic(longMin, longMax);
  trigTime := trig(trigMax);
  ioTime := io(ioMax);
  totalTime := intArithmeticTime + doubleArithmeticTime + longCountTime + trigTime + ioTime;
  WriteLn('Total Delphi benchmark time: ' + floattostr(totalTime) + ' ms');
  WriteLn('End Delphi benchmark');
  Readln;
end.
In the trig function I changed
i := 0.0;
to
i := 0.1;
because Delphi’s Log10 function doesn’t accept 0…
AMD 1800+, 512 MB
Test results:
Int arithmetic: 8121 ms
Double arithmetic: 11627 ms
Long arithmetic: 112101 ms
Trig: 3896 ms
IO: 3835 ms
Total: 139580 ms
Fascinating stuff. The most interesting thing I’ve noticed is that the AUTHOR has been one of the few (possibly the only one; I didn’t read EVERY post) who mentioned the lack of analysis regarding database access and its importance for “real world” applications.
I would LOVE to see the results of a few of these languages hitting some databases. I know that would open up the very ugly world of database comparisons, but what the heck. It would be great to run tests against Access, MS-SQL, DB2 and Oracle to see how each language and its preferred DB driver does.
It would be INCREDIBLY interesting to compare ADO.NET with JDBC, ODBC or whatever, but I think it would take more resources than a single P4 laptop, huh?
The starting page claims you test 32-bit and 64-bit math. Python doesn’t use native 64-bit types on 32-bit architectures if I recall, and promotes all integers that don’t fit into a native type into long (arbitrary-precision) integers. Long integers don’t use hardware multiplication or specialized algorithms, while all of the other languages do use those, doing Python some injustice. I think this should’ve been mentioned at the beginning of the article, because what Python is doing isn’t really 32-bit math or 64-bit math in the sense that most programmers are used to.
You could also do much better in some tests by using Numeric Python/numarray, which is designed for these kinds of problems.
And about Perl vs. Python: both are compiled into intermediate code, in Perl it just has a different form.
Since the hardware everyone uses is different, it is difficult to make comparisons, so I suggest using gcc with the options given in the article as the baseline.
To improve Delphi performance in the long test, change
longResult := longResult div i; (112101 ms)
to
longResult := Trunc(longResult / i); (19958 ms)
(Note this is not quite the same operation: the floating-point division rounds once values no longer fit in a double’s 53-bit mantissa, so the speedup comes at the cost of exact 64-bit semantics.)
Could somebody translate the code to Caml, Eiffel, assembler…?
=======
http://pages.infinit.net/borland/
In the earlier messages, someone complained about a lack of unsigned data types in Java and C#. The C# comment appears to be incorrect.
Look at:
System.UInt16
System.UInt32
System.UInt64
System.UIntPtr
These all claim to be unsigned integers, for anyone who needs them.
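The Java half of the complaint does stand, though. A common workaround (my sketch, not from the thread) is to hold the value in the next wider signed type and mask:

// Hypothetical example: emulating a 32-bit unsigned value in Java, which has
// no unsigned primitives, by masking the bit pattern into a long.
public class UnsignedDemo {
    public static void main(String[] args) {
        int raw = -1;                      // bit pattern 0xFFFFFFFF
        long unsigned = raw & 0xFFFFFFFFL; // reinterpreted as unsigned: 4294967295
        System.out.println(unsigned);
    }
}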
I’m interested in memory benchmarks. I know Java JITs the code, but what if the JITting causes more than 60 MB of memory consumption (when it could be 5 MB, for example)? The machine would swap a lot (supposing the machine is being heavily used; 60 MB may not be too much on its own, but what if you have five 60 MB apps running?)
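To make the question concrete, here is a crude, hypothetical probe (my sketch, not part of the original benchmark) for watching the JVM’s heap from inside a test. Note that Runtime reports only the Java heap; the JIT’s code cache and other native allocations, which are part of what the poster is asking about, are invisible to it.

// Hypothetical heap probe; sketch only, not part of the original benchmark.
public class MemProbe {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // request a collection so the reading is less noisy
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        System.out.println("Heap in use: " + (usedBytes / (1024 * 1024)) + " MB");
    }
}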
There is a serious problem with the long math benchmarks, due to Python being a dynamically (not statically) typed language.
In C you can say
long long int i;
and get a 64-bit signed integer (in C99).
If you do an operation that makes ‘i’ too big, it overflows; ‘i’ then contains the incorrect answer, but its type remains the same.
Python works differently to C (and the others). You can use a (plain) integer type, add to it, and instead of overflowing it is dynamically promoted to a long integer.
$ python
>>> i=1
>>> type(i)
<type 'int'>
>>> i=i+pow(2,32)
>>> i
4294967297L
>>> type(i)
<type 'long'>
Furthermore, the “long integer” does not have 64-bit precision, it has unlimited precision!
For example try
$ python
>>> n = pow(2,63)-1
>>> n
9223372036854775807L
>>> n = n * 10
>>> n
92233720368547758070L
>>> pow(2,128)
340282366920938463463374607431768211456L
>>> pow(2,256)
115792089237316195423570985008687907853269984665640564039457584007913129639936L
This is why in the long math (64-bit integer) benchmark the LongResult for C and Python differ.
C has 776627965
Python has 10000000000
In an integer benchmark the results must match; otherwise you are not benchmarking the same operations!
For the third iteration through the loop, i = 10000000002:
longResult = 10000000001 * 10000000002 = 100000000030000000002
This is more than a 64-bit signed int can handle, and it overflows. Python, however, calculates the correct result for integers bigger than 64 bits.
It looks to me like every multiply operation in the long integer test in C overflows 64 bits, so you are benchmarking 1/4 billion 64-bit integer overflows in C (and the others) against 1/4 billion 128-bit integer multiplications in Python. Not a fair comparison.
Python is slower, but it gives you the correct result!
A fair benchmark would involve recoding C (and the others) so that they check for 64-bit integer overflow and then do 128-bit arithmetic. Not so easy to do; Python gives you this for free, as it’s built in.
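To see the mismatch concretely, here is a minimal Java sketch (Java’s long wraps on overflow exactly as C’s long long does) contrasting the wrapped 64-bit product with the exact value an arbitrary-precision type produces, using the third-iteration numbers from the comment above. The class name and structure are mine, purely for illustration.

import java.math.BigInteger;

// Sketch: the overflowing multiply from the long benchmark, once in wrapping
// 64-bit arithmetic and once in arbitrary precision (what Python does implicitly).
public class OverflowDemo {
    public static void main(String[] args) {
        long a = 10000000001L;
        long b = 10000000002L;
        long wrapped = a * b; // exceeds 2^63 - 1, silently wraps to 7766279661452241922
        BigInteger exact = BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
        System.out.println("64-bit (wrapped): " + wrapped);
        System.out.println("exact:            " + exact); // 100000000030000000002
    }
}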
VB6 would have been faster.
AthlonXP 1800+
512MB RAM
Gentoo 1.4 compiled with -mcpu=athlon-xp
Linux 2.4.23 + Con Kolivas patchset
# gcc -v
gcc version 3.3.2 20031201 (Gentoo Linux 3.3.2-r4, propolice)
# gcc -march=athlon-xp -mmmx -O3 Benchmark.c -s -o bench_c -lm
Start C benchmark
Int arithmetic elapsed time: 7950 ms with intMax of 1000000000
i: 1000000001
intResult: 1
Double arithmetic elapsed time: 7480 ms with doubleMin 10000000000.000000, doubleMax 11000000000.000000
i: 11000000000.000000
doubleResult: 10011632717.388229
Long arithmetic elapsed time: 18750 ms with longMin 1410065408, longMax 2
i: -1884901888
longResult: 776627965
Trig elapsed time: 4270 ms with max of 10000000
i: 10000000.000000
sine: 0.990665
cosine: -0.136322
tangent: -7.267119
logarithm: 7.000000
squareRoot: 3162.277502
I/O elapsed time: 1220 ms with max of 1000000
last line: abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh
Total elapsed time: 39670 ms
Stop C benchmark
This guy really needs to learn to write better benchmarks, because in this one he is comparing oranges to apples (e.g. in Python the I/O benchmark preallocates in memory a list of all the lines to be written to the file), and almost none of the benchmarks test *real-life* performance, which very often involves complicated data structures and deep call stacks: things like “intResult -= i++;” test only your CPU speed, nothing more.
What about a memory footprint comparison?
Kernel 2.6.0-1mdk
glibc-2.3.3-1mdk
CPU: Athlon 1.2Ghz
Mem: 256MB
/usr/java/j2sdk1.4.2_01/jre/bin/java -version
java version "1.4.2_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_01-b06)
Java HotSpot(TM) Client VM (build 1.4.2_01-b06, mixed mode)
/usr/java/j2sdk1.4.2_01/bin/javac -O -target 1.4 -g:none Benchmark.java
——
gcj --version
gcj (GCC) 3.3.2 (Mandrake Linux 10.0 3.3.2-3mdk)
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
gcj -o Benchmark -O3 -march=athlon -mcpu=athlon -fomit-frame-pointer -fforce-addr -fforce-mem -ffast-math -pipe -falign-loops -falign-functions -falign-jumps --main=Benchmark Benchmark.java
/usr/java/j2sdk1.4.2_01/jre/bin/java -server Benchmark
Start Java benchmark
Int arithmetic elapsed time: 10621 ms with max of 1000000000
i: 1000000001
intResult: 1
Double arithmetic elapsed time: 17518 ms with min of 1.0E10, max of 1.1E10
i: 1.1E10
doubleResult: 1.00116327174955E10
Long arithmetic elapsed time: 34736 ms with min of 10000000000, max of 11000000000
i: 11000000000
longResult: 776627965
Trig elapsed time: 118742 ms with max of 1.0E7
i: 1.0E7
sine: 0.9906646477361263
cosine: -0.13632151600483616
tangent: -7.267118770165242
logarithm: 16.118095550958316
squareRoot: 3162.2775020544923
IO elapsed time: 6665 ms with max of 1000000
i: 1000001
myLine: abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh
Total Java benchmark time: 188282 ms
End Java benchmark
./Benchmark (gcj)
Start Java benchmark
Int arithmetic elapsed time: 10632 ms with max of 1000000000
i: 1000000001
intResult: 1
Double arithmetic elapsed time: 6022 ms with min of 1.0E10, max of 1.1E10
i: 1.1E10
doubleResult: 1.0011632717389214E10
Long arithmetic elapsed time: 25322 ms with min of 10000000000, max of 11000000000
i: 11000000000
longResult: 776627965
Trig elapsed time: 18140 ms with max of 1.0E7
i: 1.0E7
sine: 0.9906646477361245
cosine: -0.1363215160048489
tangent: -7.267118770165242
logarithm: 16.118095550958316
squareRoot: 3162.2775020544923
IO elapsed time: 13913 ms with max of 1000000
i: 1000001
myLine: abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh
Total Java benchmark time: 74029 ms
End Java benchmark
If your real-world application resembles this benchmark, then the results >might< be useful.
For any floating point codes I have seen (orbit/attitude determination, weather simulation, and so forth), accurate results are more important than sheer speed. See the evaluation on this bug report for a discussion of wildly inaccurate results from simple-minded calculations:
http://developer.java.sun.com/developer/bugParade/bugs/4807358.html
As mentioned in the Evaluation: if you don’t care about accurate, consistent results, why do you need the calculation at all?
By altering the values given to sin, cos, and tan from 0 – trigMax and normalizing them to 0 – 2PI, the Java 1.4.2_03 benchmark improves from 57 secs to 10 secs. This change provides a more even distribution of radian values than the original 0 – 10M does. It would seem that the cost of certain radian values for the trig functions is not evenly distributed in Java.
I wonder if it is the same for all languages?
Curt
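For anyone wanting to try Curt’s change, here is a hedged sketch of a normalized trig loop in Java; the variable names and timing scaffolding are mine, not the article’s. The idea is simply to reduce each argument into [0, 2*PI) before the calls, so the library never sees the huge radian values that apparently trigger its slow path.

// Sketch of the normalization Curt describes, not the original benchmark code.
public class TrigNormalized {
    public static void main(String[] args) {
        final double trigMax = 10000000.0;
        final double TWO_PI = 2.0 * Math.PI;
        double sine = 0.0, cosine = 0.0, tangent = 0.0;
        long start = System.currentTimeMillis();
        for (double i = 0.0; i < trigMax; i += 1.0) {
            double x = i % TWO_PI; // reduce to [0, 2*PI) before the trig calls
            sine = Math.sin(x);
            cosine = Math.cos(x);
            tangent = Math.tan(x);
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Trig elapsed: " + elapsed + " ms (sine=" + sine +
                ", cosine=" + cosine + ", tangent=" + tangent + ")");
    }
}

Note that the reduced arguments round differently from the originals, so the final sine/cosine/tangent values will not match the unnormalized run bit for bit.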
I was able to test both the 6.0 and .net2003 compilers on a PIII 733 Win2000 box.
Best times of three runs. Runs varied no more than 2%.
6.0
Int 16183
Double 22742
Long 41220
Trig 5928
I/O 5148
Total 91221
.net (new unmanaged project)
Int 16183
Double 23423
Long 41360
Trig 5888
I/O 5258
Total 92112
>> the latest Hotspot does an excellent job getting rid of
>> array bounds checking in many instances
How do you know that Hotspot does an excellent job of eliminating ABCs? Sun does not tell us the means by which it determines whether ABCs can be eliminated, nor does it provide a mechanism telling us where ABCs have successfully been eliminated (which would allow us to do some reverse engineering and figure out in what scenarios ABCs can be eliminated)… this leads to a question:
Can code be written in a way which would make the compiler better able to eliminate ABC’s? I’m not quite sure what this would entail, but I’m curious as to whether or not it can be done.
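I don’t have an authoritative answer either, but the folk wisdom (unverified against any Sun documentation) is to tie the loop bound directly to the array’s own length so the JIT can prove every index is in range. A hypothetical sketch of the two shapes:

// Whether HotSpot actually eliminates the checks in either case is exactly
// the open question above; these are just the shapes people suggest testing.
public class AbcShapes {
    // Bound tied to a.length: the canonical shape the JIT is said to recognize.
    static long sumDirect(int[] a) {
        long s = 0;
        for (int j = 0; j < a.length; j++) {
            s += a[j];
        }
        return s;
    }

    // Bound from an unrelated variable: the JIT cannot locally prove that
    // n <= a.length, so each a[j] access may keep its bounds check.
    static long sumIndirect(int[] a, int n) {
        long s = 0;
        for (int j = 0; j < n; j++) {
            s += a[j];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        System.out.println(sumDirect(data) + " " + sumIndirect(data, data.length));
    }
}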
I don’t know if this is just me, but I’ve tried several times compiling C code in Cygwin and the same C code outside Cygwin (Linux, for example), and it seems that gcc under Cygwin suffers some lag.
So perhaps the C benchmark was not quite right.
A language benchmark is not going to be very useful in this context. As numerous people have pointed out, you need a benchmark that simulates more of what you need your code to do in a given project to see whether there is any performance benefit from switching to another language; this will generally be different for each project. Not to mention, this is more of a compiler benchmark than a language benchmark, and GCC’s claim to fame is portability, not optimization, especially on non-x86 architectures (Sun’s compilers and SGI’s MIPSpro compilers will give you a binary that usually performs about 300%-500% better than a gcc-built binary). Since I don’t use x86 hardware much I am not an expert in which compiler you should use to benchmark on that platform, but I would imagine Intel’s compiler to be leaps and bounds ahead of gcc in optimization.
Thus the blanket assertion that C shouldn’t be used for speed anymore is wrong, and adding that the code is less maintainable is absurd. People have been maintaining C source far longer than most other languages in such wide use (Fortran etc. are the exception, and they still have their place as well). The best language is often the one the programmer is most familiar with, but C can generally be optimized much better than other languages (C++ included), although some languages, like Fortran, are easier for the compiler to parallelize for multiprocessors (consider SGI’s -apo option) for multiple reasons beyond the scope of this comment.
The author needs to give the machine specs, compilers and versions used, source code, and the flags passed to the compiler… for example, on IRIX I might compile like:
cc -Ofast=ip30 -TARG:platform=ip30:isa=mips4:processor=r14000 etc. etc. etc.
Actually, Perl is compiled; it just normally isn’t stored like a compiled binary. Interpreted languages are usually interpreted on a line-by-line basis, while all of the Perl code is read and converted to something machine-executable before a Perl program starts doing any useful work.
At least in some Unix environments it is possible to dump the compiled Perl binary to create a true executable.
Hi,
I was very surprised by VB.NET’s bad file I/O performance, especially compared to C#, because the two should indeed produce close results.
I just checked the VB benchmark code, and the reason was evident. The author used the FileOpen and LineInput functions, which were available as keywords (Open, Line Input, Close, etc.) in VB6. In the .NET world, MS has provided them as functions to make migration simpler, but at the cost of performance.
Using functions from the System.IO namespace (as the C# code must be doing) should give the same performance for VB.NET.
-Vinay.
The author(s) of this benchmark must have been drunk during the New Year days… using GCC from Cygwin as a reference is the most ridiculous thing a “programmer” can do. No comment.
The results only show us that Windowz is an ill platform; it is clear that C is the fastest on Linux.
As has been mentioned by others, the Java performance is probably due to strict floating point… here are some links on that:
http://java.sun.com/docs/books/jls/second_edition/html/expressions….
http://java.sun.com/docs/books/vmspec/2nd-edition/html/Concepts.doc…
http://www.jcp.org/en/jsr/detail?id=84
See – http://forum.java.sun.com/thread.jsp?forum=31&thread=481284&message…
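In source terms, the “strict” behavior those links discuss shows up in two places: the strictfp modifier, which pins down how expressions are evaluated, and the StrictMath class, which pins down the library functions. A small, hypothetical illustration (the class and method names are mine):

// Sketch only: the two places Java nails down floating-point behavior.
public class FpStrictness {
    // strictfp forces intermediate results to stay in the IEEE double value set;
    // without it, the VM may carry intermediates in wider hardware precision.
    static strictfp double dot(double a, double b, double c, double d) {
        return a * b + c * d;
    }

    public static void main(String[] args) {
        double x = 1.0e7;
        // Math.sin must stay within 1 ulp of the true result, and StrictMath.sin
        // must reproduce the fdlibm algorithms bit for bit; neither may simply
        // issue a fast-but-sloppy hardware instruction for large arguments,
        // which is one suggested culprit for the slow Java trig times above.
        System.out.println(Math.sin(x) + " vs " + StrictMath.sin(x));
        System.out.println(dot(1.0e16, 1.0e-16, 1.0, -1.0));
    }
}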