This article discusses a small-scale benchmark test run on nine modern computer languages or variants: Java 1.3.1, Java 1.4.2, C compiled with gcc 3.3.1, Python 2.3.2, Python compiled with Psyco 1.1.1, and the four languages supported by Microsoft’s Visual Studio .NET 2003 development environment: Visual Basic, Visual C#, Visual C++, and Visual J#. The benchmark tests arithmetic and trigonometric functions using a variety of data types, and also tests simple file I/O. All tests took place on a Pentium 4-based computer running Windows XP. Update: Delphi version of the benchmark here.
Why benchmark?
Five questions motivated me to design and run these benchmarks. First, I was curious about how the performance of Java 1.4.2 (the latest official version from Sun) compares to that of Microsoft’s relatively new .NET 2003 suite of languages. Both Java and the .NET languages are “semi-compiled” (or, looking at the flip side of the coin, “semi-interpreted”). By this I mean that source code is compiled into intermediate-level code and then run by a combination interpreter/just-in-time compiler. With Java, the intermediate language is called bytecode and the interpreter/compiler is called a Java Virtual Machine (JVM). Source code in the .NET world is compiled into the Microsoft Intermediate Language (MSIL) and is run on the .NET Common Language Runtime (CLR) engine.
The .NET languages benefit from many of the same features that have made Java so popular, including automatic resource management/garbage collection and type safety. They also add interesting new features and conveniences such as cross-language debugging, easy GUI design, and virtually idiot-proof application deployment. But what is the performance penalty of these new features? By adding layers of complexity to its programming model, has Microsoft given up its speed advantage over Java?
Microsoft makes it especially easy to compare the overhead of the Java and .NET frameworks by including J# in the .NET suite. This language is syntactically identical to Java (although it implements only version 1.1.4 of the Java spec, which is by now quite out of date), so any differences in speed between Java and J# should be attributable purely to differences between the Sun and Microsoft runtime overhead.
Second, I wanted to assess Microsoft’s claim that the same routine coded in any of the .NET languages is compiled into identical MSIL code which will ultimately run at the same speed. This led me to keep the benchmark very simple, so that I could make sure the routines in each of the .NET languages really were functionally identical. Would all four languages really run at the same speed?
Third, I was curious to see how much slower Java or the .NET languages are than a fully compiled language like C, especially when the C program is unburdened by the runtime overhead of the CLR. I first tried to eliminate the CLR from the Visual C++ benchmark by turning off the language’s “managed” features with the #pragma unmanaged directive, but I was surprised to see that this didn’t lead to any performance gains. After that strategy failed, I recompiled the Visual C++ program with GNU’s gcc C compiler in order to give C every opportunity to shine in its native, unmanaged, CLR-free form.
Fourth, I wanted to find out how semi-compiled languages compare to fully interpreted languages like Python, Perl or PHP. It is often said that as hardware continues to get faster and cheaper we will reach a point where the extra speed of compiled languages will be largely unnecessary. But if there is still an order-of-magnitude difference between the performance of a routine coded in C and the same algorithm coded in Python, we would be wise to keep our C skills up to date. To test this, I wrote another version of the benchmark in Python. I then re-ran the Python benchmark with the Psyco just-in-time compiler to see if we could combine Python’s spectacular readability and rapid development with the speed of a compiled language. Greedy perhaps, but worth a try.
Finally, I thought it would be interesting to see how Sun’s latest Java release compares to earlier versions. Sun has made strong claims about performance improvements in the 1.4.2 version of its compiler and JVM relative to the earlier 1.3.1 release, and I wanted to see if the performance lived up to the hype. So I added Java 1.3.1 to the benchmark roster.
Designing good, helpful benchmarks is fiendishly difficult. This fact led me to keep the scope of this benchmark quite limited. I tested only math operations (32-bit integer arithmetic, 64-bit integer arithmetic, 64-bit floating point arithmetic, and 64-bit trigonometry), and file I/O with sequential access. The tests were not comprehensive by any stretch of the imagination; I didn’t test string manipulation, graphics, object creation and management (for object oriented languages), complex data structures, network access, database access, or any of the countless other things that go on in any non-trivial program. But I did test some basic building blocks that form the foundation of many programs, and these tests should give a rough idea of how efficiently various languages can perform some of their most fundamental operations.
Here’s what happens in each part of the benchmark:
32-bit integer math: using a 32-bit integer loop counter and 32-bit integer operands, alternate among the four arithmetic functions while working through a loop from one to one billion. That is, calculate the following (while discarding any remainders):
1 - 1 + 2 * 3 / 4 - 5 + 6 * 7 / 8 - … - 999,999,997 + 999,999,998 * 999,999,999 / 1,000,000,000
64-bit integer math: same algorithm as above, but use a 64-bit integer loop counter and operands. Start at ten billion and end at eleven billion so the compiler doesn’t knock the data types down to 32-bit.
64-bit floating point math: same as for 64-bit integer math, but use a 64-bit floating point loop counter and operands. Don’t discard remainders.
64-bit floating point trigonometry: using a 64-bit floating point loop counter, calculate sine, cosine, tangent, logarithm (base 10) and square root of all values from one to ten million. I chose 64-bit values for all languages because some languages required them, but if a compiler was able to convert the values to 32 bits, I let it go ahead and perform that optimization.
I/O: Write one million 80-character lines to a text file, then read the lines back into memory.
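To make the components concrete, here is a scaled-down sketch of the integer-arithmetic and I/O components in Python (loop bounds shrunk drastically so it finishes quickly; the function names are illustrative, not taken from the actual benchmark source):

```python
import os
import tempfile

def int_arithmetic(int_max):
    """Cycle through -, +, *, / as in: 1 - 1 + 2 * 3 / 4 - 5 + ..."""
    result = 1
    i = 1
    while i < int_max:
        result -= i; i += 1
        result += i; i += 1
        result *= i; i += 1
        result //= i; i += 1   # // discards the remainder, as the spec requires
    return result

def io_benchmark(path, num_lines):
    """Write fixed 80-character lines, then read them all back."""
    record = ("abcdefghijklmnopqrstuvwxyz1234567890" * 3)[:80]
    with open(path, "w") as f:
        for _ in range(num_lines):
            f.write(record + "\n")
    with open(path) as f:
        lines = f.readlines()
    return lines[-1].rstrip("\n")

if __name__ == "__main__":
    # Print the results so an optimizer cannot discard the work entirely.
    print(int_arithmetic(1000000))
    print(io_benchmark(os.path.join(tempfile.gettempdir(), "bench_io.txt"), 1000))
```

The real benchmark loops to one billion and writes one million lines; the shrunken bounds here only illustrate the structure.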
At the end of each benchmark component I printed a value that was generated by the code. This was to ensure that compilers didn’t completely optimize away portions of the benchmarks after seeing that the code was not actually used for anything (a phenomenon I discovered when early versions of the benchmark returned bafflingly optimistic results in Java 1.4.2 and Visual C++). But I wanted to let the compilers optimize as much as possible while still ensuring that every line of code ran. The optimization settings I settled on were as follows:
Java 1.3.1: compiled with javac -g:none -O to exclude debugging information and turn on optimization; ran with java -hotspot to activate the just-in-time compiler within the JVM.
Java 1.4.2: compiled with javac -g:none to exclude debugging information; ran with java -server to use the slower-starting but faster-running server configuration of the JVM.
C: compiled with gcc -march=pentium4 -msse2 -mfpmath=sse -O3 -s -mno-cygwin to optimize for my CPU, enable SSE2 extensions for as many math operations as possible, and link to Windows libraries instead of Cygwin libraries.
Python with and without Psyco: no optimization used. The python -O interpreter flag optimizes Python for fast loading rather than fast performance, so it was not used.
Visual Basic: used “release” configuration, turned on “optimized,” turned off “integer overflow checks” within Visual Studio.
Visual C#: used “release” configuration, turned on “optimize code” within Visual Studio.
Visual C++: used “release” configuration, turned on “whole program optimization,” set “optimization” to “maximize speed,” turned on “global optimizations,” turned on “enable intrinsic functions,” set “favor size or speed” to “favor fast code,” set “omit frame pointers” to “yes,” set “optimize for processor” to “Pentium 4 and above,” set “buffer security check” to “no,” set “enable enhanced instruction set” to “SIMD2,” and set “optimize for Windows98” to “no” within Visual Studio.
Visual J#: used “release” configuration, turned on “optimize code,” turned off “generate debugging information” within Visual Studio.
All benchmark code can be found at my website. The Java benchmarks were created with the Eclipse IDE, but were compiled and run from the command line. I used identical source code for the Java 1.3.1, Java 1.4.2, and Visual J# benchmarks. The Visual C++ and gcc C benchmarks used nearly identical source code. The C program was written with TextPad, compiled using gcc within the Cygwin bash shell emulation layer for Windows, and run from the Windows command line after quitting Cygwin. I programmed the Python benchmark with TextPad and ran it from the command line. Adding Psyco’s just-in-time compilation to Python was simple: I downloaded Psyco from SourceForge and added import psyco and psyco.full() to the top of the Python source code. The four Microsoft benchmarks were programmed and compiled within Microsoft Visual Studio .NET 2003, though I ran each program’s .exe file from the command line.
It should be noted that the Java log() function computes natural logarithms (base e), whereas the other languages compute base-10 logarithms; a base-10 result can be recovered with the change-of-base identity log10(x) = ln(x)/ln(10), and Java later added a built-in Math.log10() in version 5.0. I only discovered this discrepancy after running the benchmarks, and I assume it had little or no effect on the results, but it does seem strange that Java 1.4.2 has no built-in base-10 log function.
Before running each set of benchmarks I defragged the hard disk, rebooted, and shut down unnecessary background services. I ran each benchmark at least three times and used the best score from each component, assuming that slower scores were the result of unrelated background processes getting in the way of the CPU and/or hard disk. Start-up time for each benchmark was not included in the performance results. The benchmarks were run on the following hardware:
Type: Dell Latitude C640 Notebook
CPU: Pentium 4-M 2GHz
RAM: 768MB
Hard Disk: IBM Travelstar 20GB/4500RPM
Video: Radeon Mobility 7500/32MB
OS: Windows XP Pro SP 1
File System: NTFS
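The best-score-of-several-runs methodology can be sketched as follows (a minimal illustration in Python; the actual benchmarks timed each component internally in their own languages, and the helper name is mine):

```python
import time

def best_of(runs, fn, *args):
    # Execute fn several times and keep the fastest wall-clock time,
    # on the assumption that slower runs were disturbed by background
    # processes competing for the CPU and/or disk.
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the minimum rather than the mean filters out interference from other processes, at the cost of hiding run-to-run variance.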
Results
Here are the benchmark results presented in both table and graph form. The Python and Python/Psyco results are excluded from the graph since the large numbers throw off the graph’s scale and render the other results illegible. All scores are given in seconds; lower is better.
Language     | int math | long math | double math | trig | I/O  | TOTAL
Visual C++   | 9.6      | 18.8      | 6.4         | 3.5  | 10.5 | 48.8
Visual C#    | 9.7      | 23.9      | 17.7        | 4.1  | 9.9  | 65.3
gcc C        | 9.8      | 28.8      | 9.5         | 14.9 | 10.0 | 73.0
Visual Basic | 9.8      | 23.7      | 17.7        | 4.1  | 30.7 | 85.9
Visual J#    | 9.6      | 23.9      | 17.5        | 4.2  | 35.1 | 90.4
Java 1.3.1   | 14.5     | 29.6      | 19.0        | 22.1 | 12.3 | 97.6
Java 1.4.2   | 9.3      | 20.2      | 6.5         | 57.1 | 10.1 | 103.1
Python/Psyco | 29.7     | 615.4     | 100.4       | 13.1 | 10.5 | 769.1
Python       | 322.4    | 891.9     | 405.7       | 47.1 | 11.9 | 1679.0
Click the thumbnail or here for a full-sized graph of the results
Analysis
Let’s review the results by returning to the five questions that motivated these benchmarks. First, Java (at least in the 1.4.2 version) performed very well on most benchmark components when compared to the .NET 2003 languages. If we exclude the trigonometry component, Java performed virtually identically to Visual C++, the fastest of Microsoft’s languages. Unfortunately, the trigonometry performance of Java 1.4.2 can only be described as dismal: bafflingly bad, worse even than fully interpreted Python! This was especially puzzling given the much faster trigonometry performance of Java 1.3.1, and suggests that there may be more efficient ways to code the benchmark in Java. Perhaps someone with more experience with 1.4.2 can suggest a higher-speed workaround.
Java performed especially well (when discounting the strange trigonometry performance) compared to Microsoft’s syntactically equivalent Visual J#. This discrepancy may be due to the additional overhead of the CLR engine (as compared to the overhead of the JVM), or may have something to do with Visual J# implementing only version 1.1.4 of the Java spec.
Second, Microsoft’s claim that all four .NET 2003 languages compile into identical MSIL code seemed mostly true for the math routines. The integer math component produced virtually identical scores in all four languages. The long math, double math, and trig scores were identical in Visual C#, Visual Basic, and Visual J#, but the C++ compiler somehow produced impressively faster code for these benchmark components. Perhaps C++ is able to make better use of the Pentium 4’s SSE2 SIMD extensions for arithmetic and trigonometry, but this is pure speculation on my part. The I/O scores fell into two clusters, with Visual Basic and Visual J# apparently using much less efficient I/O routines than Visual C# or Visual C++. This is a clear case where functionally identical source code does not compile into identical MSIL code.
Third, Java 1.4.2 performed as well as or better than the fully compiled gcc C benchmark, after discounting the odd trigonometry performance. I found this to be the most surprising result of these tests, since it only seems logical that running bytecode within a JVM would introduce some sort of performance penalty relative to native machine code. But for reasons unclear to me, this seems not to be true for these tests.
Fourth, fully interpreted Python was, as expected, much slower than any of the fully compiled or semi-compiled languages, sometimes by a factor of more than 60. It should be noted that Python’s I/O performance was in the same league as the fastest languages in this group, and was faster than Visual Basic and Visual J#. The Psyco compiler worked wonders with Python, reducing the time required for the math and trig components to between 10% and 70% of that required for Python without Psyco. This was an astonishing improvement, especially considering how easy it is to include Psyco in a Python project.
Fifth, Java 1.4.2 was much faster than Java 1.3.1 in the arithmetic components, but as already mentioned, it lagged way behind the older version on the trigonometry component. Again, I can’t help but think that there may be a different, more efficient way to call trigonometric functions in 1.4.2. Another possibility is that 1.4.2 may be trading accuracy for speed relative to 1.3.1, with new routines that are slower but more correct.
What lessons can we take away from all of this? I was surprised to see the four .NET 2003 languages clustered so closely on many of the benchmark components, and I was astonished to see how well Java 1.4.2 did (discounting the trigonometry score). It would be foolish to offer blanket recommendations about which languages to use in which situations, but it seems clear that performance is no longer a compelling reason to choose C over Java (or perhaps even over Visual J#, Visual C#, or Visual Basic), especially given the extreme advantages in readability, maintainability, and speed of development that those languages have over C. Even if C did still enjoy its traditional performance advantage, there are very few cases (I’m hard pressed to come up with a single example from my work) where performance should be the sole criterion when picking a programming language. I would even argue that for very complex systems that are designed to be in use for many years, maintainability ought to trump all other considerations (but that’s an issue to take up in another article).
Expanding the Benchmark
The most obvious way to make this benchmark more useful is to expand it beyond basic arithmetic, trigonometry, and file I/O. I could also extend the range of languages or variants tested. For example, testing Visual Basic 6 (the last of the pre-.NET versions of VB) would give us an idea how much (if any) of a performance hit the CLR adds to VB. There are other JVMs available to be tested, including the open-source Kaffe and the JVM included with IBM’s SDK (which seems to be stuck at version 1.3 of the Java spec). BEA has an interesting JVM called JRockit which promises performance improvements in certain situations, but unfortunately only works on Windows. GNU’s gcj front-end to gcc allows Java source code to be compiled all the way to executable machine code, but I don’t know how compatible or complete the package is. There are a number of other C compilers available that could be tested (including the highly regarded Intel C compiler), as well as a host of other popular interpreted languages like Perl, PHP, or Ruby. So there’s plenty of room for further investigation.
I am by no means an expert in benchmarking; I launched this project largely as a learning experience and welcome suggestions on how to improve these benchmarks. Just remember the limited ambitions of my tests: I am not trying to test all aspects of a system–just a small subset of the fundamental operations on which all programs are built.
About the author:
Christopher W. Cowell-Shah works in Palo Alto as a consultant for the Accenture Technology Labs (the research & development wing of Accenture). He has an A.B. in computer science from Harvard and a Ph.D. in philosophy from Berkeley. Chris is especially interested in issues in artificial intelligence, human/computer interaction and security. His website is www.cowell-shah.com.
What about a Mono or Portable.NET test with the same benchmark? I would be *very* curious to know what level of performance they can achieve.
I’m curious as to how much better gcc and python would perform in a POSIX environment, especially gcc linked to glibc rather than to the windows C libraries.
Also, what happened to perl?
.. Any chance of trying the intel and the openwatcom compilers?
.. It would also be interesting to try the benchmarks on another operating system to see if the same level of differences are observed.
All .NET languages will perform exactly the same because they are all compiled down to the CLR (Common Language Runtime).
So your VB.NET app will perform the same as C#, and so will your Delphi.NET, COBOL.NET, etc.
You only really need to benchmark C#.
Perl is not a compiled language… perhaps that’s why it was left out.
Mike
>Perl is not a compiled language… perhaps that’s why it was left out.
>Mike
Python is interpreted and it wasn’t left out.
All .NET languages are not the same! While they might produce the same MSIL code in simple cases, in more complex situations they will not, leading of course to different results.
>Python is interpreted and it wasn’t left out.
Actually… if you read the posting, he compiled the Python code with Psyco. Also, Python is compiled into bytecode at runtime for fast execution too. Perl does not behave like this, nor can you compile it. So my comment remains.
Mike
>In more complex situations they will not, leading of course to different results.
Well said Yoni.
>> Article: I first tried to eliminate the CLR from the Visual C++ benchmark by turning off the language’s “managed” features with the #pragma unmanaged directive, but I was surprised to see that this didn't lead to any performance gains.
Just start a new, unmanaged project and add your Standard C++ code to it.
Actually, if you read the posting, he benchmarked Python with Psyco, and also without, getting both. And, if you read the posting, he also mentioned it would be interesting to see what Perl, PHP, and Ruby results would look like.
Comparing the server VM with the client VM is invalid.
He should have run both VMs in both cases.
I tested 1.4.2 vs. 1.3.1 (both Sun VMs) and 1.4.2 is 3 times slower than 1.3.1 (they rewrote the VM itself, and System.arraycopy(), which is used everywhere, got 3x slower).
I filed a bug report on that. The reply was: “Too late”, which disappointed me, as this is a major regression that should have been caught by their QA people, not me.
Well, let’s hope that in 1.5 they improve performance to the 1.3.1 level.
The author states that he is surprised that Java performs better than compiled code… This really shouldn’t be a surprise. The Java virtual machine compiles its code just like a C++ compiler, with one big difference: the C++ compiler compiles its code before it is run, while Java compiles it as it is being run. In other words, Java actually knows more about how the code is used, which in theory should let it reach better performance than C++. In real life, though, it is only recently (in the last couple of years) that Java has actually approached (and in some cases passed) C++.
I suggest he should benchmark Common Lisp (perhaps using Corman Common Lisp, Allegro or LispWorks on Windows). Common Lisp is a native-compiled language which provides even more dynamism than the popular interpreted languages like Perl, Python, Ruby and PHP. It’s not hard to learn for someone used to these languages and may provide a nice surprise for the benchmark results.
The default math library is compiled with -O0 to preserve strict IEEE semantics. In fact, with a minor change to the source code, -O2 will work as well. Java has two math libraries, Math and StrictMath, which default to the same implementation, but the JVM is allowed to use a faster/less accurate version of Math. VC++ uses loose math (the x86 trig instructions directly).
Perl is compiled in a similar manner to Python; it just isn’t written to disk. Perl 6 even more so (read about Parrot if you’re interested). Why Perl isn’t included is really a question for the author. If I were to guess, it would perform similarly to C++, since it generally wraps the standard libraries; unless I/O is included in the bench, in which case there’s a penalty for parsing the file, and then it should be comparable to Python. How ’bout it, Christopher: run a Perl bench?
Will
Hello,
Given the large differences between VB.NET and C#, it is very likely you are doing something wrong. You may be mistakenly using a different construct. The ‘native’ VB I/O functions may be much slower than the standard CLR classes (System.IO). If you are using the Visual Basic library, you are not fairly testing the language. Again, you must post your source code to allow independent review.
As well, I would love to run the C# tests on Mono.
Another thing I should point out: most applications *DO NOT* involve intensive I/O or math alone. This is not a measure of true application performance. You are merely measuring how well the JIT or compiler emits code for a specific case. I am sure any of the JIT developers could optimize for this specific test case. I think perhaps the more interesting question is: what language provides high-speed building blocks, such as collections classes, callback functionality, and object creation? The answer to that question is *MUCH* less of a micro-benchmark.
Also, I would add that for JIT’d languages, you should call the function you are benchmarking once before the call that you time. Depending on how you structure your run, you may end up counting JIT time. Although JIT time can matter in a 60-second benchmark, when running a web server for days, weeks, or even months at a time it really does not matter. In fact, many applications use a native-code precompiler to reduce startup time (under Mono, Miguel de Icaza often reports performance improvements of over 30% by using AOT compilation of our C# compiler mcs.exe [times are for the compilation of our mscorlib library, consisting of 1000 C# source files]). However, AOT does lose out in a large benchmark like this because it is forced to generate sub-optimal code (like a C++ compiler). So it is much fairer to allow for a warm-up run to let the runtime JIT the code.
— Ben
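Ben’s warm-up suggestion can be sketched in Python (an illustrative helper of my own naming; the effect matters most on JIT runtimes like the JVM or CLR, but the structure is the same in any language):

```python
import time

def timed_with_warmup(fn, *args):
    # Call once, untimed, so a JIT runtime can compile fn before we measure;
    # then time a second call, which runs the already-compiled code.
    fn(*args)
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Without the untimed first call, the measured figure would fold compilation cost into the first (and only) run.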
It’s funny that there are so many hardware sites that benchmark every aspect of CPUs, chipsets, and graphics cards, but few people bother to benchmark software, programming languages, and operating systems.
I would also like to see mono and portable .NET benchmarks. Also gcc in a POSIX environment.
By the way, I liked the article – much higher quality than what you normally see here on osnews.
Firstly, Java code should, in the general best cases, perform in the same manner as a well-compiled C++ program. If we are doing pure loops and integer/FP tasks, there should be virtually nothing in it. A C++ compiler doing this properly should produce the same output as Java as a base case, and a good C++ compiler using architecture optimisations should be able to do even better. Java has the overhead of the VM and the JIT process, though a repetitive looping test should largely negate that. Similarly, a well-compiled benchmark from C and C++ should always be faster than a managed .NET application. The distance between the two will vary, but it should still be faster.
These benchmarks are rather daft, anyway, since they manage to avoid using any sort of objects. Java is meaningless for most real tasks without creating and manipulating objects (otherwise you’re basically writing C anyway), and objects are where Java really does slow down.
Last of all, I’d like to draw the author’s attention to the .Net framework EULAs… It is in fact a violation of the EULA to produce benchmarks of this sort of .Net against other platforms. Which is why they haven’t been done all over the place by now
Very interesting results. Sadly, the sorting criterion (using the total instead of a geometric average) is unusual and favors the languages that optimize the slow operations. The results of double math and trig show some big variations between languages (3:1 for double, more than 15:1 for trig), but this is not properly reflected in the totals (in my humble opinion).
Here are the numbers with the geometric average. Notice how Java 1.3.1 suddenly appears much slower than Visual J# or Java 1.4.2, and Python/Psyco is far ahead of Python (the arithmetic average doesn’t show the improvement on the trig test).
Visual C++: 8.4
Visual C#: 11.1
gcc C: 13.2
Visual Basic: 13.9
Visual J#: 14.2
Java 1.4.2: 14.8
Java 1.3.1: 18.6
Python/Psyco: 47.9
Python: 145.5
Ignoring Python for the moment, it’s interesting to see that Java 1.3.1 is the only one that is far off the lead on most tests. gcc only needs to improve on trig and long math; Visual C#/Basic/J# all have issues with long math and double math, with Visual Basic and J# also suffering from slow I/O.
Java 1.4.2 has a very obvious and severe issue with trig. If that test were as fast as 1.3.1’s, Java 1.4.2 would score 12.2, very close to the lead. If it could be made to score 4.2 like Visual J#, that score would fall to 8.7, barely slower than Visual C++.
The differences between the various MSIL/CLR languages are also very interesting. It’s obvious that VC++ manages to issue better 64-bit code than the rest of the pack, and that I/O is the only differentiator between Visual C#, Basic, and J#.
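A few of the geometric averages quoted above can be reproduced from the results table with a short script (scores copied from the table; the helper name is mine):

```python
import math

# Per-component scores (seconds) copied from the results table:
# int math, long math, double math, trig, I/O
scores = {
    "Visual C++": [9.6, 18.8, 6.4, 3.5, 10.5],
    "Visual C#":  [9.7, 23.9, 17.7, 4.1, 9.9],
    "Java 1.4.2": [9.3, 20.2, 6.5, 57.1, 10.1],
}

def geometric_mean(xs):
    # nth root of the product, computed via logs for numerical stability
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

for language, xs in scores.items():
    print(f"{language}: {geometric_mean(xs):.1f}")
```

Run as-is, this prints 8.4, 11.1, and 14.8 for the three languages, matching the figures listed above.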
If we’re freely violating EULAs and all the rest of it, can anyone test the C++ code on Linux and the Java code on IBM’s VM? Both should be quite different.
I’ve been strongly considering picking Ruby up as my next language to learn, but it’s hard to find a lot of information (recent info, at least) on it that’s in English.
I was hoping to see it benchmarked as well… That could have been the push I need to get learnin’ it.
Does anyone here know Java, Python, and Ruby? Any thoughts as to speed, or recommendations one way or the other?
He did, on the second page of the article:
http://www.ocf.berkeley.edu/~cowell/research/benchmark/code/
I would have run VC++ 6 instead of VC++ .NET (or whatever it is called). Since the author didn’t know how to create an unmanaged project in VC++, I don’t believe his results when it comes to VC++.
In addition, I run STLport instead of MS’s STL, which is much faster.
My results for MinGW (instead of Cygwin) on my Athlon XP 2.4 (I just felt like I should test it):
Start C benchmark
Int arithmetic elapsed time: 6125 ms with intMax of 1000000000
i: 1000000001
intResult: 1
Double arithmetic elapsed time: 5687 ms with doubleMin 10000000000.000000, doubleMax 11000000000.000000
i: 11000000000.000000
doubleResult: 10011632717.388229
Long arithmetic elapsed time: 20016 ms with longMin 10000000000, longMax 11000000000
i: 11000000000
longResult: 776627965
Trig elapsed time: 6750 ms with max of 10000000
i: 10000000.000000
sine: 0.990665
cosine: -0.136322
tangent: -7.267119
logarithm: 7.000000
squareRoot: 3162.277502
I/O elapsed time: 5484 ms with max of 1000000
last line: abcdefghijklmnopqrstuvwxyz1234567890abcdefghijklmnopqrstuvwxyz1234567890abcdefgh
Total elapsed time: 44062 ms
Stop C benchmark
Great article btw.
Doing such simple math and I/O tests is completely useless.
The real differences lie in string/character manipulation, memory allocation, searching, sorting, garbage collection, virtual calls through classes, etc.
If you have some more time to spend on this benchmark, try those. Mileage will vary much more.
Please don’t post the best of only 3 runs; it’s silly to do so because you are not getting a good sample of data.
You should have run more like 10 runs, especially because the micro-benchmarks that you have produced can be run automatically in the background many times without user intervention.
Then you should provide the mean and median of the runs, along with all the data from each run.
On the topic of the Java benchmark:
For the Java tests, you should use both the server VM and the client VM and compare results; the two VMs are actually very different. The Java benchmark isn’t really doing much but testing the interpreter, as I doubt that much of that code is actually being compiled to native code. I think it’s fairly well known that for short-running programs Java is slow (but for long-running programs it’s fairly competitive). Also, in the Java benchmark you shouldn’t be using new Date().getTime() to get the time; use System.currentTimeMillis() instead, as it is faster and doesn’t involve creating more objects.
I think it would be best to wait until after 2.0… because Ruby is kind of slow, not thread-safe, and has many other issues. It forced me to learn a different language, but I will come back to Ruby when 2.0 is released. I love Ruby; it’s easy and clean to me.
Last of all, I’d like to draw the author’s attention to the .Net framework EULAs… It is in fact a violation of the EULA to produce benchmarks of this sort of .Net against other platforms. Which is why they haven’t been done all over the place by now
If that is true, it’s rather astonishing! “BTW, if you use our product, you are forbidden to discuss its performance publicly.” It sounds like they have no faith in their product
Small synthetic benchmarks are generally not representative of real programs. Typically a benchmark suite of real applications that compute real things people are interested in are the best indicator, but unfortunately it is hard to find a large enough suite implemented well in a large enough number of languages to matter.
Even so, I will say this.
Java is the real star of this benchmarking effort. The conventional thinking of people who say “Java? Bytecode? VM? It will always be slow!” is clearly in error. A huge (and I do mean huge) amount of engineering effort by thousands of smart people from all kinds of institutions has gone into designing and building high-performance virtual machines, and Java, mainly through Sun’s and IBM’s efforts, has been the principal recipient of those benefits. JIT compilers are extremely advanced, far ahead of static compilers in many areas. It is no wonder that you see the performance gap rapidly closing, though it shouldn’t be called a gap, because the potential to exceed static compilation is also huge.
The speed of the language has less and less to do with the speed of the resulting application these days. What matters most now (and it has always mattered) is smart designs and efficient algorithms. For integer and float math, the design space is small, but for an application the size of a webserver, a graphics program, a web browser, etc, the design space is huge. Even if it did break down to one language is X% slower than another (which kind of thinking is complete rubbish anyway), what does it matter?
Virtual machines get better every generation. And every single program ever written for that VM–anytime, anywhere, no matter who wrote it, how it was compiled, what platform it was on–gets faster right along with it. Static compilation is static–it has long slowed its evolution and stabilized. But dynamic compilation is evolving at an amazing rate.
Don’t be a naysayer, be excited about what the future brings for new languages!
Hello everybody,
In case anyone is interested, there is a very interesting benchmarking site (many languages, many tests) at:
http://www.bagley.org/~doug/shootout/
It doesn’t include Microsoft’s new CLR languages/implementations, IIRC, so the tests in the article are still interesting.
It’s weird to see gcc performing so badly. Maybe the Cygwin overhead is to blame?
I think Python was misrepresented a bit here, since most Python programmers will either write the ‘number crunching’ parts of their programs as a C library or use lower-level Python modules such as NumPy or Scientific Python.
Serious mathematical operations in pure Python are a rarity.
I would be interested to see how FORTRAN does in a similar benchmark.
Heck, FORTRAN 77, even. Unfortunately, I could only do it for Linux and Tru64. I don’t have an F95 compiler for my WinXP box at home.
If anyone out there is using ifort on WinXP, please try out the program.
Java is the real star of this benchmarking effort. The conventional thinking of people who say “Java? Bytecode? VM? It will always be slow!” is clearly in error.
There seems to be general agreement that Java is fast on the server side, but most of the complaints about Java’s speed relate to its performance in desktop apps, something not tested in this benchmark.
Grrr…*shakes fist at Nate*
Actually, it’d be cool to see how gfortran does. I’m sure gcc would finish a hojillion times faster, but still.
Well, these benchmarks aren’t really very indicative. The I/O benchmark is, well, I/O bound, which is why interpreted Python performed as fast as compiled C++. The numeric benchmarks are just that, numeric benchmarks. Numerics are really the best case for an optimizer, because they are so low level. All the JIT compilers should have compiled the loop once to native code, and gotten out of the way. This is fine if all you are doing is inner-loop numeric code (some scientific stuff, graphics) but not really a good indicator of general performance. Even for scientific code, this benchmark probably isn’t representative, because you often need proper mathematical semantics for your calculations, which C/C++/Java/C# don’t provide.
A more telling test would be to get higher-level language features involved. Test virtual C++ function calls vs Java method calls (which are automatically virtual). Test the speed of memory allocation. Test the speed of iterators in various languages. Do an abstraction benchmark (like Stepanov for C++) to test how well the compiler optimizes-away abstraction.
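One way such a dispatch test might look in Java (a sketch under my own naming, not code from the article): time repeated calls through an interface, which are virtual by default but which a JIT may devirtualize and inline when the call site only ever sees one receiver class.

```java
public class DispatchBench {
    interface Op {
        int apply(int x);
    }

    static final class Inc implements Op {
        public int apply(int x) { return x + 1; }
    }

    // Every call goes through the interface, i.e. is virtual; a JIT that
    // observes a monomorphic call site can inline Inc.apply entirely.
    static long run(Op op, int n) {
        int acc = 0;
        for (int i = 0; i < n; i++) acc = op.apply(acc);
        return acc;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long acc = run(new Inc(), 10000000);
        long ns = System.nanoTime() - start;
        System.out.println(acc + " calls in " + ns + " ns");
    }
}
```

A real harness would warm up the VM and vary the number of receiver classes at the call site to see the inlining effect come and go.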
@Brian: I can tell you how a Common Lisp result of the same benchmark would turn out. Given proper type declarations, and a good compiler (SBCL, CMUCL), you will get arbitrarily close to C++ for this task. The compiler should generate more or less the same code. See this thread for some good numbers:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=87n0t236…
Note that CMUCL is very competitive with gcc. Intel C++ blew both CMUCL and gcc away, but that has nothing to do with the language. Intel C++ has an auto-vectorizer that will automatically generate SSE code if it finds algorithms (like the dot-product and scale in this benchmark) that can be vectorized. GCC and CMUCL don’t support this feature.
Interestingly, there is evidence that Lisp performs extremely well for large programs:
See these links:
http://www.flownet.com/gat/papers/lisp-java.pdf
http://www.norvig.com/java-lisp.html
In the study, the fastest programs were C++, but the average of the Lisp programs was faster than the average of the C++ programs. The Java stats in the study are a bit outdated, because it was done with JDK 1.2.
Ok, I am an idiot for not seeing the sources (OTOH, I would usually expect to find them at the *END* of the article)
For VB’s file routines, it is no wonder they are so slow. You are, as I suspected, using the VB file routines. Just to give you an idea, here is what happens EVERY TIME you call PrintLine:
1) An Object array is created (PrintLine takes a ParamArray). This requires an allocation, and then requires copying into the array. Given the number of items you write, you will trigger quite a few GCs.
2) The VB runtime must walk the stack to find out what assembly you are calling from. This requires quite a bit of reflection, and involves acquiring a few locks. This is done to prevent conflicts between assemblies.
3) The VB runtime must find which file stream is referred to by the specified handle.
4) Stream.WriteLine is called.
Well, it’s no wonder it is so slow… Something similar may be happening for J#.NET.
I would suggest you consider rewriting the VB file IO routines, and resubmitting your data.
As well, you should be aware that you are putting C/C++ at a HUGE advantage in the I/O tests. In the read portion of the test, you do the following in C#:
while (i++ < ioMax) {
    myLine = streamReader.ReadLine();
}
While in C you do:
char readLine[100];
stream = fopen("C:\\TestGcc.txt", "r");
i = 0;
while (i++ < ioMax)
{
    fgets(readLine, 100, stream);
}
This is very unfair to the C# language. You are forcing it to allocate a new string for every line that is read, while C is not forced to allocate a new array, which saves it a lot of time. If you would like to make the test fair, use the StreamReader.Read overload that reads into a char[] buffer. This will prevent the repeated allocations, which should make the test fairer. A similar technique should be used for the other languages.
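The same allocation-avoidance idea expressed in Java terms (my own sketch and names, not the author’s code): read into one reusable char[] rather than letting readLine() materialize a fresh String for every line.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BufferedRead {
    // Count characters using a single reusable buffer, instead of
    // allocating a new String object for every line read.
    static long countChars(Reader source) throws IOException {
        char[] buf = new char[4096]; // allocated once, reused for every read
        long total = 0;
        int n;
        while ((n = source.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("abcdefgh\n"); // 9 chars per "line"
        }
        System.out.println(countChars(new StringReader(sb.toString())));
    }
}
```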
Really, you should have posted these items for review before claiming that you had a fair benchmark. The article should have been split into two postings: the first two sections in one, and then, after comments, the third. I would also encourage OSNews not to post benchmarks that have not obtained peer review.
So, Rayiner is right. These benchmarks are mostly testing parts of the operating system, not the runtimes all that much. That’s why the I/O scores are all so close. The only anomaly here is that Java is probably using strict IEEE arithmetic for the trig stuff, which is why it’s so slow. I think another poster mentioned how to turn that off.
It’s these kinds of benchmarks that I get nervous about when people start saying “Java is just as fast as C.” Well, I’m a Java programmer, and I love the language, and it really has gotten a LOT faster over the last few years, but there are some things in Java that are just inherently difficult to optimize away. I’m talking about things like array bounds checking, every data structure being a reference to an object, GC, and all data types being signed (try working a lot with byte values in Java and see how much ANDing you end up doing to combat sign-extension issues). These structural choices in the language design cause extra overhead that C programs just don’t suffer. Now, those are also some of Java’s greatest strengths in terms of making programming safer, but they do have a price. Fancy VMs can reduce that price, sometimes to zero for certain sections of code (the latest HotSpot does an excellent job getting rid of array bounds checking in many instances), but it’s asymptotic.
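To make the sign-extension point concrete, here is a tiny illustration (my own example, not from the benchmark): recovering an unsigned byte value in Java requires exactly the kind of ANDing described above.

```java
public class ByteMasking {
    // Java bytes are signed: (byte) 0xFF is -1, and widening it to int
    // sign-extends, so a mask is needed to recover the unsigned value.
    static int unsignedValue(byte b) {
        return b & 0xFF; // clears the sign-extended high 24 bits
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(b);                // prints -1
        System.out.println(unsignedValue(b)); // prints 255
    }
}
```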
Now, here’s what I think: for most types of programs, C or Java are just fine, particularly given today’s fast CPUs and spacious memory supplies. Both of those favor Java and tend to make the difference small for anything real-world. Even the purely interpreted languages like Perl or Python are fine for all but the heaviest workloads.
… people never include Delphi in any benchmarks? It’s always C, C++, Java etc … but I never see any of these benchmarks with Delphi included.
I believe that -fomit-frame-pointer is not enabled at any -O level, so it’s curious that the equivalent was specifically enabled for Visual C++ but not for gcc. Also, better performance might result from starting at -O2 and selectively adding optimizations from there, rather than going all the way to -O3.
Someone said that the Python guys would have used C libraries or written their own C routines. If that is true, then maybe in Java I would have used JNI, in VB I would have called a C DLL, or I would even have used assembler methods in C (not sure if any of those methods is faster; just making my point).
This guy just showed results from the kind of code that many people who are not experts in a specific programming language would have written. And one must recognize that kind of code is awfully common, so I think this benchmark is valid in certain environments and cases. It is up to the reader to be clever enough to understand that.
>> Python is interpreted and it wasn’t left out.
> Actually… if you read the posting, he compiled the Python code with Psyco. Also, Python is compiled into bytecode at runtime for fast execution too. Perl does not behave like this, nor can you compile it. So my comment remains.
> Mike
But he also tried straight uncompiled Python. I thought Perl was interpreted like Python?
Hello,
some years ago I found a site with a collection of identical benchmarks for every programming language: one for Forth, one for C, one for C++…
The results were organized in a chart showing all the specs: the machine, the system, the compiler…
I wanted to send the URL to this guy, but I’m not able to find it again…
(there were some for Perl and Prolog, for example)
If someone could send me the URL, it would be nice.
Cheers,
Djamé
> Last of all, I’d like to draw the author’s attention to the .NET framework EULAs… It is in fact a violation of the EULA to produce benchmarks of this sort of .NET against other platforms. Which is why they haven’t been done all over the place by now.
Who cares? The worst they can do is send a letter telling the author to stop, by which time the benchmarks are out. After that someone else can take over, etc.
I’m talking about things like array bounds checking, every data structure being a reference to an object, GC
———-
These three are not necessarily that bad. Analysis shows that most bounds checks can be eliminated ahead of time. I’m sure the Java compiler does this optimization. On the other hand, every data structure being a reference is something that is slow in Java, but doesn’t need to be.
You see, the primitive/class distinction in Java is largely unnecessary. It is entirely possible for a powerful compiler to determine what should be boxed and what should not. Powerful CL/Scheme/Dylan/ML/Smalltalk compilers do such analysis. So in these languages, there are no primitive types. Everything seems to be a full object on the heap. The compiler will take care of doing things like stack-allocating variables when no references to it escape the function, or unbox an object when it can be determined that it is safe to do so.
It would seem that the VB code does NOT use a FileStream for the file I/O like C#, but instead uses the old method of file manipulation, which is a lot slower. Try using System.IO.File and System.IO.StreamReader; the performance will be closer to that of C#.
Good Job, interesting results.
But, as someone else said, on these small programs,
I’m not sure the Java HotSpot compiler/profiler even kicked in. I think the Java JIT only profiles and compiles after it’s sure the job isn’t going to end soon?
Interesting article Christopher! It was quite informative to see MS VC++ at the top of the speed marks, and also interesting to see the file IO in Python didn’t seem to show much significant difference with many of the other languages.
So, I’ll be the one to mention it: would it be possible to provide a benchmark for Python using the Numeric/NumArray libraries? These were written specifically for numerical operations (the benchmarks you used are bread and butter to Numeric), and they do provide a speed boost. I imagine the results still wouldn’t approach the fastest languages here, but they would probably improve the performance, possibly even beating the Psyco-compiled Python.
Or maybe you should include some Fortran/Forth too? (nag! nag!)
You’re looking for Doug’s Great Computer Language Shootout
http://www.bagley.org/~doug/shootout/
It’s a bit outdated, and some of the code isn’t well written (I know a lot of people on c.l.lisp complained about sub-optimal CL code), but it’s overall pretty good.
I would actually be more interested in how many lines/characters of code he had to write in each language. Then let a third party read the code and say which one was more “readable”.
Rambus,
Your assertion that “this guy just showed results from the kind of code that many people who are not experts in a specific programming language would have written, and one must recognize that kind of code is awfully common, so I think this benchmark is valid in certain environments and cases; it is up to the reader to be clever enough to understand that” is incorrect.
When you compare one language against another, it is not fair to take effort to optimize one language (like C) and not take time to optimize another (VB).
However, the real issue is this: the author is attempting to compare how the I/O in C# stacks up to the I/O in VB (in the I/O test). He ended up, instead, comparing how fast reflection is! A benchmark should generally be optimized as much as possible. Benchmarks are meant to simulate real-life, high-expense computations. In such situations, people do not just write ‘common code’. They profile code, make it faster, and profile some more. This benchmark is not representative of such a situation, and thus is not valid for its intended purpose.
Here is the output of wc -l on his code. Although it is pretty useless IMO, it would be more interesting on a more complex program with +100 classes.
Benchmark.py 160
Benchmark.c 180
Benchmark.cpp 181
Benchmark.vb 186
Benchmark.java 211
Benchmark.jsl 212
Benchmark.cs 215
I appreciate all the comments–keep them coming! I’ll try to write up a response to the major (or recurring) points later tonight. — Chris
I would really like to see Watcom C, Intel C and Perl added.
Also, the results with IBM’s JDK and Kaffe would be very interesting. Maybe (GNU) Ada could also be added. I think GCJ has support for basic math; that could be very interesting also! I would really like to see more!
The C++ compiler compiles its code before it is run, and Java while it is being run. In other words, Java actually knows more about how the code is used, which in theory should let it reach better performance than C++. In real life, though, it’s just recently (the last couple of years) that Java has actually approached (and in some cases passed) C++.
And again, for performance-critical code (which you probably wouldn’t be implementing in Java in the first place), you can always compile your C/C++ code using profile-guided optimizations, which allow a process to be run and a report generated at run-time of how the code could be better optimized, at which point the code can be fed through the compiler again. Depending on how long your codebase takes to compile (obviously it will be quite a pain if that’s in the several-hour range), this really is a trivial process, especially before the final gold-master release of a particular piece of software.
The only cases where Java consistently outperforms native code compiled with profile guided optimizations (which allow for runtime optimization of compiled languages) are cases where a large number of virtual methods are used in C++ code. Java can inline virtual methods at run-time, whereas in C++ a vptr table lookup is incurred at the invocation of any virtual method.
Of course, the poor performance of virtual methods is usually pounded into the heads of C++ programmers during introductory classes (at least it was for me). If you are using virtual methods inside loops with large numbers of iterations, you are doing something wrong. In such cases, reimplementing with templates will solve the performance issues.
I would also really like to see all the same programs run on an AMD CPU (not to see if AMD beats Intel, but to see whether each compiler generates code that is generally efficient or tuned to just one CPU).
@Rayiner
You see, the primitive/class distinction in Java is largely unnecessary. It is entirely possible for a powerful compiler to determine what should be boxed and what should not. Powerful CL/Scheme/Dylan/ML/Smalltalk compilers do such analysis. So in these languages, there are no primitive types. Everything seems to be a full object on the heap. The compiler will take care of doing things like stack-allocating variables when no references to it escape the function, or unbox an object when it can be determined that it is safe to do so.
You can add C# into that mix too.
@LinuxBuddy
One of my biggest pet peeves about Java has always been the lack of unsigned types. It might not seem like a big deal to a lot of people, but for what I was doing, I ended up doing a lot of bit-masking to get things done, as you stated. C# has unsigned types. C# also has auto-boxing, as Rayiner mentioned.
I would like to see a comparison between MS C# and Java (I guess you could add Mono and PNet into the mix too) to see how well they optimize out bounds checking, and also what kind of a performance hit each takes from it.
I compiled the C and C++ benchmarks on an iBook G4 800MHz. I used the best possible optimization. I edited out the I/O test because it was giving me a “Bus error” after creating a 70 MB file. Weird.
C (gcc -fast -mcpu=7450 -o Benchmark Benchmark.c)
Integer: 8.8s
Double: 17.2s
Long: 56.2s
Trig: 12.0s
C++ (g++ -fast -mcpu=7450 -o BenchmarkCPP Benchmark.cpp)
Integer: 8.7s
Double: 16.9s
Long: N/A
Trig: 12.0s
(I was getting an “integer constant is too large for “long” type” warning, so I left it out)
I didn’t have the patience to wait for the Python program to complete.
Numerics are really the best case for an optimizer, because they are so low level. All the JIT compilers should have compiled the loop once to native code, and gotten out of the way. This is fine if all you are doing is inner-loop numeric code (some scientific stuff, graphics) but not really a good indicator of general performance.
Yes, this benchmark could really give people the wrong idea about Java. Obviously HotSpot is doing its job, and it performs comparably to native code.
Even for scientific code, this benchmark probably isn’t representative, because you often need proper mathematical semantics for your calculations, which C/C++/Java/C# don’t provide.
Fortran is a wonderful language for the scientific community, not only for its language semantics but also its optimization potential. While this potential is not fully realized on most platforms (the Alpha is the only ISA where the Fortran compiler has been optimized to the point that, for scientific computing, Fortran is the clear performance winner over C), Fortran has a distinct advantage in that many mathematical operations which work as library functions in languages like C (e.g. exponents, imaginary numbers) are part of the language syntax in Fortran. Thus complex mathematical expressions involving things like exponents can be highly optimized, as opposed to C, where a function invocation is required to perform exponentiation. Algorithms for things like modular exponents can be applied at compile time where they are found in the language syntax, whereas C requires programmers to implement these sorts of optimizations themselves. With more and more processors getting vector units, a language which allows such units to be used effectively really is in order.
Java really dropped the ball on mathematical code. C at least has a rationale for why things like exp() and log() are functions rather than part of the language syntax: C is designed to be a language with a relatively simple mapping between language syntax and processor features. Java/.NET could have made exponentiation a language feature rather than a library function–after all, they certainly aren’t bound by the limitations of processors. Instead we find these sorts of things in java.lang.Math and System.Math, because they are clinging to C/C++’s legacy rather than thinking about the rationale behind the C language syntax and how the syntax could be better designed when a simple mapping between processor features and language syntax isn’t required.
Lack of operator overloading is the biggest drawback to mathematical code in Java. Complex mathematical expressions, which are hard enough to read with conventional syntax, become completely indecipherable when method calls are used in their stead. I had the unfortunate experience of working on a Java application to process the output of our atmospheric model, which is an experience I would never like to repeat. Working with a number of former Fortran programmers, everyone grew quickly disgusted with the difficulty of analyzing matrix math as method calls, and they were quite amazed when I told them that with C++ operator overloading such code could be written with conventional mathematical syntax (although there are some issues differentiating dot products from cross products, it’s still much less ugly than method invocations).
A more telling test would be to get higher-level language features involved. Test virtual C++ function calls vs Java method calls (which are automatically virtual).
Java will almost certainly win on the speed of virtual methods because it can inline them at run-time. Again, the solution in C++ is not to use virtual methods within performance-critical portions of the code, especially within large loops; the simple solution is to replace such uses with templates where applicable.
His C++ is straight C. It even uses printf instead of cout… Where are the classes? Where is the standard C++ library usage?
Hmmm….
One of my biggest pet peeves about java has always been no unsigned. It might not seem like a big deal to a lot of people but for what I was doing, I ended up doing a lot bit-masking to get things done, as you stated. C# has unsigneds.
Agreed. One of the first things I ever wrote in Java (about 9 years ago) was an implementation of IDEA, and I quickly learned why lack of unsigned types was a bad thing. I ended up using signed 32-bit integers to emulate unsigned 16-bit integers, and of course this was done in conjunction with a great deal of masking. This revealed to me one of the many hacks which were thrown into the Java syntax, the unsigned shift operator >>>. Sun, wouldn’t it have been simpler to support unsigned types?
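For readers unfamiliar with the operator mentioned here, a quick illustration (my own example): >>> is the zero-filling shift Java added precisely because its integers are always signed.

```java
public class ShiftDemo {
    static int arithmeticShift(int x) { return x >> 1; }  // keeps the sign bit
    static int logicalShift(int x)    { return x >>> 1; } // fills with zero

    public static void main(String[] args) {
        // -8 is 0xFFFFFFF8 in 32-bit two's complement.
        System.out.println(arithmeticShift(-8)); // prints -4
        System.out.println(logicalShift(-8));    // prints 2147483644
    }
}
```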
I don’t know if auto-boxing is really the same thing. If it is, then why are there structs in C#? And why is there a distinction between allocating a struct on the heap vs the stack? It might be, but I’m not familiar enough with C# to make the comparison.
Of course, C#’s compiler might have such analysis. Microsoft has some smart compiler guys. They’re not very innovative, but they’ve got their fingers in some nifty pies. But I hear C# 2.0 will get lambdas with proper closures! At that point, it would be cool to do a benchmark to see how good their closure-elimination optimizations are compared to CL/ML compilers. Is type inference too much to ask for in C# 3.0?
I think there are a couple of issues surrounding the trig tests in the benchmark.
The first is obvious: all of the computational “heavy lifting” is being done by the run-time library. Performance differences in the code you actually wrote are likely to be unimportant.
The second is that results like this are almost meaningless unless they are accompanied by some measure of the accuracy of the result. Without going into the gory details, functions like sine and cosine are typically calculated from power series approximations. Taking fewer terms is faster, but less accurate. For example, I can write a very fast C routine to approximate the value of pi:
double pi() {
    return 3.0;
}
The result is correct to one significant figure, after all. 😉
(Numerical accuracy is not just a theoretical concern. Early versions of Lotus 1-2-3 implemented calculation of standard deviation wrong, and consequently got the wrong answer for the set of numbers {999999, 1000000, 1000001}.)
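That standard-deviation failure is easy to reproduce. In double precision the cancellation bites at larger magnitudes than the set quoted above, so this sketch (my own code) uses values near 10^9: the naive one-pass formula loses the answer entirely, while the two-pass formula is fine.

```java
public class StdDev {
    // Naive one-pass formula E[x^2] - (E[x])^2: the squared terms are ~1e18
    // while the variance is ~0.67, so the subtraction cancels catastrophically.
    static double naive(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        double n = xs.length;
        return Math.sqrt(sumSq / n - (sum / n) * (sum / n));
    }

    // Two-pass formula: subtract the mean first; numerically stable.
    static double stable(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double acc = 0;
        for (double x : xs) acc += (x - mean) * (x - mean);
        return Math.sqrt(acc / xs.length);
    }

    public static void main(String[] args) {
        double[] xs = {999999999, 1000000000, 1000000001};
        System.out.println(naive(xs));  // prints 0.0 -- the variance vanished
        System.out.println(stable(xs)); // ~0.8165, the correct population sd
    }
}
```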
Forth is a powerful and, especially, very fast compiled programming language, too often forgotten.
With more and more processors getting vector units, a language which allows such units to be used effectively really is in order.
——–
APL anyone?
In addition to the advantages of Fortran you mentioned, I was also thinking about a full numeric tower like some languages have. Standard machine integers and floats are nice for a lot of scientific computing (and accounting, as I’m told), but for some computations, you need things like infinite precision rationals, arbitrary precision integers, etc.
I bet ocaml would kick all their asses.
Comparing only arithmetic/math operations is far from representative of the performance of any language. What about the other operations found in languages, such as tests or assignments, real memory allocations, object manipulation, GUI, file system and network access, etc.?
Plus, we all know that GCC by default generates terrible code on Intel. It generates very clean code (making good use of the x86 instruction set) only in optimized mode, which was not used for the benchmark.
This benchmark could lead one to think that Java is just 2 times slower than C/C++–something anyone who has used a large application written in Java will know can’t be right.
This benchmark has a very limited scope, and its results are not representative of the real world.
The following code:
FileOpen(1, fileName, Microsoft.VisualBasic.OpenMode.Output)
Do While (i < ioMax)
    PrintLine(1, myString)
    i += 1
Loop
FileClose(1)
could be much improved by using the native .NET methods; in fact, the code could then be identical to that of C#.
In addition, the C# I/O code uses a try..catch construct that could slow the code down. It would be good to retest the code with these suggestions.
You can grab the binaries I made here:
http://fails.org/benchmark/
These are optimized for Athlon XP/MP and will require SSE
b-gcc is compiled with gcc 3.3 with -O3 -march=athlon -msse
b-icc is compiled with icc 8.0 with -tpp6 -xiMK -O3
b-icc-opt has been optimized with Profile Guided Optimization. First, Benchmark.c was compiled with -prof_gen to create an instrumented executable. Next, the instrumented executable was run, and a run-time profile was generated (in the form of a .dyn file). Finally, b-icc-opt itself was compiled with -prof_use -tpp6 -xiMK -O3.
Respective scores when executed on a dual Athlon MP 2.0GHz:
gcc 3.3:
Int arithmetic elapsed time: 6550 ms
Double arithmetic elapsed time: 6250 ms
Long arithmetic elapsed time: 16760 ms
Trig elapsed time: 3640 ms
I/O elapsed time: 1090 ms
Total elapsed time: 34290 ms
icc 8.0:
Int arithmetic elapsed time: 6740 ms
Double arithmetic elapsed time: 5560 ms
Long arithmetic elapsed time: 27140 ms
Trig elapsed time: 2510 ms
I/O elapsed time: 1230 ms
Total elapsed time: 43180 ms
icc 8.0 (with profile guided optimization):
Int arithmetic elapsed time: 6340 ms
Double arithmetic elapsed time: 5540 ms
Long arithmetic elapsed time: 27460 ms
Trig elapsed time: 2430 ms
I/O elapsed time: 1190 ms
Total elapsed time: 42960 ms
Ouch! Clearly icc has trouble with 64-bit math. But otherwise icc outperforms gcc 3.3 in the other respects being tested, especially when profile-guided optimization is used.
If I recall correctly, the Camel book says that Perl is also byte-compiled internally before execution, like Python (and even Tcl nowadays). Benchmarks usually show Perl being a bit faster than Python, though I don’t know if Perl has an equivalent of Psyco (the Python native compiler used in the benchmark).
The C benchmark compiled -O2 on AthlonXP 1.4ghz. Fedora core1.
gcc version 3.3.2
gcc -O2 Benchmark.c
Int arithmetic elapsed time: 8330 ms
Double arithmetic elapsed time: 7850 ms
Long arithmetic elapsed time: 20810 ms
I/O elapsed time: 21750 ms
(I could not get the trig benchmark working, so left it out)
It’s interesting how much faster the int, double and long benchmarks are than his results… this CPU can really crunch those numbers compared to the Pentium 4M 2GHz, though I/O is slower.
Compiled with -O3 it gets slightly slower!
Int arithmetic elapsed time: 8320 ms
Double arithmetic elapsed time: 7860 ms
Long arithmetic elapsed time: 20840 ms
I/O elapsed time: 21850 ms
Could these better results be due to running gcc on native Linux, or is it the different processor?
Structs and primitives are value types and allocated on the stack, but they are also objects. The compiler automatically creates an object if needed instead of you having to do it. The main benefit, obviously, is that you only pay for what you use. I thought you were referring to the way in Java that you have to use wrapper classes for the primitives. Java, obviously, doesn’t have structs. Classes are always allocated on the heap in C#.
Tested using J2SE v 1.4.2_03 on a dual Athlon MP 2.0GHz running Linux 2.6.0.
The code was compiled with javac -g:none and executed with java -server:
Int arithmetic elapsed time: 7271 ms
Double arithmetic elapsed time: 11501 ms
Long arithmetic elapsed time: 23017 ms
Trig elapsed time: 77649 ms
IO elapsed time: 3418 ms
Total Java benchmark time: 122856 ms
Well, Java trumps icc on 64-bit math, but thoroughly loses everywhere else, especially the floating point and trig benchmarks.
The default math library is compiled with -O0 to preserve strict IEEE semantics. In fact, with a minor change to the source code, -O2 would work as well. Java has two math libraries, Math and StrictMath; they default to the same implementation, but the JVM is allowed to use a faster/less accurate version of Math. VC++ uses loose math (the x86 trig instructions directly).
I assume you’re referring to floating-point precision. Java by default follows the IEEE 754 international specification; however, Java also allows for EXTENDED PRECISION on platforms that support it.
By “faster/less accurate version of Math”, I assume you are referring to the standard library implementation that is used.
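The Math/StrictMath split described above can be observed directly (my own snippet): StrictMath is the reproducible fdlibm reference, while Math is permitted to use faster platform code as long as it stays within the accuracy bounds its documentation specifies (typically within 1 ulp for functions like sin).

```java
public class MathVsStrict {
    public static void main(String[] args) {
        double x = 1.0e10;
        // StrictMath guarantees bit-identical fdlibm results on every
        // platform; Math may substitute faster intrinsics within spec.
        System.out.println("Math.sin       = " + Math.sin(x));
        System.out.println("StrictMath.sin = " + StrictMath.sin(x));
    }
}
```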
Ah, I see. In Lisp etc., there is no distinction between stack-allocated primitive types and heap-allocated classes. The compiler will automatically determine where to allocate the object to maximize performance. Also, the compiler doesn’t box/unbox primitives at runtime, but decides at compile time which objects should be boxed and which should be unboxed.
gcc -lm -O2 Benchmark.c
Int arithmetic elapsed time: 6240 ms
Double arithmetic elapsed time: 5920 ms
Long arithmetic elapsed time: 16370 ms
Trig elapsed time: 3370 ms
I/O elapsed time: 890 ms
Total elapsed time: 32790 ms
gcc -lm -O0 Benchmark.c
Int arithmetic elapsed time: 8780 ms
Double arithmetic elapsed time: 9470 ms
Long arithmetic elapsed time: 18920 ms
Trig elapsed time: 3650 ms
Total elapsed time: 41930 ms
Looks like Cygwin is a lot slower.
I only wish. I’ve been on three large Visual C++ 6 projects (100 to 300 classes). The VC++ compiler generated broken release code, so we shipped the “Debug build”.
I could not convince the product manager to buy better tools or allocate any time to find the problem. Many memory leaks came from Microsoft’s MFC classes.
My point is that Java’s on-the-fly code profiling is the better solution, mostly because 99% of the programmers out there will never get the chance to profile their code, if they even know how to do it. Managers won’t spend the time or money.
gcj-3.3 -O2 --main=Benchmark Benchmark.java
Int arithmetic elapsed time: 6220 ms
Double arithmetic elapsed time: 5914 ms
Long arithmetic elapsed time: 16485 ms
Trig elapsed time: 26012 ms
IO elapsed time: 10229 ms
Int, Double, and Long run at the same speed as GCC; I/O and trig are a lot slower than GCC.
I’ve compiled my results into an easier-to-interpret format, and drawn some conclusions different from what I posted here:
http://fails.org/benchmarks.html
In reply to MikeDreamingofabetterDay…
My point is that Java’s on-the-fly code profiling is the better solution.
The primary drawback of Java’s run-time profiling is that all optimizations are discarded when the application exits. Profiling really helps code that spends most of its time executing in a small number of places within the executable. Consequently, large applications that do a lot of startup processing take an additional performance hit from run-time optimization: the startup code will only be touched once, but the runtime’s optimizer still attempts to determine how best to optimize it. Eclipse and NetBeans certainly come to mind; their start-up times are an order of magnitude worse than any other IDEs I’ve used.
Profile guided optimization, on the other hand, is a one-time process, and the optimizations are permanent to the binary, thus no performance loss is incurred.
Mostly because 99% of the programmers out there will never get the chance to profile their code, if they even know how to do it.
Profiling should be (and often is) an additional automated function of the unit testing process. Intel’s icc can take a number of profiles from a number of different test runs and combine the collective results (a separate .dyn file is generated for each run of the instrumented executable) to determine the best way to optimize the given module when a release build is performed.
I’ve never used Microsoft Visual C++ on a large project, but your woes there are not really pertinent to the use of profile guided optimization.
Object-oriented performance really matters in OO languages: creating and destroying objects, casting, and things like that.
Your tests are indeed interesting, but I think the main point is that Java, generally speaking, doesn’t lag behind significantly! We’re not talking orders of magnitude here; it’s the same ballpark!
@Rayiner
@RoyBatty
On boxing/unboxing in Java, yes, you are right that this can certainly be done. I believe the JDK 1.5 HotSpot is going to be doing this at some level. As I said, it isn’t the case that Java can’t go faster with better optimizations, just that such optimizations have to be done, thus adding to the complexity of the runtime.
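The boxing cost being discussed can be seen directly in a small sketch (illustrative only, using modern Java syntax): summing with a primitive long versus wrapping every intermediate value in a java.lang.Long object, which allocates per iteration unless the JIT can eliminate it.

```java
// Sketch of boxing overhead: the primitive loop allocates nothing,
// while the boxed loop creates a Long object per iteration (unless
// the JIT optimizes it away). Both compute the same sum.
public class BoxingDemo {
    public static void main(String[] args) {
        final int N = 1_000_000;

        long t0 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < N; i++) sum += i;          // primitive: no allocation
        long t1 = System.nanoTime();

        Long boxedSum = Long.valueOf(0);
        for (int i = 0; i < N; i++)
            boxedSum = Long.valueOf(boxedSum.longValue() + i);  // box each result
        long t2 = System.nanoTime();

        System.out.println("primitive: " + (t1 - t0) / 1_000_000 + " ms, sum=" + sum);
        System.out.println("boxed:     " + (t2 - t1) / 1_000_000 + " ms, sum=" + boxedSum);
    }
}
```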
These are language-level, structural issues that C just doesn’t have to deal with. C’s simple, “assembler with loops” sort of orientation is both a blessing and a curse. It’s a blessing when it comes to optimization as you don’t have these sorts of constraints to deal with, and frankly, the language leaves lots of implementation-dependent behavior to exploit. Java is more constrained, which eliminates broad classes of bugs that are very difficult to debug, but in return, the language exacts an overhead which the JVM compilers all seek to reduce to near-zero. Put another way, it’s a lot easier to write a passable C compiler than a passable Java VM (though very difficult to write sophisticated versions of either).
Again, I love Java. It’s my main programming language. I love its relatively small and simple design and its resemblance to C (probably my next favorite language). With CPU speeds increasing and JVMs just getting better and better, I find myself programming almost exclusively in Java now.
mcs Benchmark.cs
Int arithmetic elapsed time: 9955 ms
Double arithmetic elapsed time: 21385 ms
Long arithmetic elapsed time: 55066 ms
Trig elapsed time: 3707 ms
IO elapsed time: 20949 ms
Total C# benchmark time: 115636 ms
“Agreed. One of the first things I ever wrote in Java (about 9 years ago) was an implementation of IDEA, and I quickly learned why lack of unsigned types was a bad thing. I ended up using signed 32-bit integers to emulate unsigned 16-bit integers….”
Characters are unsigned 16-bit quantities; they are the only unsigned type in Java. Why didn’t you use them?
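For the record, here is a minimal sketch (not from the article) of both approaches: char arithmetic wraps modulo 2^16 on its own, while the signed-int emulation the earlier poster described needs an explicit mask.

```java
// Sketch: Java's char is the language's only unsigned 16-bit type,
// so it can stand in for a uint16 in cipher code like IDEA.
public class Uint16Demo {
    public static void main(String[] args) {
        char a = 0xFFFF;                  // 65535, the max unsigned 16-bit value
        char sum = (char) (a + 1);        // wraps around modulo 2^16
        System.out.println((int) sum);    // prints 0

        // The signed-int emulation described above:
        int b = 0xFFFF;
        int sum2 = (b + 1) & 0xFFFF;      // mask back down to 16 bits
        System.out.println(sum2);         // prints 0
    }
}
```

The catch is that char arithmetic still promotes to int before every operation, so the cast (or mask) is needed after each step either way.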
Java really dropped the ball on mathematical code. C at least has a rationale for why things like exp() and log() are functions rather than part of the language syntax: C is designed to have a relatively simple mapping between language syntax and processor features. Java/.NET could have made exponentiation a language feature rather than a library function; after all, they certainly aren’t bound by the limitations of processors. Instead we find these sorts of things in java.lang.Math and System.Math because they cling to C/C++’s legacy rather than thinking about the rationale behind the C language syntax and how the syntax could be better designed when a simple mapping between processor features and language syntax isn’t required.
You’re right from a syntax perspective, but there is no reason that speed has to suffer. A given JVM may optimize various library calls into inlined, optimal instruction sequences. This is done in some JVMs for basic java.lang classes (like String handling, etc.). Your point about not having inline operators that make your code readable is very true, however.
Visual C++ is fast on Windows; is that surprising?
I just hope that people will not conclude that gcc is slow in general. gcc is a lot faster on Linux: for example, the C benchmark (AMD 1800+, Linux, gcc 3.3.1 Mandrake) took a total of 54ms (41ms with -O2 -march=athlon-xp).
Ah, I see. In Lisp and similar languages, there is no distinction between stack-allocated primitive types and heap-allocated classes. The compiler automatically determines where to allocate each object to maximize performance. It also doesn’t box/unbox primitives at runtime, but decides at compile time which objects should be boxed and which unboxed.
Right. In most every Lisp implementation, every value travels along with its type. Typically, a few of the low-order bits are used to encode the type. There is no real distinction between “primitive type” versus other types when it comes to function calls, etc.
Hotspot is heavily optimized for Solaris-sparc, being Sun’s flagship platform and all. GCC is targeted towards x86 mostly (although I will stop short of saying it is heavily optimized, because honestly, it isn’t).
Compare the same benchmarks on a Solaris-sparc system, especially a large-scale system, and you might find some very interesting results.
Right. In most every Lisp implementation, every value travels along with its type. Typically, a few of the low-order bits are used to encode the type.
————–
This isn’t necessarily correct. In the general case, every object has a header describing its type, just like Java/C# classes. However, there are a number of optimizations to this general case.
– Some implementations store certain special types (integers, cons cells, etc) right in the pointer word, with some bits reserved as a type tag.
– Some implementations don’t bother with tag bits, and instead use an analysis that determines when an object doesn’t need to be a full object. For example, when you use an integer as a loop counter, you can just use a regular (untagged) machine word.
– Some implementations support type specialization, and generate type-specialized versions of functions, like C++ templates do.
Thus, even though the programmer always deals with objects, the generated machine code will often deal directly with machine types. So it’s not strictly true that every value travels with its type. For the numeric benchmarks in these articles, for example, the machine code would deal with regular floats.
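The tag-bit scheme described in the first bullet can be sketched in a few lines (an illustration, not from any real Lisp implementation): a "fixnum" lives directly in a machine word, with the low bit reserved as the type tag.

```java
// Illustration of pointer-word tagging: tag bit 0 = immediate integer
// ("fixnum") stored shifted left by one; tag bit 1 would mark a heap
// pointer. Arithmetic right shift recovers the signed value.
public class TagBits {
    static long makeFixnum(long n)   { return n << 1; }        // low bit = 0
    static boolean isFixnum(long w)  { return (w & 1) == 0; }
    static long fixnumValue(long w)  { return w >> 1; }        // keeps the sign

    public static void main(String[] args) {
        long w = makeFixnum(-42);
        System.out.println(isFixnum(w) + " " + fixnumValue(w));  // prints: true -42
    }
}
```

The cost is one bit of range; the payoff is that small integers never touch the heap at all, which is the optimization the bullet points above are describing.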
I have put online the famous UnixBench from the magazine Byte.
You have to tweak the makefile in order to get the best optimisation. For myself, I put this:
OPTON = -O3 -fomit-frame-pointer -fforce-addr -fforce-mem -ffast-math
-march=i686 -mcpu=i686 -pipe -malign-loops=2 -malign-jumps=2 -malign-functions=2
it can be found at http://www.loria.fr/~seddah/bench.tar.gz
I know this is a bit off-topic, but I would like to see results from various systems. I will put mine in this forum for a Celeron 450 running Mandrake 8.0 and for an old PowerBook running LinuxPPC 2K…
If someone could make this bench run under Mac OS X, it would be great.
Cheers
Djamé
I’m curious to know whether anyone has checked out the Java 1.5 Alpha?
http://java.sun.com/developer/earlyAccess/j2sdk150_alpha/
Just one point: Java’s long startup time is caused by Java doing something most languages don’t: class verification. In essence, a scan of the class files to be sure they haven’t been hacked.
Again, if you “get” Java, you put up with the “time” issue for the benefits you get from the language: the Java security model and the productivity of its huge class library.
The trig test is pointless, since the Windows libraries aren’t compiled with gcc. The same goes for the I/O test.
I am astonished that c# performs so well though.
Your code does not test for successful completion and accurate results!
All that is needed to win your benchmark is a library like this:
double tan (double x) {return 1;}
double sin (double x) {return 1;}
double cos (double x) {return 1;}
and so forth. What is the value of fast but wrong answers?
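One hedged way to guard against exactly this attack is to fold every result into a checksum and sanity-check the total. A minimal sketch (illustrative, not the article's actual benchmark):

```java
// Sketch: accumulate trig results into a checksum so that a stubbed
// library returning 1 everywhere produces an obviously wrong total.
// With 1000 iterations, a cheating sin/cos would yield exactly 2000.0;
// the honest value is roughly 130.
public class TrigCheck {
    public static void main(String[] args) {
        double sum = 0.0;
        for (int i = 0; i < 1000; i++) {
            double x = i * 0.01;                 // sample points in [0, 10)
            sum += Math.sin(x) + Math.cos(x);
        }
        System.out.println("checksum = " + sum);
    }
}
```

Printing the checksum also keeps the compiler from dead-code-eliminating the whole loop, which is a second way naive benchmarks get fooled.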
Testing gcc under Cygwin and against the Windows libraries isn’t actually fair, is it? You test Visual C++, which is quite good natively, so why not test gcc under a POSIX environment, for example Linux, too? Perhaps testing Visual C++ under Linux with Wine should be done as well?
Of course, if you write crappy code in VB.NET and good code in C# that does the same thing, yes, you will get different results. The point is that if you write similar code that takes advantage of each particular .NET language, you are going to get almost identical results, which is what this benchmark reported.
Understand….
He left out three of the best languages: DELPHI, EUPHORIA, and Assembly. Believe me, they are really fast as hell, especially DELPHI.
Can anyone give some benchmarks with these three languages?
How not to write a benchmark…. Your code does not test for successful completion and accurate results!
(…much like the real Paris Hilton, a basic conceptual understanding is present but a knowledge of details is lacking…)
As long as you’re using standard runtimes or linking against a standard libm, there really isn’t going to be a problem.
Attempting to check the results may be especially problematic in certain areas due to floating-point round-off error, unless you’re doing all your testing on platforms with IEEE floats.
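A round-off-tolerant check sidesteps that problem: instead of demanding bit-exact results across platforms, compare against a reference value within a few ulps. A hedged sketch in Java, where StrictMath gives a reproducible reference:

```java
// Sketch: verify a possibly platform-tuned result (Math.sin) against a
// reproducible reference (StrictMath.sin) within a small ulp tolerance,
// rather than requiring bit-exact equality.
public class UlpCheck {
    static boolean closeEnough(double actual, double expected, int maxUlps) {
        return Math.abs(actual - expected) <= maxUlps * Math.ulp(expected);
    }

    public static void main(String[] args) {
        double expected = StrictMath.sin(1.0);   // bit-reproducible everywhere
        double actual = Math.sin(1.0);           // may use faster platform code
        System.out.println(closeEnough(actual, expected, 2));  // prints: true
    }
}
```

Since the Java spec requires Math.sin to be within 1 ulp of the exact result, a 2-ulp tolerance against StrictMath is safe on any conforming JVM.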