Extended C++ (Intel, MS), Java (1.4, 1.5) and C# Benchmarks

Submitted by Thomas Bruckschlegel 2004-01-15 Benchmarks 36 Comments

Thomas Bruckschlegel extended the language benchmarks from our previous article with some other benchmarks. Check it out here. Bascule did so too.

About The Author

Eugenia Loli

Ex-programmer, ex-editor in chief at OSNews.com, now a visual artist/filmmaker.

Follow me on Twitter @EugeniaLoli

36 Comments

2004-01-15 2:07 am
Anonymous
I understand it’s 1.5 alpha, but still…
They keep screwing Java performance up.
1.5 is slower than 1.4.2_03 on every benchmark!
And 1.4.2_03 is in turn three times slower than 1.3.1_09!
Well, with this attitude, C# and Microsoft have a very good
chance to capture the majority of the market.
2004-01-15 2:11 am
Anonymous
I/O
gcc 3.3 1090 ms
Java 1.4 3418 ms
“Here Java performs three times worse on what should be an I/O bound operation”
Java runs at 1/3 the speed of GCC 3.3 C++; Java is sloooow.
2004-01-15 2:12 am
Anonymous
One of the things Java 1.5 will improve is String concatenation. The code tested by Thomas Bruckschlegel (first link) in Java 1.5 uses the StringBuffer class. Java 1.5 includes a new class named StringBuilder, which is very similar to StringBuffer but is not synchronized. It will replace StringBuffer in the code generated when concatenating strings with the “+” operator, and that code is expected to be much faster. I can’t give this hint to the author because the site is in a language I don’t know (German?). Maybe someone can pass this hint on to the author so he can test the new StringBuilder class. It would also be good to test new Java 1.5 features like generic collections and autoboxing to see how they compare to Java 1.4.
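To make the StringBuffer/StringBuilder difference concrete, here is a minimal sketch of the same append loop written against both classes. The class and method names are illustrative only and are not taken from the benchmark source. Both classes share the same append API; the only difference is that every StringBuffer call is synchronized. Timings from a single run like this are rough, since the JIT may still be compiling during the first loop.

```java
// Illustrative sketch: identical concatenation work against StringBuffer
// (synchronized, Java 1.0+) and StringBuilder (unsynchronized, Java 1.5+).
class ConcatSketch {

    // Pre-1.5 style: each append() acquires the buffer's lock.
    static String withBuffer(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append("hello").append(i);
        }
        return sb.toString();
    }

    // 1.5 style: same API, no locking.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("hello").append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 1000000;
        long t0 = System.currentTimeMillis();
        String a = withBuffer(n);
        long t1 = System.currentTimeMillis();
        String b = withBuilder(n);
        long t2 = System.currentTimeMillis();
        System.out.println("StringBuffer:  " + (t1 - t0) + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) + " ms");
        System.out.println("same result:   " + a.equals(b));
    }
}
```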
2004-01-15 2:14 am
Anonymous
They didn’t even benchmark C?!? It would have smoked them all.
2004-01-15 2:15 am
Anonymous
I think the original benchmark is bogus. “-mno-cygwin” has not worked for a long time. On Windows, one has to use MinGW to really test gcc-generated code.
2004-01-15 2:25 am
Anonymous
Maybe, but who cares about C these days?
2004-01-15 2:49 am
Anonymous
Vector is synchronized; ArrayList is not. HashMap was used over Hashtable for this reason…
Also, someone should do a native Java trig benchmark, since the other languages are doing native floating point (take the source of StrictMath and remove all instances of the “strictfp” keyword). Math delegates to StrictMath in 1.4 for cross-platform consistency.
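The Vector/ArrayList point can be illustrated with a small sketch that runs identical work through the java.util.List interface, so only the constructor call differs. The class name and workload are hypothetical, and the code uses Java 1.5 generics and autoboxing, so it will not compile on 1.4 as written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Illustrative sketch: the same fill-and-sum loop against Vector (every
// method synchronized) and ArrayList (no locking). Requires Java 1.5+.
class ListSketch {

    // Appends 0..n-1, then sums the elements by index.
    static long fillAndSum(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);                  // autoboxing int -> Integer
        }
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);           // auto-unboxing Integer -> int
        }
        return sum;
    }

    static long timeMillis(List<Integer> list, int n) {
        long start = System.currentTimeMillis();
        fillAndSum(list, n);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int n = 5000000;
        System.out.println("Vector:    " + timeMillis(new Vector<Integer>(), n) + " ms");
        System.out.println("ArrayList: " + timeMillis(new ArrayList<Integer>(), n) + " ms");
    }
}
```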
2004-01-15 2:56 am
Anonymous
What should be tested in Java is:
System.arraycopy() – used everywhere
Synchronous I/O – Sockets and File
Asynchronous I/O – Sockets and File
Hash Map
Tree Map
List
Vector
Reflection
RMI
Memory performance
The above are critical, as they make up 99% of Java-based software.
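As an example of the first item on that list, here is a minimal sketch comparing System.arraycopy with a hand-written copy loop. The class name is hypothetical; System.arraycopy(src, srcPos, dest, destPos, length) is the standard java.lang.System method, and java.util.Arrays.equals is used only to check the copies match.

```java
import java.util.Arrays;

// Illustrative sketch: copying an int[] with System.arraycopy versus a loop.
class ArrayCopySketch {

    static int[] copyWithArraycopy(int[] src) {
        int[] dst = new int[src.length];
        // arguments: source, source offset, destination, destination offset, length
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] data = new int[10000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        long t0 = System.currentTimeMillis();
        int[] a = copyWithArraycopy(data);
        long t1 = System.currentTimeMillis();
        int[] b = copyWithLoop(data);
        long t2 = System.currentTimeMillis();
        System.out.println("arraycopy: " + (t1 - t0) + " ms");
        System.out.println("loop:      " + (t2 - t1) + " ms");
        System.out.println("equal:     " + Arrays.equals(a, b));
    }
}
```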
2004-01-15 2:57 am
Anonymous
Operating system developers
Library developers
Compiler developers
Embedded system developers
Toolkit developers
Game developers
Application developers
Hardware/Firmware developers
Graphics engine developers
And me, to mention a few.
2004-01-15 3:31 am
Anonymous
I think that 64-bit integer test isn’t really fair. My guess is that ICC is using SSE2 to do the calculations, and GCC is using 3DNow!, which is much faster on the Athlons than SSE2 (for obvious reasons). Just out of interest, I would love to see the GCC results if all these tests were compiled for 64-bit instructions, to see what difference it would make in some of the other tests, since that would turn on the extra registers and such. It would be interesting to see how that changes GCC’s results, although in a way I guess that’s more of a CPU test.
2004-01-15 3:55 am
Anonymous
OK, maybe I am stupid for asking this, but every time I try to compile the C++ code in MS VS .NET, I get errors about ‘long’ followed by ‘long’ being illegal. What am I doing wrong?
2004-01-15 3:59 am
Anonymous
Can someone out there do some real world benchmarking???
In my world, C#, VB, and C++ are used with ASP to deliver business solutions. We use a database (Oracle) on the back end. Trig doesn’t enter into the picture, and neither does most math. What really counts is String manipulation and component communication.
For example, I recently had to rewrite a VB.Net COM+ component. Nothing fancy here. The test harness (written in Java) made the web pages run out of memory because of the .NET runtime. I ported the application to server-side JavaScript. The end result: the web pages could keep up with the test harness, the system used 1/10 of the memory, and it ran 3 times faster.
How on earth would I be able to tell that JavaScript would outperform VB.Net based on your benchmark????
2004-01-15 4:23 am
Anonymous
Actually, C wouldn’t make any difference in these tests. The C++ code in these benchmarks was essentially C code. The only C++-ism used anywhere was the vector<> template used in the vector test.
Besides, the abstraction penalty for C++ vs. C is almost nil, because C++ is explicitly designed to have zero overhead over C code doing the same thing. For example, virtual calls in C++ generate the same code as using a table of function pointers in C, and you’d only incur the virtual call overhead in cases where you’d have to use a table of function pointers in C anyway.
C++ can actually be quite a bit faster for generic containers and algorithms. For example, std::sort() is a lot faster than qsort(), because the former uses an inlineable function object, while the latter uses a non-inlineable function pointer. Also, for things like generic trees, the template mechanism will generate optimized code tuned to each type, while the C version will have to either maintain different tree code for different types, or use slow pointer arithmetic.
Google for the “Stepanov benchmark” to show how little abstraction costs in C++.
2004-01-15 4:41 am
Anonymous
I am not 100% sure, but I think SSE2 is 64-bit floating point only. MMX is what has 64-bit integers. 3DNow! is also floating point.
2004-01-15 5:11 am
Anonymous
Can you turn off strict in code?
2004-01-15 5:34 am
Anonymous
Replace “long long” with __int64 if you are using MSVC instead of gcc.
2004-01-15 5:41 am
Anonymous
SSE2 is 128-bit floating point. Its either 4 single-precision (32-bit) floats or 2 double-precision (64-bit) floats. 3DNow! is 2 single-precision floats. MMX is 8, 4, or 2 8-bit, 16-bit, or 32-bit integers.
2004-01-15 5:45 am
Anonymous
http://java.sun.com/j2se/1.4.2/docs/tooldocs/solaris/java.html
I’d try the -Xbatch and -server switches for the Java benchmarks,
to be sure the JVM has actually finished compiling the code before running it.
Additionally, you could cheat a little by turning off class garbage collection
with -Xnoclassgc, which might help a bit.
I second the switch to ArrayList; no one uses Vector any more in single-threaded apps.
2004-01-15 5:51 am
Anonymous
http://gcc.gnu.org/ml/java/2002-02/msg00151.html
That’s the problem with benchmarks, you’re comparing apples to oranges.
However, java.lang.Math vs. java.lang.StrictMath:
The java.lang.Math class still doesn’t seem to be optimized,
and falls back on the StrictMath implementation, according to the doc.
2004-01-15 6:36 am
Anonymous
I have made some modifications to the sources to compare Java 1.4 and Java 1.5, and also to compare the default implementation with optimized implementations. The results are the following:
Instead of Vector, which is synchronized, I tried LinkedList and ArrayList. One of the big things about Java is the number of choices. In this case, the code basically does a lot of inserts and removes at the end and at the beginning of the lists, and LinkedList is super fast at these operations. The LinkedList implementation finished in less than 15% of the time taken by Vector and ArrayList; that’s around 7 times faster!!!! ArrayList performs better than Vector, but with very little difference. Maybe with more operations ArrayList would show a bigger difference. I also tested ArrayList<Integer> on Java 1.5; in this case the method performed worse than Vector, but again with very little difference.
Using LinkedList: 4706ms
Using ArrayList: 34850ms
Using Vector: 35611ms
Using ArrayList<Integer>: 36583ms
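The exact list operations in the modified benchmark are not shown, so the following is only an assumed approximation of the access pattern described above (inserts and removes at both ends), written to show why LinkedList wins: inserting or removing at index 0 of an ArrayList shifts every element, while LinkedList just relinks the head node. The class name is hypothetical and the code uses Java 1.5 generics.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch of front/back list churn. Not the commenter's code.
class EndsSketch {

    // Each round inserts at the front, appends at the back, then removes
    // from the front, so the list grows by one element per round.
    static int churn(List<Integer> list, int rounds) {
        for (int i = 0; i < rounds; i++) {
            list.add(0, i);   // front insert: O(n) for ArrayList, O(1) for LinkedList
            list.add(i);      // back insert: amortized O(1) for both
            list.remove(0);   // remove(int index): O(n) for ArrayList, O(1) for LinkedList
        }
        return list.size();
    }

    static long timeMillis(List<Integer> list, int rounds) {
        long start = System.currentTimeMillis();
        churn(list, rounds);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        int rounds = 100000;
        System.out.println("LinkedList: " + timeMillis(new LinkedList<Integer>(), rounds) + " ms");
        System.out.println("ArrayList:  " + timeMillis(new ArrayList<Integer>(), rounds) + " ms");
    }
}
```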
Trigonometric methods can’t be optimized in Java. Contrary to what some folks said, it is not better to use StrictMath instead of Math. According to the API, StrictMath is required to produce the same results on every platform (it is defined in terms of the fdlibm algorithms). Math is the class where the VM implementation can use the native resources of the platform, so it is correct to use Math in the Benchmark class. The problem is that Sun’s implementation of Math does not use any native code: all methods of Math call the corresponding methods of StrictMath, which is why trigonometric functions in Java perform worse than in other languages when using the Sun Java VM. Other Java VMs might do better.
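A minimal sketch of the Math vs. StrictMath comparison, under the assumption that a sin/cos loop is representative of the trig benchmark (the real test may use other functions). StrictMath must return the fdlibm results bit for bit, while Math is only required to stay within 1 ulp, which is what would let a VM substitute faster code. On a VM where Math simply delegates to StrictMath, as described above, the two timings should be about the same. The class name is hypothetical.

```java
// Illustrative sketch: the same trig loop through java.lang.Math and
// java.lang.StrictMath.
class TrigSketch {

    static double sumMath(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double x = i * 0.001;
            sum += Math.sin(x) + Math.cos(x);       // VM may use a faster implementation
        }
        return sum;
    }

    static double sumStrict(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double x = i * 0.001;
            sum += StrictMath.sin(x) + StrictMath.cos(x); // fdlibm results everywhere
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10000000;
        long t0 = System.currentTimeMillis();
        double a = sumMath(n);
        long t1 = System.currentTimeMillis();
        double b = sumStrict(n);
        long t2 = System.currentTimeMillis();
        System.out.println("Math:       " + (t1 - t0) + " ms (" + a + ")");
        System.out.println("StrictMath: " + (t2 - t1) + " ms (" + b + ")");
    }
}
```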
The only good result is that in my tests Java 1.5 performed a bit faster than Java 1.4
Java 1.4: 48871ms
Java 1.5: 41590ms
I compared the StringBuffer concatenation used in Java 1.4 and earlier with the upcoming StringBuilder, which Java 1.5 will use to concatenate Strings in unsynchronized cases. The Builder performed a little better than the Buffer, similar to Vector versus ArrayList. The good news here is that compilers will use StringBuilder instead of StringBuffer when generating code for expressions like “hello”+” “+”world”, so any code that uses String concatenation will get faster when recompiled with Java 1.5.
On the other areas tested by this benchmark there is not much difference between Java 1.4 and Java 1.5. Sometimes Java 1.5 wins, sometimes Java 1.4. The tested Java 1.5 is “alpha” code, so there is a big chance the final version will be much faster.
The other thing updated in my code is that the expression:
startTime = (new Date()).getTime();
was changed to:
startTime = System.currentTimeMillis();
which is much simpler and faster than the original (in fact, the default constructor of Date has to call System.currentTimeMillis() to set the current time).
I will also try comparing with GCJ to see the differences.
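The two timestamp styles can be put side by side in a short sketch. Both return milliseconds since the epoch; the second simply skips allocating a Date. The elapsedMillis helper is a hypothetical convenience, not part of the benchmark, and it uses an anonymous Runnable so it also fits pre-1.5 code.

```java
import java.util.Date;

// Illustrative sketch: two ways to take a millisecond timestamp.
class TimerSketch {

    // Original benchmark style: allocates a Date just to read the clock.
    static long viaDate() {
        return (new Date()).getTime();
    }

    // Replacement style: reads the clock directly.
    static long viaSystem() {
        return System.currentTimeMillis();
    }

    // Times a piece of work with the cheaper call.
    static long elapsedMillis(Runnable work) {
        long startTime = System.currentTimeMillis();
        work.run();
        return System.currentTimeMillis() - startTime;
    }

    public static void main(String[] args) {
        long ms = elapsedMillis(new Runnable() {
            public void run() {
                double x = 0;
                for (int i = 0; i < 10000000; i++) {
                    x += i;
                }
                System.out.println("result: " + x);
            }
        });
        System.out.println("elapsed: " + ms + " ms");
    }
}
```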
I have no site to post my modifications to the java code and my results, so if someone wants it, I can send it by mail.
2004-01-15 7:43 am
Anonymous
It would be interesting to see how fast C++, Java and C# can create objects, for example by testing how long it takes to create 1000 Node objects and insert them in a list, or something similar…
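A sketch of what the Java side of such an object-creation test might look like. Node here is a hypothetical small class with one field, standing in for "some small object"; the repeat count is an assumption, since creating only 1000 objects is usually too fast to measure with a millisecond clock.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of an allocation benchmark: create Node objects and
// insert them into a list. Node is hypothetical. Requires Java 1.5+.
class AllocSketch {

    static final class Node {
        final int value;
        Node(int value) { this.value = value; }
    }

    // Allocates n Nodes and stores them in a list.
    static List<Node> build(int n) {
        List<Node> nodes = new ArrayList<Node>(n);
        for (int i = 0; i < n; i++) {
            nodes.add(new Node(i));
        }
        return nodes;
    }

    public static void main(String[] args) {
        int n = 1000;
        int repeats = 10000;   // repeat so the total is measurable in ms
        long total = 0;
        long start = System.currentTimeMillis();
        for (int r = 0; r < repeats; r++) {
            total += build(n).size();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(total + " nodes created in " + elapsed + " ms");
    }
}
```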
2004-01-15 8:07 am
Anonymous
@ bagdadbob:
> In my world, C#, VB, and C++ is used with ASP to deliver
> business solutions.
Your world might not be *the* world, at least not all of it.
I have used VB (pre-.NET) to write an enterprise-scale, 3-tier banking application doing serious number crunching. (Yes, it’s not the best language for the problem, but it had been in use for ages, and did perform quite well given the circumstances. It has been replaced with Java by now.)
I have used C++ to write an equally-scaled big-iron data archiving / processing / providing application (making a Sun E10k max out on the I/O), including quite some FP operations in high-precision requirements.
> What really counts is String manipulation and component
> communication.
In *your* world. In *my* world, we use Perl for the strings, and I have yet to write up a COM/Corba/whatever component after > 5 years in the business.
> How on earth would I be able to tell that JavaScript
> would out perform VB.Net based on your benchmark????
Oh sweet bejezus… if you want to *benchmark* JavaScript, you have a serious design issue there.
2004-01-15 8:09 am
Anonymous
They didn’t even benchmark C?!? It would have smoked them all.
Both the gcc and icc benchmarks I conducted were of the C code.
2004-01-15 8:14 am
Anonymous
The Java benchmarks run on the 1st linked page don’t use the Server VM. It is well known to optimize much better than the Client VM, and for performance-critical apps the -server VM should be used. But I suppose he wanted to run with the client VM, given that we already saw the performance of the server VM in the previous article.
Bascule should have tried IBM’s Java SDK. It provides much better performance. This is available for Linux and can be downloaded at http://www-106.ibm.com/developerworks/java/jdk/linux140/
Osvaldo Doederlein provides a rather good explanation on Javalobby of why Java’s math implementation sucks majorly. http://www.javalobby.org/thread.jspa?messageID=91784613&threadID=10… . Basically, it’s because it not only does everything in software (the FPU is completely unused), it also uses some accurate but slow algorithms.
2004-01-15 9:09 am
Anonymous
I would say that they pretty much DID test C, only with a C++ compiler. I wonder how the I/O tests would have turned out had they used iostreams instead of stdio.
2004-01-15 10:56 am
Anonymous
There’s only so many ways you can compile arithmetic and a loop (ADD EDX, BLAH ; JGE WHEREVER). The only real difference you will see is when bad compilers are constantly fetching and storing to the stack frame when you could keep values in registers, or if there is constant checking overhead (Java etc). I/O is pretty much down to the operating system, all you really have to do is a system call.
2004-01-15 11:11 am
Anonymous
They are still not comparing the compilers equally. In order to compare icc vs. gcc, you _always_ need -fomit-frame-pointer and -ffast-math.
Gcc usually reserves the frame pointer to give better backtraces, which icc doesn’t. And icc by _default_ does non-standard FP math, whereas gcc by default does accurate, compliant FP math.
2004-01-15 11:24 am
Anonymous
I agree. I always use g++ with -O2 -fomit-frame-pointer -march=name_of_cpu, and with these options g++ sometimes beats the Intel Compiler (on Linux), or their performance is almost equal.
2004-01-15 11:58 am
Anonymous
Why not run the tests with IBM’s JDK 1.4.1 or even 1.3.1? It’s much faster than Sun’s in most cases, especially for those trigonometric functions.
2004-01-15 12:42 pm
Anonymous
Solar:
In *your* world. In *my* world, we use Perl for the strings, and I have yet to write up a COM/Corba/whatever component after > 5 years in the business.
My world has spanned telecommunications, rail traffic control, call centers, middleware solutions, database solutions, accounting and business banking, across many languages and platforms, for 20+ years in business. I have yet to see a case for trig. I have yet to see a case for “serious number crunching”, like I had done in university.
I have used VB (pre-.NET) to write an enterprise-scale, 3-tier banking application doing serious number crunching.
Hmmm… a 3-tier VB solution without using COM?!?!?!?! You write your own socket routines 😉
Oh sweet bejezus… if you want to *benchmark* JavaScript, you have a serious design issue there.
No, what is flawed is the assumption that .Net must outperform JavaScript, hence your bewilderment. I only took a stab at this solution because of the similarity of the syntax. It knocked my socks off when I tried it.
Like I said, the VB.Net component was a no-brainer. The code followed Microsoft’s documented solution to the letter. It simply ran canned queries using ADODB.Net and returned the results. When a test harness was applied to the component itself, it did not appear to leak memory, but when the test harness was let loose on a web page that called the component, memory was gobbled up like crazy.
2004-01-15 1:04 pm
Anonymous
Sorry, I’m not very experienced with Java. What do people mean by Server VM? J2EE 1.4 instead of J2SE 1.4?
2004-01-15 2:14 pm
Anonymous
The Java runtime comes with two VMs: the client and the server. The client VM is the default, and it is optimized for fast startup times. To achieve this, it doesn’t perform many optimizations. Check the Sun docs to see which.
On the other hand, the server VM is meant for programs that run for a long time, where startup time is inconsequential. Thus the server VM is able to perform many more optimizations than the client VM. It’s not surprising for the server VM to have much better performance.
To invoke the server VM, just add the -server flag to your java command.
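To check which VM a benchmark actually ran on, a program can read the standard java.vm.name system property. With Sun’s JDK this typically reads something like “Java HotSpot(TM) Client VM” or “Java HotSpot(TM) Server VM”; other vendors report their own names, so the substring check below is a heuristic and the class name is hypothetical.

```java
// Illustrative sketch: report which VM is running this program.
class WhichVm {

    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    // Heuristic: Sun's HotSpot server VM includes "Server" in its name.
    static boolean looksLikeServerVm() {
        String name = vmName();
        return name != null && name.indexOf("Server") >= 0;
    }

    public static void main(String[] args) {
        System.out.println("java.vm.name = " + vmName());
        System.out.println("server VM?    " + looksLikeServerVm());
    }
}
```

Running it once as `java WhichVm` and once as `java -server WhichVm` should show whether the flag took effect on a given installation.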
2004-01-15 3:44 pm
Anonymous
Trigonometric methods can’t be optimized in Java.
You could create a new FastMath class that’s a copy of the StrictMath class with the strictfp keyword removed. That’s what someone would do if they needed fast trig performance in Java 1.4 and didn’t care about strict IEEE compliance. It’s too bad the Java API doesn’t offer that anymore.
Also, the Vector benchmark really shouldn’t be testing object creation; you should create one Integer object and insert that object everywhere in the list. Test object creation in another benchmark.
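The suggestion to reuse one Integer can be sketched as follows: the boxed value is allocated once, and the same reference is stored in every slot, so the timing reflects Vector operations rather than Integer allocation. The class name and the value 42 are arbitrary choices for illustration; the code uses Java 1.5 generics.

```java
import java.util.Vector;

// Illustrative sketch: fill a Vector with one shared Integer reference so
// no per-element allocation happens inside the timed loop.
class SharedElementSketch {

    // Allocated once, outside any timed code.
    static final Integer SHARED = Integer.valueOf(42);

    static Vector<Integer> fill(int n) {
        Vector<Integer> v = new Vector<Integer>(n);
        for (int i = 0; i < n; i++) {
            v.add(SHARED);   // stores the same reference every time
        }
        return v;
    }

    public static void main(String[] args) {
        int n = 5000000;
        long start = System.currentTimeMillis();
        Vector<Integer> v = fill(n);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(v.size() + " inserts in " + elapsed + " ms");
    }
}
```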
2004-01-15 6:19 pm
Anonymous
You could create a new FastMath class that’s a copy of the StrictMath class with the strictfp keyword removed. That’s what someone would do if they needed fast trig performance in Java 1.4 and didn’t care about strict IEEE compliance. It’s too bad the Java API doesn’t offer that anymore.
There is a better solution: implement the Math methods using native system libraries, or even in Java using less sophisticated and less exact algorithms, as the API permits. But this is a job for the VM implementor, not for the Java programmer. The best the Java programmer can do is to use Math instead of StrictMath. When Sun implements Math using faster algorithms instead of calling StrictMath methods, or if you use another VM with a faster Math implementation, you will get faster code.
2004-01-15 7:18 pm
Anonymous
Well, then, the course of action is obvious. As concerned OSNews readers, we need to set ourselves the goal, before the decade is out, of implementing a Java VM and returning it safely to the Earth. I propose that we name it “My First Trigonometrically Optimal Java VM Studio .NET”.
2004-01-15 10:08 pm
Anonymous
Most comments on any published software or hardware benchmark are devoted to criticizing the method. Still, this benchmark does seem too simple. A comment by _mikk above seems very accurate about the areas in which programs written in Java (to name one of the tested languages/compilers) must perform well.
Here http://www.bagley.org/~doug/shootout/index2.shtml is The Great Computer Language Shootout, a very complete benchmark suite with many different programs to test different performance areas. It was continued, for Windows only, at http://dada.perl.it/shootout. The tests are already written and published. Why not use them? Or at least select some of them.