The respected German computer news site heise.de reports that the next version of Java (Java 1.5) will have a mechanism to share class data
between multiple virtual machines. The feature, called class data sharing, will improve start times and reduce memory consumption for people who run multiple Java VMs at the same time. This is especially important for client-side Java programs which use large GUI libraries such as Swing.
They integrated the code from Mac OS X!
In case you didn’t know, Mac OS X already does this.
Right!
Here is Apple’s site that describes the technology:
http://developer.apple.com/documentation/Java/Conceptual/Java141Dev…
Now if only Java included mechanisms to cache the run-time optimizations it generates. It’s ridiculous that each time you start the JVM, it must once again optimize every routine in the Java class library that’s called.
I read the opposite: that they were not able to integrate that code, and that they are doing it for Java 1.6. Really, I’m not sure this news is true.
I read the opposite too, on java.net a while back, in a long heated discussion about why Sun has left this feature out in favor of what the thread called “syntactic sugar” (i.e. generics/templates, autoboxing, etc.). I am not so sure this is true. However, Apple did submit their VM sharing design to Sun for them to possibly adopt. I guess we’ll have to wait and see. I have 1 GB of RAM on my laptop so I don’t sweat the memory problem that much, but the shorter startup time will be nice. But shit, I have that on OS X already. All I’m waiting for is the OpenGL hardware-accelerated 2D API for Java! I think this will be a major step forward for Java clients in the near future.
I just read this from java.net:
An astute TheServerSide reader has noticed that a much requested feature/bug-fix of Java, having to load the entire JRE for each invocation of the VM, will be included in version 1.5 (Sun – login required). “The footprint cost of new JVM instances has been reduced in two ways. First, a portion of the shared archive, currently between five and six megabytes, is mapped read-only and therefore shared among multiple JVM processes. Previously this data was replicated in each JVM instance. Second, less data is loaded out of the shared archive because the metadata for unused methods remains completely untouched as opposed to being created and processed during class loading. These savings allow more applications to be run concurrently on the same machine.”
Here is the ServerSide link:
http://theserverside.com/news/thread.jsp?thread_id=23526
Quoted from Sun’s site:
The primary motivation for including CDS in the 1.5 release is the decrease in startup time it provides. CDS produces better results for smaller applications because it eliminates a fixed cost: that of loading certain core classes. The smaller the application relative to the number of core classes it uses, the larger the saved fraction of startup time.
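For the curious, Sun’s 1.5 documentation exposes CDS through a few `-Xshare` launcher flags; a quick sketch of their use (`MyApp` is a placeholder class name, and note that sharing was initially supported only with the client VM):

```shell
# Regenerate the shared archive (classes.jsa). Normally done by the
# installer; needs write access to the JRE directory.
java -Xshare:dump

# Require sharing (fail if the archive cannot be used):
java -Xshare:on MyApp

# Disable sharing, e.g. to compare startup times:
java -Xshare:off MyApp
```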
class data sharing
Now if only Java included mechanisms to cache the run-time optimizations it generates. It’s ridiculous that each time you start the JVM, it must once again optimize every routine in the Java class library that’s called.
But you can’t do that, because of the way Java dynamically inlines code. From what I understand, it makes more sense to cache the semi-compiled version, which is what they are doing here.
From this representation, they can then quickly generate dynamically inlined native code. You can’t just cut and paste native code when inlining; it has to be processed in some way.
SUN vs IBM vs Apple implementation of the JVM
http://math.nist.gov/scimark2/run.html
IBM’s seems to perform best, but they are a little behind on releases.
“Now if only Java included mechanisms to cache the run-time optimizations it generates”
Java has had that since day one: the .class file format allows for attributes. The original intent (or so I was told by one of the original team members) was that the resulting compiled code would be stored that way.
<Company that I used to work for> had a version of their JIT that did that sort of thing internally, but it never worked out right – it is not a trivial thing to do. The second you alter CLASSPATH (or simply put a newer version of a class in CLASSPATH) you potentially have to throw everything away. It is possible to do – it is not trivial (and I don’t know one way or the other whether the JIT team was ever sure there would be a significant speed gain).
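A toy sketch of that invalidation problem (all names hypothetical, not any real JIT’s interface): if cached native code is keyed on a digest of the exact class-file bytes it was compiled from, any changed bytes on the CLASSPATH automatically force a recompile.

```java
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical code cache keyed by a digest of the class-file bytes.
// If the bytes change (a new class version appears on CLASSPATH), the key
// changes, so the previously cached "compiled" code is simply never found.
public class CodeCache {
    private final Map<String, String> compiled = new HashMap<>();

    private static String key(byte[] classBytes) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-1").digest(classBytes);
        return Arrays.toString(d);
    }

    String getOrCompile(byte[] classBytes) throws Exception {
        String k = key(classBytes);
        String code = compiled.get(k);
        if (code == null) {
            code = "code@" + k;        // stand-in for an expensive JIT compile
            compiled.put(k, code);
        }
        return code;
    }

    public static void main(String[] args) throws Exception {
        CodeCache cache = new CodeCache();
        byte[] v1 = {1, 2, 3};         // pretend class-file contents
        byte[] v2 = {1, 2, 9};         // a "newer version" of the same class
        String a = cache.getOrCompile(v1);
        String b = cache.getOrCompile(v1);
        String c = cache.getOrCompile(v2);
        System.out.println(a.equals(b)); // true: identical bytes hit the cache
        System.out.println(a.equals(c)); // false: changed bytes invalidate it
    }
}
```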
http://portal.acm.org/citation.cfm?id=504292&dl=ACM&coll=portal
This technology will eventually make its way into the production VM, but getting it transferred out of the labs into the production VM is… well… fraught with turf wars.
I read the opposite: that they were not able to integrate that code, and that they are doing it for Java 1.6. Really, I’m not sure this news is true.
What I think they said was “totally shared VM”; what you have now is just one part being shared, so one can assume a new VM is still executed for each instance.
It took them a while to realize that this is what the community wanted, but now we will have it. Will Java take off on the desktop? Maybe not, but Java apps *should* be snappier.
It seems to me that they are coming out with the “quick and dirty” solution in order to please everyone and get people to stop complaining about that feature. I thought they would implement “real” VM sharing; I mean only one instance of the JVM, started at boot time or something like that, allowing the sharing and one-time loading not only of the core classes but of all classes in general.
For example, if I start 3 instances of a program which uses 20 classes of its own, those classes are only loaded once, and for subsequent invocations all the program needs to do is create instances of the classes already known by the VM.
At least that is my idea of the concept of a VM. The risk of losing stability can be reduced with time. But I guess we will have to wait for a real Java OS to see that.
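Within a single VM that is already how it works, which is the point of the argument above; a small demo (names are illustrative) showing that a class is loaded and initialized once per VM no matter how many instances you create, whereas separate VM processes each pay that cost again:

```java
// A class's static initializer runs exactly once per VM, on first use.
// Creating more instances afterwards only runs the constructor.
public class LoadOnce {
    static class Widget {
        static {
            System.out.println("Widget class initialized");
        }
        Widget() {
            System.out.println("new Widget instance");
        }
    }

    public static void main(String[] args) {
        new Widget();   // triggers class initialization, then the constructor
        new Widget();   // class already loaded: constructor only
        new Widget();
    }
}
```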
It seems to me that they are coming out with the “quick and dirty” solution in order to please everyone and get people to stop complaining about that feature. I thought they would implement “real” VM sharing; I mean only one instance of the JVM, started at boot time or something like that, allowing the sharing and one-time loading not only of the core classes but of all classes in general.
It may sound nice in theory, but the fact is there is a massive risk to stability. As long as the VM can reuse native code, half the problem has been solved. What there should be is a way to cache this native code so that it can be used later, even after the VM has exited.
Relying on one VM creates a single point of failure, and I’m sorry, we’ve all heard “it’ll get stable soon” before. It goes hand in hand with “graphics drivers on Windows will become more stable over time”; sorry, that never happened. The only thing that did happen was that video cards gained more features, making drivers more complex and more likely to have bugs. The same will happen to VMs: they’ll get more complex, and as a result bugs will be more likely than before.
At the end of the day, in two years’ time 512MB will be standard. IMHO, what’s a meg or two here and there?
I must agree with you on some points, but the problem is not that of having more memory. I myself have 1 GB of RAM in my laptop and really don’t have any problems with Java in that regard. The problem is one of concept.
What do you consider more correct: a VM running multiple programs, or a bunch of VMs each with one program, no matter whether those programs are instances of the same one?
If the JVM project had had this idea of a single VM from the beginning, nobody would be raising stability concerns right now; everybody would instead be talking about the possibility of improving and tuning that same stability. I think the threat of .NET would make sure of that.
But I guess you made your point.
> At the end of the day, in 2 years time, 512MB will be
> standard. IMHO, whats a meg or two here and there?
>
Are you familiar with the concept of a second-level cache? It makes a big difference whether the code of your programs fits into the L2 cache or not. With hyperthreading this problem only gets worse.
You might have 512MB, but it is still very slow compared to your processor.
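A rough illustration of that point (not a rigorous benchmark, and the numbers vary wildly by machine): walking the same large array with a cache-hostile stride does the same amount of work as a sequential walk, but typically takes far longer, because every access misses the caches.

```java
// Sequential vs. strided traversal of a 64 MB array. Both loops touch
// every element exactly once; only the access pattern differs.
public class CacheDemo {
    public static void main(String[] args) {
        int n = 1 << 24;               // 16M ints = 64 MB, far beyond any L2
        int[] a = new int[n];
        long t0 = System.nanoTime();
        long seq = 0;
        for (int i = 0; i < n; i++) seq += a[i];
        long t1 = System.nanoTime();
        long strided = 0;
        int stride = 4096;             // jump 16 KB between accesses
        for (int s = 0; s < stride; s++)
            for (int i = s; i < n; i += stride) strided += a[i];
        long t2 = System.nanoTime();
        System.out.println("sequential: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("strided: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```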
What happens if you call a synchronized method from within itself?
What do you consider more correct: a VM running multiple programs, or a bunch of VMs each with one program, no matter whether those programs are instances of the same one?
There are different ways to look at it.
The problem with a centralized master VM is simply knowing which classes to cache, which to discard, and how to determine whether a class is already in the cache.
In the classic edit-compile-run-rinse-repeat mode, you’ll end up with a lot of garbage in the central VM.
There’s the stability issue that was mentioned, corrupt that centralized VM, and you corrupt every application on the system that utilizes it. That Would Be Bad.
The problem with Java byte codes is simply that the system treats them as data for the running program, and typically, data is not shared across multiple processes.
If you run, say, Mozilla on your Linux box, it may be a large executable, but that executable is loaded only once and shared by all of the processes running it. Each process has local data, but shared code segments. This is because executable code is normally a read-only/executable page, so it can be safely shared, where as data is not.
Ideally, they can try a similar technique with Java classes, mmap-ing them into the VM as shared read-only, but there are a lot of systems in Java that actually manipulate byte-code in place (for example, JBoss does this when it loads EJBs). On top of that, you have the JIT doing its work, but that can be stored in separate memory.
I don’t know if the modern unix systems have a “free” copy-on-write mapping procedure, where different processes can map in the same file as read/write, but where the VM automatically copies pages about to be written to. I don’t think so.
To support that, the JVM would need to do it itself.
That may simply work. I don’t know whether those who tweak byte-codes try to do it “in place”, or whether they copy the byte-codes for a class, change them as necessary, and create a new class from the copy. If it’s the latter, then just having the initial read-only shared pages will save some time and lower overall memory usage.
Of course, the other dark side is that once the JIT is done, all of the “important” classes are compiled locally for the running instance, and those are all unique across the system, and probably not shared. So, it may shorten initial load times, but long term may not net any real memory savings.
A most complicated conundrum.
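For what it’s worth, POSIX mmap with MAP_PRIVATE does give exactly that “free” copy-on-write mapping, and Java itself exposes it through FileChannel.map with MapMode.PRIVATE; a minimal sketch:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// mmap(MAP_PRIVATE) is copy-on-write: writes land in private copies of the
// touched pages and never reach the underlying file or other processes
// mapping it. Java exposes this via FileChannel.map(MapMode.PRIVATE).
public class CowDemo {
    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("cow", ".bin");
        p.toFile().deleteOnExit();
        Files.write(p, new byte[]{42});
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer priv = ch.map(FileChannel.MapMode.PRIVATE, 0, 1);
            priv.put(0, (byte) 7);                        // copy-on-write
            System.out.println(priv.get(0));              // our private copy: 7
            System.out.println(Files.readAllBytes(p)[0]); // file unchanged: 42
        }
    }
}
```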
Ok, ok, ok. Stop BS-ing and PLEASE READ the FOLLOWING LINK:
http://portal.acm.org/citation.cfm?id=504292&dl=ACM&coll=portal
-or-
http://research.sun.com/projects/barcelona/papers/oopsla01.pdf
This is work done by Sun Labs, that has been slowly (ever so slowly) beginning to make waves in JavaSoft (the internal division that maintains the production VM).
This has been implemented and tested, and is working, and there are measurements given in the paper! It has a shot at making it into Java 1.6 (last I heard). The technical problems have been solved; if the political battle can be won, the truly shared VM will go into production.
The only way the community can help is to put pressure on Sun to move MVM into production.
I just read the paper; now that is much more like the idea I had of a real VM. I guess the community should start pressing Sun about this project. As I see it, Sun’s engineers are working à la Microsoft in some way: deliver something that works (even at the cost of correctness) and keep working on a really correct alternative. I know it is not Microsoft’s idea, but it is a good deal between the market and well-done design.
Well, I know that a synchronized method is essentially saying:
synchronized (this) {
// code here
}
I would assume it would compile; I have no idea what it would do at runtime, though. I would be interested in the results.
Uh.. this is waaay off-topic, but since there is no other activity here, here we go…
synchronized (myMutex) {
// code
}
This means that the thread trying to execute the code block must first get a lock on myMutex. Only one thread can hold the lock on an object at a time, so it will have to wait here if another thread currently holds the lock on myMutex. When a thread exits the synchronized block where it acquired the lock, it releases it, so that other threads can acquire it.
So, consider a method like this:
synchronized int fib(int num) {
return num<=1 ? 1 : (num + fib(num-1));
}
When a thread executes fib(2) it first has to acquire the lock on the object containing the fib(int) method. Once it has acquired this lock it recursively executes fib(1), and this time it already holds the lock on the object, so it just goes ahead and executes it, which returns 1, which makes the fib(2) call return 3. Since it has now exited the block where it acquired the lock, it releases it.
> return num<=1 ? 1 : (num + fib(num-1));
Oops.. this should of course be:
return num<=1 ? num : (fib(num-1) + fib(num-2));
Not that it matters in this example, though…
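So, to answer the original question directly: it compiles and runs fine, because Java’s intrinsic locks are reentrant; a thread that already holds a monitor can re-enter it. A minimal runnable check (using the corrected fib):

```java
// Demonstrates that Java's intrinsic monitors are reentrant: a synchronized
// method can call itself recursively without deadlocking, because the
// executing thread already holds the lock on `this`.
public class ReentrantDemo {
    synchronized int fib(int num) {
        // Each recursive call re-enters the monitor we already hold.
        return num <= 1 ? num : fib(num - 1) + fib(num - 2);
    }

    public static void main(String[] args) {
        ReentrantDemo d = new ReentrantDemo();
        System.out.println(d.fib(10)); // prints 55
    }
}
```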