A paper published in February on Sun’s site, by Greg Wright, Matthew L. Seidl and Mario Wolczko:
An Object-aware memory architecture. Quoting from the abstract: “Despite its dominance, object-oriented computation has received scant attention from the architecture community. We propose a novel memory architecture that supports objects and garbage collection (GC). Our architecture is co-designed with a Java Virtual Machine to improve the functionality and efficiency of heap memory management. The architecture is based on an address space for objects accessed using object IDs mapped by a translator to physical addresses. To support this, the system includes object-addressed caches, a hardware GC barrier to allow in-cache GC of objects, and an exposed cache structure cooperatively managed by the JVM. These extend a conventional architecture, without compromising compatibility or performance for legacy binaries.”
I haven’t read the PDF yet, but I wonder how hard it would be to make a general purpose library out of this in C/C++?
No way anyone thought of this kind of architecture before. Not even the Symbolics LISP machine.
Actually, they were pretty cool, since they changed the entire arch to support LISP. Their word size, etc. was designed to maximize LISP perf. The same kind of arch would probably be able to support Java. Too bad they died over 15 years ago.
The reason I think it hasn’t come any sooner is that there is still a lot of OS hacking going on in C, which puts folks in a certain mentality. There really needs to be more system-level support for runtimes, and handling object-oriented runtimes is smart.
“I haven’t read the PDF yet, but I wonder how hard it would be to make a general purpose library out of this in C/C++?”
Make a C++ library out of a hardware change?
I think it would be interesting if memory protection were handled by the private/public scoping rules of the object system. Per-object threading would be an interesting feature too. A process could then just be an object with a thread of execution attached to it.
quote: To support this, the system includes object-addressed caches, a hardware GC barrier to allow in-cache GC of objects…
So, am I going to have to buy a PCI card to have good garbage collection now???
This cannot work in C/C++/anything where you can get direct access to memory (Java generally doesn’t allow such things). Why? Because this particular hardware GC mechanism relocates objects arbitrarily both in real memory and in application-specific virtual memory (not to be confused with system-level virtual memory, as referenced on OSNews in the last couple of weeks), where each application has its own protected virtual address space.
When one can relocate arbitrary memory regions, one can arbitrarily do away with both external (heap) and internal (address space) memory fragmentation, essentially by using a copy/clear semantic. By using object IDs instead of memory locations, they allow themselves that copy/clear semantic, at the cost of object-ID indirection (which can be nontrivial).
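In plain C terms the idea looks roughly like this (a hypothetical sketch, my names, nothing from the paper): clients hold object IDs, every access goes through one table, and the collector can move the storage by patching a single slot.

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical sketch: a single handle table maps object IDs to each
     * object's current location.  Code never holds raw pointers, so the
     * collector may relocate an object and fix up exactly one slot. */

    #define MAX_OBJECTS 1024

    typedef unsigned oid_t;                    /* object ID, not a pointer */

    void *handle_table[MAX_OBJECTS];           /* id -> current address */

    oid_t obj_new(size_t size)                 /* allocate and hand back an ID */
    {
        for (oid_t id = 0; id < MAX_OBJECTS; id++)
            if (handle_table[id] == NULL) {
                handle_table[id] = malloc(size);
                return id;
            }
        return (oid_t)-1;                      /* table full */
    }

    void *obj_deref(oid_t id)                  /* every access pays one indirection */
    {
        return handle_table[id];
    }

    /* The copy/clear idea: copy the live object somewhere else (compaction),
     * patch the table, and the old region can be reclaimed wholesale. */
    void obj_relocate(oid_t id, void *new_home, size_t size)
    {
        memcpy(new_home, handle_table[id], size);
        handle_table[id] = new_home;           /* clients are none the wiser */
    }

The extra table lookup in obj_deref() on every access is exactly the nontrivial indirection cost I mean.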
Novel solution? No, the solution has been known for decades. Novel application? Sure. Using it in Java? Only useful if you care about Java, and expect this kind of thing to make its way into the JVM where you can use it.
When João Paredes first talked about a hardware-oriented CPU architecture, a lot of people said he was a fool. And now this comes out… looks like I’ve seen this before…
This shows why Sun is going down the drain!
Too much Java, too little thinking about how to layer facilities on top of their OS to be used by legacy tools like C or C++, where the real money is. They don’t even ship the full spectrum of scripting languages! I’ve seen this sort of disease in other Sun projects before… too much focus on Java is going to kill them.
If only they took some clues from Plan 9 on how to build a super unix cluster!
The really cool thing about this sort of memory architecture is that it greatly simplifies the memory model and the OS’s VM. Sharing becomes simple and fine-grained (per-object rather than per-page). You can also get rid of all sorts of artificial boundaries (e.g. user/kernel), because you don’t need them to protect important data.
I don’t like the Java angle, though. Java is an uninteresting OS from my point of view. I’d rather see a simple mechanism that could be used by a wide range of languages. I’d also like to see OS support for this sort of thing. It is non-optimal to have each runtime carry its own GC, and frankly GCs don’t interact too well with VMs that are unaware of their existence.
Well, I come from a different perspective, since I am simulating a CPU model with just this sort of inverted-page memory manager, with support for memory objects down to 32 bytes, as well as processes (or threads if you must), scheduling, message passing, etc., all in the HW.
Is any of this new? Well, all of the above minus the HW memory management was done by Transputers. The inverted-page scheme has been used for a long time by IBM for the RS6000 and by newer RISC machines, and I’m sure by much older CPUs, but I suspect for pages of 4K or so; I need to research that.
And almost every SW textbook since Knuth has flogged hashing to death too. One thing the SW guys do, though, is want to use primes and modulo all the time; HW is better off using 2^n table sizes and plain XORs, so the implementations differ a little.
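To show what I mean about the hash itself (an illustrative sketch, my constants): the SW habit is a prime modulus, which needs a divider; the HW habit is a power-of-two table folded with XORs, which is just gates.

    #include <stdint.h>

    /* SW habit: prime table size, modulo reduction -- needs a divider. */
    #define SW_BUCKETS 1021u                   /* a prime */

    unsigned hash_sw(uint32_t key)
    {
        return key % SW_BUCKETS;
    }

    /* HW habit: power-of-two table, fold the key with XORs and mask off
     * the low bits -- nothing but XOR gates and a bit select. */
    #define HW_BUCKETS 1024u                   /* 2^10 */

    unsigned hash_hw(uint32_t key)
    {
        uint32_t folded = key ^ (key >> 10) ^ (key >> 20);
        return folded & (HW_BUCKETS - 1u);
    }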
Guess I will have an enjoyable read later to compare notes.
I will give a hint of what is possible when taken to its logical conclusion.
Start off with an XP2400 with maybe 2 MB of cache and 1 GB of DDR. Details not that important. Write a new/delete memory manager that uses this hash scheme in plain C, with a heap of some fixed size, say 32 KB, and then keep doubling until 512 MB.
For each point, half fill it with new objects and delete them out of order. Of course, linear addressing through each handle will spread most accesses all over memory; this is considered very bad by CS types, but EEs kinda like randomness, and communications guys are nuts over it: the more random the better.
As long as the heaps are within the size of the cache, the randomness doesn’t matter one iota, but then heaps of that size aren’t very useful to anybody. Once the heaps cross 2 MB, the cache gets thrashed and overall performance drops to about 4x slower for the 512 MB heap. I think, though, that if the referenced objects are actually used to some degree, the accesses spread out and dilute the random-walking cache misses, so the XP2400 will recover much of that lost performance given its design.
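The experiment is roughly this shape, in plain C (a sketch: the index scramble is mine, and malloc/free stand in for the hashed allocator; wrap each step in the timer of your choice to see where the cliff is).

    #include <stdio.h>
    #include <stdlib.h>

    #define OBJ_SIZE 32                        /* small objects, 32 bytes each */

    /* Half fill a heap of 'heap_bytes' with OBJ_SIZE objects, then delete
     * them out of order, so the scattered accesses show how badly the cache
     * gets thrashed as the heap grows past the cache size. */
    void exercise_heap(size_t heap_bytes)
    {
        size_t n = heap_bytes / OBJ_SIZE / 2;  /* fill to roughly 50% */
        void **objs = malloc(n * sizeof *objs);

        for (size_t i = 0; i < n; i++)
            objs[i] = malloc(OBJ_SIZE);        /* stand-in for the hashed allocator */

        for (size_t i = 0; i < n; i++) {       /* delete out of order */
            size_t j = (i * 2654435761u) % n;  /* cheap scramble of the index */
            free(objs[j]);
            objs[j] = NULL;                    /* the scramble can repeat an index */
        }
        for (size_t i = 0; i < n; i++)         /* sweep up whatever it missed */
            free(objs[i]);
        free(objs);
    }

    int main(void)
    {
        /* Double the heap from 32 KB up to 512 MB. */
        for (size_t heap = 32u << 10; heap <= 512u << 20; heap <<= 1) {
            exercise_heap(heap);
            printf("heap of %zu bytes done\n", heap);
        }
        return 0;
    }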
Now look at a very simple CPU design that runs 10x slower than the XP2400 (with 1% of the XP’s HW) for most ordinary instructions, but also uses the same hashed package implemented in HW. Every new/del and adj (realloc) takes 1 cycle per 32 bytes allocated or so. There are at least two penalties: first, the heap must be left somewhere near 50% empty for best performance, and second, the handles might collide every so often.
The simulations tell me that the XP is 12x slower than this slow CPU for tiny heaps, and 50x slower for useful heaps, rather than 10x faster. That’s because the SW version must be passing through 100-odd instructions per hash point when the HW can do it in 1 cycle.
That’s enough to get me excited too, and it allows for pretty darn large address spaces of 2^w by 2^n, where w and n would likely both be 32, so a 64-bit address space. HW can allow w to be smaller, though. The w part comes from the number of unique but random handles the HW could generate and possibly reuse; these come from a pseudo-random number generator. The handles must be random to spread the linear addresses out, since most objects seem to count from 0 up.
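For the curious, the handle side looks something like this (a sketch with made-up constants; a simple LFSR stands in for the HW’s pseudo-random number generator, and the XOR fold is my guess at the flavor of the hash).

    #include <stdint.h>

    /* Sketch of the handle scheme (my constants, not the real thing):
     * handles come out of a pseudo-random generator so that handle-xor-offset
     * spreads linearly addressed objects across the whole physical array. */

    #define PHYS_SLOTS (1u << 20)              /* 2^n physical lines of 32 bytes */

    static uint32_t lfsr_state = 0xACE1u;      /* anything nonzero */

    /* Galois LFSR step; taps are illustrative only, use a
     * maximal-length polynomial for real. */
    uint32_t next_handle(void)
    {
        lfsr_state = (lfsr_state >> 1) ^ (-(lfsr_state & 1u) & 0xA3000000u);
        return lfsr_state;
    }

    /* The handle (the 2^w part) and the word offset within the object
     * (the 2^n part) are folded down to a physical slot -- XOR gates again. */
    uint32_t phys_slot(uint32_t handle, uint32_t offset)
    {
        return (handle ^ offset) & (PHYS_SLOTS - 1u);
    }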
There are some downsides: what if you new() a permanent handle, and then years/months/days/hours later another handle is created (meet the new boss) that’s the same as the old boss (The Who), even though it takes 2^w trips to new()? Nasty; some SW help is needed.
It gets better, though, if the CPU is threaded and the memory is also multiway banked, with every bank allowed to start an access every 20 ns provided it’s not already in use (and that’s why it’s important for the physical addresses to be as random as possible, at least in the bank number); then effectively the CPU is running with DRAM that acts much more like SRAM. In the future I’m told that banking may go much higher; it actually helps the DRAM circuit guys out too, since it means the banks collide less often and a lower percentage of the memory is “hot”. Imagine a 1 Gbit DRAM with 64K banks: each bank is still 16 Kbits, about the same as whole DRAMs were when Micron & Inmos went into business. That 64K-bank overlay looks like a 64K SRAM to all intents and purposes, with deep RAM behind it.
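A toy model of the banking argument (a sketch, timings and bank count invented): each bank is busy for a fixed window after an access, and as long as the hashed addresses scatter the bank numbers, a new access almost always finds a free bank.

    #include <stdint.h>

    #define NUM_BANKS 64u                      /* invented bank count */
    #define BANK_BUSY_NS 20u                   /* a bank is tied up this long */

    static uint32_t bank_free_at[NUM_BANKS];   /* time (ns) each bank is free again */

    /* Issue an access at time 'now_ns': if the (randomly hashed) bank is free
     * it starts immediately, otherwise it stalls until the bank frees up.
     * With enough banks and well-scattered bank numbers, stalls get rare
     * and the DRAM array behaves much more like SRAM. */
    uint32_t issue_access(uint32_t now_ns, uint32_t slot)
    {
        uint32_t bank  = slot & (NUM_BANKS - 1u);          /* low, randomized bits */
        uint32_t start = now_ns > bank_free_at[bank] ? now_ns : bank_free_at[bank];
        bank_free_at[bank] = start + BANK_BUSY_NS;
        return start;                           /* when the access actually begins */
    }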
Now if you follow the Cell and RAMBUS stories, you can see that I/O pin rates are going up (DDR, QDDR, etc.); that just means ever faster command issue rates, plus a slowly falling DRAM latency heading down to 15 ns.
What does that mean? Only threaded CPUs make any sense at all with such memories; it also means that data caching can be removed if the thread count can cover the latency.
Sorry for the length; I bet some of this is in the paper.
Back to reading
“object IDs mapped by a translator to physical addresses.”
You know, this can be easily implemented on a 286+ using segmentation. It’s nothing new.
object ID -> segment
translator -> segment base & limit
and that’s it.
It’s just a new name for segmented addressing.
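To spell out the mapping (a hypothetical sketch in C, not anyone’s actual implementation): a descriptor table indexed by “object ID” hands back base and limit, which is what a 286-style segment translator gives you.

    #include <stdint.h>

    /* Sketch of the analogy only: an "object ID" plays the role of a
     * segment selector into a descriptor table holding base & limit. */

    typedef struct {
        uint32_t base;     /* linear base address of the object ("segment") */
        uint32_t limit;    /* size in bytes, checked on every access */
    } descriptor_t;

    static descriptor_t descriptor_table[8192];   /* object ID -> base & limit */

    /* translate(object ID, offset) -> linear address, with a bounds check */
    uint32_t translate(uint16_t oid, uint32_t offset)
    {
        const descriptor_t *d = &descriptor_table[oid & 8191u];
        if (offset > d->limit)
            return 0;                          /* would fault in real hardware */
        return d->base + offset;
    }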
<sarcasm>wow! what are they going to invent next???</sarcasm>
You obviously did not read the article closely enough…
Okay, I finally got through it; not sure if I’m really any the wiser, though. I prematurely recognized the first couple of memory-mapping schematics as an inverted page mapping scheme on seeing the XOR hash stuff, but later found out it’s way more complex and only for the cache. The OID I would call a handle. They seem convinced that most objects in Java are small; perhaps they are, but I would have expected a far wider spectrum, from stringies to large bitmaps, etc.
Take an already complex multiprocessor system (any will do) with cache coherency stuff that’s not particularly easy to describe and not Java specific, and then cram in some neat ideas that may well help out the Java GC issue. If it weren’t for Java GC, I don’t suppose this would be here.
Rayner, had you read the paper when you said
“The really cool thing about this sort of memory architecture”
because I was hoping the same thing too. It’s only the cache AFAICT, but they do mention extending it to memory later on.
The scheme I outline really does allow the entire memory, which might be >>4 GB, to be composed of small and large objects and managed with less fuss than conventional paging schemes.
It really does seem today that Occam’s razor does not apply anymore: make things as complex as possible, and then some more :)
Now for some pain killers
They do only cover cache architecture, but since programs see main memory through the cache, the “memory model” (from the perspective of software) is whatever model the cache uses. No matter what mechanism the memory uses, code will still use “object handles” instead of pointers. When they say they’ll discuss memory later, it seems they mean not that they’ll discuss how programs deal with memory later, but how the cache interacts with memory.
“It really does seem today that Occam’s razor does not apply anymore: make things as complex as possible, and then some more :)”
Rather than Occam’s razor, I prefer a quote attributed to Einstein:
“Make things as simple as possible — but no simpler”.
Maybe for embedded CPUs, where x86 does not rule them all, but on the PC/server there is little chance of this kind of HW modification being adopted.
Heck, I remember reading about this kind of proposal, adding tag bits to distinguish integers from pointers, a long time ago, without any result in the real world.
From the article:
Object-oriented programming is the dominant software development paradigm, and has been so for the last decade. Object-oriented programming languages, such as Java™ and C#, have converged on a common object model whose roots can be found in Smalltalk [9].
Thankfully, dynamic languages like Ruby and Python haven’t converged on the broken ideas in the object models of Java and C#. Static methods, for example: yuck!
The big problem with the design of Java was that the guy who designed it, James Gosling, had obviously never used Smalltalk. Java is more like a simplified, tidied-up C++, with better runtime introspection and garbage collection.
C++ certainly never had its roots in Smalltalk, and was influenced by Simula most of all. Smalltalk was also influenced and inspired by Simula. So the common ancestor is Simula, but by far the most innovative branch is the Smalltalk/Ruby dynamic message-passing OOP side.
The article sounds much the same as LOOM (Large Object-Oriented Memory) for Smalltalk-80 systems, proposed over two decades ago by Ted Kaehler and Glenn Krasner. The book Smalltalk-80: Bits of History, Words of Advice has a good description. I’m just reading it after reading the Sun paper, and am getting a distinct feeling of déjà vu.
The main advantage I see, the reason object-aware memory mapping will one day rule the earth, comes down to one thing: concurrency. Sharing access is fine-grained, and security constraints and runtime parameters can be per object.
Way back in Operating Systems class I was drumming up notes for an inverted-page-table object-memory system; at the time inverted page tables just made a preposterous amount of sense. The article has somewhat muddied my previous clarity… hash tables?…
good stuff.