Version 1.3 of the LLVM Compiler Infrastructure is now available. LLVM is an open-source system that focuses on providing compiler components for building all kinds of compiler-related programs, including traditional static compilers, JIT compilers, profiling systems, debuggers, script engines, etc. Release 1.3 provides many new features, including new beta support for PowerPC code generation and improvements to the C/C++ compiler’s optimizer.
I happen to love LLVM. I was disappointed that Mono isn’t being built on top of it (though, after reading Mr. Molaro’s comments at LtU, I somewhat understand why).
I wish we could see more languages ported to it or built on top of it.
I’m not that familiar with LLVM; could you elaborate on your Mono comments?
I guess reading this thread will be better than any tentative explanation from me.
See:
http://lambda-the-ultimate.org/node/view/141#comment-916
C-- at Harvard EECS
http://www.cminusminus.org/
It is a set of optimizing front ends and back ends for many languages, where the front end (language) can send ‘hints’ to the back end (code generator) about optimizations that are not tied to a predefined language (C, C++, ML, …).
Other than ‘readiness for prime time’, what are the differences between these two projects?
Is there a full, up-to-date backend for C--?
The projects are quite different in scope. C-- is supposed to be a portable code generator, while LLVM is a general-purpose compiler infrastructure.
1) LLVM is designed to encompass more of the compiler than C--. It has extensive support for high-level optimization and analysis, while C-- doesn’t need such things, because it assumes that the language-specific compiler has its own infrastructure for them. Optimization and analysis are really the focus of LLVM (and it does them extremely well), while portable code generation is the focus of C--.
2) C-- seems a little better thought-out for languages that don’t fit the C++/Java mold. For example, LLVM requires all variables to be statically typed, so polymorphism can only be implemented with casting. Theoretically, anyway, this loses some analysis opportunities for dynamically typed languages like Lisp. C-- guarantees general tail call elimination, while LLVM only guarantees “best effort.” In practice, “best effort” might be just as good, but it isn’t sufficient for a language like Scheme, whose standard formally requires the optimization. The GC API in LLVM is relatively recent, while the one in C-- is stable. The LLVM instruction set uses a C-like type system, so it has no way to directly represent stuff like tagged values*. It can be done using casts, but again, that theoretically decreases the precision of the analysis. LLVM also has no direct way to represent continuations, which C-- does, and in general, C-- gives you quite a bit more control over details.
*) A tagged value is a way of speeding up dynamic typing for certain important types. It works by stealing some bits from integers and pointers (2 bits is common) and using them as type tags for certain important types (integer, pointer, character, etc.). So if an object is one of these types, it can be represented as a single machine word, instead of requiring another word of type information.
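To make the footnote concrete, here is a minimal Python sketch that models 64-bit machine words as plain integers masked to 64 bits. The tag assignment (00 for fixnums, 01 for pointers, 10 for characters) and every function name are hypothetical; real implementations pick their own layouts. Giving fixnums the all-zero tag is a common choice because two tagged fixnums can then be added without untagging them first.

```python
# Sketch: 2-bit type tags packed into a simulated 64-bit machine word.
# Tag values and names are illustrative assumptions, not any standard layout.
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1

TAG_FIXNUM = 0b00   # small integer, stored shifted left by TAG_BITS
TAG_POINTER = 0b01  # heap pointer; the object must be 4-byte aligned
TAG_CHAR = 0b10     # character code, stored shifted left by TAG_BITS


def tag_of(word: int) -> int:
    """Read the type tag from the two low bits of a word."""
    return word & TAG_MASK


def box_fixnum(n: int) -> int:
    """Encode a signed integer (62 usable bits) as a tagged word."""
    return ((n << TAG_BITS) | TAG_FIXNUM) & WORD_MASK


def unbox_fixnum(word: int) -> int:
    """Decode a fixnum: reinterpret the word as signed, then shift right."""
    if word >> (WORD_BITS - 1):  # sign bit set
        word -= 1 << WORD_BITS
    return word >> TAG_BITS


def add_fixnums(x: int, y: int) -> int:
    """With tag 00, tagged fixnums add directly (overflow checks omitted)."""
    return (x + y) & WORD_MASK


def box_pointer(addr: int) -> int:
    """Tag an aligned address; its two low bits are free to hold the tag."""
    if addr & TAG_MASK:
        raise ValueError("address must be 4-byte aligned")
    return addr | TAG_POINTER


def unbox_pointer(word: int) -> int:
    """Clear the tag bits to recover the original address."""
    return word & ~TAG_MASK & WORD_MASK


def box_char(c: str) -> int:
    """Encode a character code with the character tag."""
    return (ord(c) << TAG_BITS) | TAG_CHAR


def unbox_char(word: int) -> str:
    """Drop the tag bits and turn the code back into a character."""
    return chr(word >> TAG_BITS)
```

Each value still fits in one word; only the two low bits are spent on type information.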
As Rayiner pointed out, there are many differences between C-- and LLVM, and many reasons for them. Here are some more:
1. LLVM provides a far richer set of components to build on, including JIT compiler support, optimization support, C/C++ support, etc. The main effect is that a language designer using LLVM does not have to reinvent the wheel as much as one using C--.
2. I think it’s fair to say (please correct me if you disagree, Rayiner) that LLVM is a more active and faster-moving project than C--. You can gauge our progress by the “status updates” linked from the left bar of the main page.
3. LLVM is much more stable than C--. In particular, it seems to me that C-- is still in the “oh yeah, we didn’t document that” and “oh, you can’t do things as documented yet” phase. LLVM generally works; for example, it currently has C/C++ front-ends that are competitive with GCC (C performance is slightly behind GCC, while C++ performance is much better than GCC’s).
4. LLVM is extremely well documented (see http://llvm.org/docs/), and the documentation is up-to-date. LLVM also does not have several implementations that differ, slightly or substantially, in how they work.
5. LLVM is not as well developed for functional programming languages as C-- is. The only potential problem area that I am aware of is the one Rayiner pointed out: there is no explicit tail call annotation that you can put on a call instruction to guarantee tail-call-ness (yet). LLVM development is driven by the people who are working on it, and no one has seemed interested in implementing this yet. There is nothing fundamental at all preventing its implementation. In contrast, C-- provides very little assistance for common activities like laying out structures and other front-end things.
6. Given a target-independent source language (e.g. Java), a properly written LLVM front-end will generate a target-independent LLVM bytecode file. I do not believe that the C-- code produced by a hypothetical C-- Java front-end would be target-independent.
7. The LLVM code generator produces better code.
8. LLVM has many new features coming on the horizon, including a toolkit that makes it even easier to build language front-ends, vector (SIMD) support, more loop optimizations, better code generator support, etc.
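Item 5 notes that LLVM cannot (yet) mark a call as a guaranteed tail call. A standard workaround for a front end that must guarantee proper tail calls regardless of the back end is a trampoline: instead of calling its successor, a function returns a description of the call, and a driver loop performs it, so the native stack never grows. This is not something the thread itself proposes, and the names `TailCall` and `trampoline` are hypothetical. The sketch is in Python, which does not eliminate tail calls itself, so the effect is easy to see. The price is an extra allocation and an indirect call per bounce, which is why a native guarantee is preferable.

```python
class TailCall:
    """A pending tail call: the callee and its arguments, not yet executed."""

    __slots__ = ("fn", "args")

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args


def trampoline(fn, *args):
    """Call fn, then keep executing returned TailCalls in a flat loop."""
    result = fn(*args)
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result


# Mutually recursive functions written in "tail-call style":
# each returns a TailCall instead of calling its successor directly.
def is_even(n):
    return True if n == 0 else TailCall(is_odd, n - 1)


def is_odd(n):
    return False if n == 0 else TailCall(is_even, n - 1)
```

`trampoline(is_even, 100000)` returns `True`, whereas a version that called `is_odd(n - 1)` directly would exceed Python’s default recursion limit long before reaching zero.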
Rayiner wrote: “C-- seems a little better thought-out for languages that don’t fit the C++/Java mold.”
I don’t see this at all. Yes, an LLVM Java front-end is in development, but we also have a front-end for Scheme and one for Stacker (a language similar to Forth), and an MSIL front-end was written. As new languages run into limitations, LLVM is extended (this is happening less and less frequently now). LLVM is not limited to static languages at all, and in fact, many people have talked about writing Python, Ruby, Parrot, etc. front-ends for LLVM.
In any case, LLVM and C-- are not in direct competition; they are simply alternative ways to make building a compiler easier.
-Chris
Oh one other thing:
> The LLVM instruction set uses a C-like type system, so it
> has no way to directly represent stuff like tagged values*
This is incorrect: the LLVM Scheme front-end uses tagged values just fine!
-Chris
LLVM is not limited to static languages at all, and in fact, many people have talked about writing Python, Ruby, Parrot, etc. front-ends for LLVM.
It’s not so much a matter of being limited to static languages as not being optimal for dynamic languages. High-performance functional language compilers have techniques that don’t really fit the LLVM model. A number of examples:
1) Compilers that use CPS conversion tend to generate code that consists of large blobs connected by gotos. LLVM could probably be made to generate such code, but such unusual usage could cause problems. I doubt many LLVM test cases look like that.
2) Many compilers don’t use a stack. Instead, they allocate activation records on the heap and garbage-collect them. I don’t see an obvious way to do this in LLVM other than avoiding ‘call’ altogether and implementing your own function-call semantics with ‘br’.
3) Lisp compilers support returning multiple values. You can get the same effect with a hidden pointer parameter, but Lisp includes multiple values because compilers can perform certain optimizations when multiple-value returns are represented directly.
4) Lisp-like languages have much more powerful exception mechanisms, which include restartable exceptions. You could implement those in LLVM using other mechanisms, but you’d have to avoid using LLVM’s native EH mechanism. The resulting implementation probably wouldn’t be optimal (you’d lose the ability to use certain techniques). After all, if EH could be implemented optimally (in the C++/Java case) on top of the regular LLVM functionality, why bother putting EH instructions into LLVM at all?
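Point 2 can be illustrated without any native stack at all. In this hypothetical Python sketch, each activation record of a recursive factorial is a heap object holding its argument, a resume point, and a link to its caller; a single loop plays the role of the processor, and frames that are no longer referenced are simply reclaimed by the garbage collector. Because frames are ordinary heap objects, they could also outlive the call that created them (for example, if captured by a continuation), which is one reason functional-language compilers like this layout. All names are illustrative; a real compiler would emit jumps (the ‘br’ approach from point 2) rather than an interpreter loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    """A heap-allocated activation record for fact(n) (hypothetical layout)."""
    n: int
    caller: Optional["Frame"]  # link to the frame to resume on return
    resume_at: int = 0         # 0 = function entry, 1 = after the recursive call


def fact(n: int) -> int:
    """Compute n! with frames on the heap; the host call stack stays flat."""
    frame: Optional[Frame] = Frame(n, None)
    ret = 0  # plays the role of a return-value register
    while frame is not None:
        if frame.resume_at == 0:
            if frame.n <= 1:
                ret = 1
                frame = frame.caller               # "return": resume the caller
            else:
                frame.resume_at = 1                # remember where to continue
                frame = Frame(frame.n - 1, frame)  # "call": allocate callee frame
        else:
            ret = frame.n * ret                    # use the callee's result
            frame = frame.caller                   # this frame is now garbage
    return ret
```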
With regard to type tags: I don’t see an obvious way to implement them without casts. The bitwise logical operations don’t work on pointers, only on integral types, so you’d have to cast back and forth. In an architecture that depends (according to the papers, anyway) on type safety to enable optimization, casts make me nervous, particularly because accessing the tag bits in a pointer *is* a type-safe operation; it’s just that the LLVM type system has no way of describing a type that is a union of a pointer with its two low bits clear and a 2-bit integer.
Of course, I forgot a much more mundane criticism: there is no way to get at the processor’s overflow bits, so implementing infinite-precision integers is slower than it needs to be.
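The extra cost mentioned here comes from having to recompute the carry. In this Python sketch, which simulates 64-bit limbs by masking, each limb addition wraps around and the carry-out is recovered with an unsigned comparison: the wrapped sum is smaller than an operand exactly when the addition overflowed. On hardware with a carry flag (for example x86 and its `adc` instruction) that comparison is unnecessary. The function name and the little-endian limb layout are assumptions for illustration.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def add_limbs(a: list[int], b: list[int]) -> list[int]:
    """Add two little-endian arrays of 64-bit limbs, propagating carries by hand."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    out = []
    carry = 0
    for x, y in zip(a, b):
        s = (x + y) & LIMB_MASK       # wrapping add, as a 64-bit ALU would do
        c1 = 1 if s < x else 0        # carry-out recovered by comparison
        s2 = (s + carry) & LIMB_MASK  # add the incoming carry
        c2 = 1 if s2 < s else 0       # at most one of c1, c2 can be set
        out.append(s2)
        carry = c1 | c2
    if carry:
        out.append(carry)
    return out
```

For example, `add_limbs([LIMB_MASK], [1])` returns `[0, 1]`, which is 2**64 split into two limbs.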
Of course, at the end of the day, an LLVM-targeted dynamic language will probably be better than the majority of Lisp/Scheme/Smalltalk implementations. However, the LLVM backend might be just restrictive enough to become a problem for those trying to write the next Allegro or CMUCL.
Okay, as you probably know, this isn’t the right forum for these discussions. That said, here are some responses:
> 1) RE: gotos I doubt many LLVM test cases look like that.
I don’t think LLVM would have any problems with this at all. We lower all high-level control flow to gotos already anyway.
> 2) Many compilers don’t use a stack. Instead, they
> allocate activation records on the heap and
> garbage-collect them.
Given “correct” tail calls (something we will get, but don’t have yet), I don’t see this as being very hard to implement at all. If you would like to discuss the details, the llvmdev list is the appropriate place for the conversation.
> 3) Lisp compilers support returning multiple values.
This can be implemented in LLVM today, though the implementation isn’t wonderful. There are near term plans to add multiple return values to LLVM functions. This is also required for supporting inline assembly (which can define multiple register values).
> 4) Lisp-like languages have much more powerful exception
> mechanisms, which include restartable exceptions.
There should be no problem implementing this in LLVM, even today.
> With regard to type tags: I don’t see an obvious way
> to implement them without casts.
It is implemented with casts. The whole point of LLVM is to cut as much redundant cruft out of the language as possible. Since there is already a way to do this, there is no need to add special support for just one class of source languages.
> there is no way to get at the processor’s overflow bits,
> so implementing infinite-precision integers is slower
> than it needs to be.
LLVM will eventually be extended to do this, when someone has the interest or desire to implement it.
If you’d like to have further conversations about this, I think that the llvmdev list (http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev) would be a much more appropriate place to have it.
To summarize, C-- is certainly better than LLVM for some applications. If you need garbage-collected stack frames (for example), and none of the other stuff in LLVM would be useful to you, using C-- makes more sense than using LLVM. On the other hand, if extending LLVM to support what you need would take less time than implementing all of the stuff that LLVM provides but C-- doesn’t, it makes sense to use LLVM. We have tried to make LLVM as easy to use and as extensible as possible, and if you come back in 3 months, you might magically find that we’ve implemented the feature you wanted.
-Chris