The current implementation of generics in .NET 2.0 does a very good job of making typed collections faster and easier to use. But as many people have noticed, it leaves much to be desired when you want to do calculations with generic types.
The Problem
In the last article about this topic I used a Point<T> structure to illustrate this problem. But of course the problem exists in many other situations as well. For example you might want to write a method that calculates the sum of all elements of a List<T> where T is a type like int, double, decimal or any other type that can be summed.
To someone used to C++ templates this looks like a trivial problem, and they would probably come up with something like this:
public class Lists { ... public static T Sum<T>(List<T> list) { T sum=0; for(int i=0;i<list.Count;i++) sum+=list[i]; return sum; } ... }
This method could then be used to calculate sums:
List<int> list1; List<float> list2; List<decimal> list3; ... int sum1=Lists.Sum(list1); float sum2=Lists.Sum(list2); decimal sum3=Lists.Sum(list3);
But surprisingly enough, this is not possible with .NET generics. The problem is that in .NET, type parameters without constraints are assumed to be of type System.Object, which is not very useful for doing calculations. There is currently no way to constrain type parameters in such a way as to require the existence of a static method. And since operators in .NET are static methods, it is not possible to have operator constraints.
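A hedged sketch of what actually happens (the class and method names here are illustrative, not from the article): an unconstrained T offers only the members of System.Object, so the summing line is rejected by the compiler.

```csharp
using System;
using System.Collections.Generic;

public static class NaiveLists
{
    public static string Describe<T>(List<T> list)
    {
        // Without a constraint, T is treated as a plain System.Object,
        // so only members like ToString() and Equals() are available.
        // The naive sum therefore does not compile:
        // T sum = default(T);
        // sum += list[0];   // error: operator '+=' cannot be applied
        //                   // to operands of type 'T' and 'T'
        return typeof(T).Name + " x " + list.Count;
    }
}

public static class NaiveDemo
{
    public static void Main()
    {
        List<int> list = new List<int>();
        list.Add(1); list.Add(2); list.Add(3);
        Console.WriteLine(NaiveLists.Describe(list)); // prints "Int32 x 3"
    }
}
```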
A clean way to enable numerical computations would be to let the basic data types like int, float, double, decimal etc. implement an interface for arithmetic operations. Then this interface could be used to constrain the type parameters. This would work similar to the IComparable<T> interface that all basic data types implement.
Unfortunately the people at Microsoft seem to be unable to do this in time for .NET 2.0, which will probably be released in 2005.
Many people have been thinking about this problem, among them Anders Hejlsberg and Eric Gunnerson. There are many creative solutions, such as this, which involves dynamic code generation.
As an example, let's take a look at the solution suggested by Anders Hejlsberg. I copied the code from Eric Gunnerson's blog.
- First define an abstract base class Calculator<T> for the operations to be performed:
public abstract class Calculator<T> { public abstract T Add(T a, T b); }
- Then specialize it for the types we want to use:
namespace Int32 { public class Calculator : Calculator<int> { public override int Add(int a, int b) { return a + b; } } }
- Then use an appropriate Calculator for the type you want to work with:
class AlgorithmLibrary<T> where T: new() { Calculator<T> calculator; public AlgorithmLibrary(Calculator<T> calculator) { this.calculator = calculator; } public T Sum(List<T> items) { T sum = new T(); for (int i = 0; i < items.Count; i++) { sum = calculator.Add(sum, items[i]); } return sum; } }
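Assembled into a runnable sketch (I flattened the Int32 namespace into a class name and added the demo; only the shapes of Calculator<T> and AlgorithmLibrary<T> come from the snippet above):

```csharp
using System;
using System.Collections.Generic;

public abstract class Calculator<T> { public abstract T Add(T a, T b); }

// Specialization for int; the original nests this in a namespace named
// Int32, which is flattened here into the class name for brevity.
public class IntCalculator : Calculator<int>
{
    public override int Add(int a, int b) { return a + b; }
}

public class AlgorithmLibrary<T> where T : new()
{
    Calculator<T> calculator;
    public AlgorithmLibrary(Calculator<T> calculator) { this.calculator = calculator; }

    public T Sum(List<T> items)
    {
        T sum = new T();
        // Every Add call here is a virtual dispatch.
        for (int i = 0; i < items.Count; i++) sum = calculator.Add(sum, items[i]);
        return sum;
    }
}

public static class CalculatorDemo
{
    public static void Main()
    {
        List<int> list = new List<int>();
        list.Add(1); list.Add(2); list.Add(3);
        AlgorithmLibrary<int> lib = new AlgorithmLibrary<int>(new IntCalculator());
        Console.WriteLine(lib.Sum(list)); // prints 6
    }
}
```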
All those implementations solve the problem, but unfortunately they all involve some kind of dynamic method invocation (virtual methods, interfaces or even delegates), which makes them quite slow. A virtual method adding two integers will spend only a small fraction of the time doing actual work. Most time will be used for the virtual method call itself. For most numerical applications, the overhead associated with virtual method calls and such is simply unacceptable.
The Solution
Fortunately there is a solution to this problem. When a type parameter is a valuetype, calls to its methods do not involve any kind of dynamic method invocation, even if the type parameter is constrained by interface constraints. You can take advantage of this with, for example, the IComparable<T> interface. The call to CompareTo in the following code snippet does not involve any cast or dynamic method invocation as long as T is a struct.
public static class Utils { public static void Swap<T>(ref T a,ref T b) { T t=a;a=b;b=t; } public static void Sort<T>(ref T a,ref T b) where T:IComparable<T> { if(a.CompareTo(b)>0) Swap(ref a,ref b); } }
So an approach to avoid dynamic method invocation would be to use a valuetype for the type containing the operations. That way, calls to the operator methods will be nonvirtual. Since the operator methods like T Add(T a,T b) are usually very short, they will be inlined even by the current less than optimal JIT compiler. For primitive types, the runtime cost of the abstraction is zero.
The first person to come up with this idea was Jeroen Frijters.
Here is a short example for this approach:
interface ICalculator<T> { T Add(T a,T b); } struct IntCalculator : ICalculator<int> { public int Add(int a,int b) { return a+b; } } struct FloatCalculator ... struct DoubleCalculator ... class Lists<T,C> where T:new() where C:ICalculator<T>,new() { private static C calculator=new C(); public static T Sum(List<T> list) { T sum=new T(); for(int i=0;i<list.Count;i++) sum=calculator.Add(sum,list[i]); return sum; } }
The use of this class is a bit awkward since you now need to provide two type parameters. But you can alleviate this by using aliases such as using IntLists=Lists<int,IntCalculator>;
List<int> list1; List<float> list2; List<decimal> list3; ... int sum1=Lists<int,IntCalculator>.Sum(list1); float sum2=Lists<float,FloatCalculator>.Sum(list2); decimal sum3=Lists<decimal,DecimalCalculator>.Sum(list3);
As you can see, the generic Lists type now has two type parameters, one for the storage type and one for the methods that do the calculations. This is of course not as short as it would be if the primitive types implemented an interface like IArithmetic<T>. But it works, and the performance is similar to that of the non-generic version.
Performance
To test the performance of this approach, I wrote a small benchmark program that compares summing up the elements of a large list with both generic and non-generic versions. The performance of the generic version is identical to the performance of the non-generic version, so apparently the call to the calculator.Add method is inlined.
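The author's benchmark program isn't reproduced here, but a minimal sketch of such a comparison (all type and method names are mine) could look like this; run it in release mode:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

interface ICalculator<T> { T Add(T a, T b); }
struct IntCalculator : ICalculator<int> { public int Add(int a, int b) { return a + b; } }

static class Bench
{
    // Generic version: C is a struct, so calculator.Add is a nonvirtual
    // (and typically inlined) call.
    public static T Sum<T, C>(List<T> list)
        where T : new()
        where C : ICalculator<T>, new()
    {
        C calculator = new C();
        T sum = new T();
        for (int i = 0; i < list.Count; i++) sum = calculator.Add(sum, list[i]);
        return sum;
    }

    // Hand-written, non-generic baseline.
    public static int SumInt(List<int> list)
    {
        int sum = 0;
        for (int i = 0; i < list.Count; i++) sum += list[i];
        return sum;
    }

    static void Main()
    {
        List<int> list = new List<int>();
        for (int i = 0; i < 1000000; i++) list.Add(i);

        Stopwatch sw = Stopwatch.StartNew();
        int genericSum = Sum<int, IntCalculator>(list);
        long genericTicks = sw.ElapsedTicks;

        sw = Stopwatch.StartNew();
        int directSum = SumInt(list);
        long directTicks = sw.ElapsedTicks;

        Console.WriteLine("generic: {0} ticks, direct: {1} ticks, sums equal: {2}",
            genericTicks, directTicks, genericSum == directSum);
    }
}
```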
Please make sure that you run the performance test in release mode, since some very important optimizations such as method inlining are disabled in debug mode.
Other Benefits
Since this approach decouples the storage type (T) and the type that is used to perform calculations on T (C), you can do some other neat things. For example you could have a CheckedIntCalculator that performs overflow checks when performing arithmetic operations:
struct CheckedIntCalculator : ICalculator<int> { public int Add(int a,int b) { checked { return a+b; } } }
Or you could have a DoubleComparer with a different definition of equality than the default comparison operations. Often you want to define two floating point numbers as equal if their difference is below some predefined threshold, to avoid spurious inequalities due to rounding errors.
struct FuzzyDoubleComparer : IEqualityComparer<double> { public bool Equals(double a,double b) { return Math.Abs(a-b)<1e-10; } ... }
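Filled out as a runnable sketch (the GetHashCode implementation, the 1e-10 threshold, and the demo are illustrative additions):

```csharp
using System;
using System.Collections.Generic;

// A struct comparer that treats doubles within 1e-10 of each other as equal.
struct FuzzyDoubleComparer : IEqualityComparer<double>
{
    public bool Equals(double a, double b) { return Math.Abs(a - b) < 1e-10; }
    // All values share one hash bucket; crude, but sufficient for
    // equality-only use in this sketch.
    public int GetHashCode(double a) { return 0; }
}

static class FuzzyDemo
{
    static void Main()
    {
        FuzzyDoubleComparer cmp = new FuzzyDoubleComparer();
        Console.WriteLine(0.1 + 0.2 == 0.3);           // False: rounding error
        Console.WriteLine(cmp.Equals(0.1 + 0.2, 0.3)); // True
    }
}
```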
Syntax candy
One small issue with this approach is that you still can not use operator overloading on type parameters. Instead of simply writing sum+=list[i];
in the above example, we have to call the Add method of the calculator struct like this: sum=calculator.Add(sum,list[i]);
This is usually not a big deal, but it is a problem that can be solved by using a wrapper struct:
public struct Number<T,C> where C:ICalculator<T>,new() { private T value; private static C calculator=new C(); private Number(T value) { this.value=value; } public static implicit operator Number<T,C>(T a) { return new Number<T,C>(a); } public static implicit operator T(Number<T,C> a) { return a.value; } public static Number<T,C> operator + (Number<T,C> a, Number<T,C> b) { return calculator.Add(a.value,b.value); } ...other operators... }
By using this wrapper struct, we can rewrite the Lists<T,C> class to use operators again:
class Lists<T,C> where T:new() where C:ICalculator<T>,new() { public static T Sum(List<T> list) { Number<T,C> sum=new T(); for(int i=0;i<list.Count;i++) sum+=list[i]; return sum; } }
Unfortunately, due to some very serious limitations in the current JIT compiler inlining heuristics the version using the wrapper struct will be slower than the version that calls calculator.Add directly. But since the wrapper struct does not add any dynamic method invocation the calls to wrapper methods could easily be optimized away by a more capable JIT compiler.
Conclusion
Using valuetypes as type parameters, it is possible to write efficient classes that do arithmetic operations on type parameters. The code is not as short as it would be in C++, but that is acceptable considering the large benefits of constrained type parameters. The thing to keep in mind is that while casting a valuetype to an interface has a significant overhead due to boxing and dynamic method invocation, constraining a type parameter by an interface has no such overhead.
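That last distinction can be made concrete with a small sketch (the method names are mine): the cast boxes the int and dispatches through the interface, while the constrained generic calls CompareTo directly.

```csharp
using System;

public static class BoxingDemo
{
    // Casting a value type to an interface boxes it: a heap object is
    // allocated, and CompareTo goes through the interface vtable.
    public static int ViaInterfaceCast(int a, int b)
    {
        IComparable<int> boxed = a;   // boxing happens here
        return boxed.CompareTo(b);
    }

    // With an interface constraint there is no box and, for structs,
    // no dynamic dispatch either.
    public static int ViaConstraint<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b);        // direct, typically inlined, call
    }

    public static void Main()
    {
        Console.WriteLine(ViaInterfaceCast(1, 2) < 0); // True
        Console.WriteLine(ViaConstraint(1, 2) < 0);    // True
    }
}
```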
References
Benchmark of generic vs. nongeneric summation of list elements
Interfaces for all arithmetic calculations
Implementation of the above interfaces for the primitive types
Wrapper classes for the interfaces
A sample program that uses the above interfaces to calculate the standard deviation of a List<T>
Another creative way to use zero-size structs
Please vote on this suggestion
About the Author
Rüdiger Klaehn works as a freelance programmer. He is currently trying to start a company with some friends, but in Germany that is a long and tedious process. He is very interested in functional programming languages and hopes that .NET will lead to their more widespread adoption in the industry.
If you would like to see your thoughts or experiences with technology published, please consider writing an article for OSNews.
I’m primarily a scientific programmer working with Fortran 95, so this article piqued my interest. But, I think I’m gonna need a good intro to .NET before tackling this. Any Fortran 95+.NET’ers out there who can help me translate what he wants to do into Fortran-speak?
I’m a scientific programmer too, but can anything beat the elegance of C/F90 ?
Just stick with C++ and forget about problems like this. If you absolutely have to use this .NET framework I’d suggest going with C++/CLI which still allows you to use its almighty templates.
I always get a kick out of issues, like generics. I’m sure there are a few places where really tight loops are necessary, but just how many places that implement generics really need it? Most computers burn away their time just sitting there doing nothing 99% of their lives. So, let’s say that you really have a process that takes a few hours each time to run and getting that 30% boost can make a difference. Does the extra five minutes it takes to write a specific data-typed version justify the time it took to figure out how to write a generics version that could be used for just a few elemental data types? There is always a trade between specific code (more understandable, but less maintainable) and generic code (more flexible, but slower and more difficult to understand). Generics just seem like a good way to get both less maintainable and more difficult to understand, both of which many “lowly” programmers have difficulties with.
My two cents,
Bel
P.S. Good article, though. Personally, if I really felt the need, since things are going to be more complex anyway, I’d emit the byte code, get all your “money’s worth” from the complexity, and hide it in a black box for end users.
Gaming, dude. C# can be used for serious gaming projects – and it is. In the Axiom Engine, it’s about a 40 FPS boost just by implementing generics… probably 20 or 30 FPS more by tweaking the implementation.
A little bit off topic … but when (meaning a rough date) can I expect to see XAML support? Also, will XAML support be in .NET version 2?
Thanks
.NET 2.0? Why? I can do this now in Java 5.0 (1.5)…
Yeah you can, but you don’t get any performance gain from generics in Java, do you?
there is no need for the additional gumpf that comes with .net, or even such data abstractions and collections.
this strong statement is supported by the fact that both C and Fortran are still used and thriving in the numerical computation fields.
the poster who suggests that computers do nothing 99% of the time is missing the point. computers are worked hard for those short periods when you need answers. that is the region of interest. and if you can compute 5% faster then you may churn through 150,000 more anti-cancer drug combinations in a month. that’s not to be laughed at.
myself, i am glad that the scientific community has always been unfazed by geeky and glamourous trends in computing. to them/us, a computer is a number cruncher with storage, and languages are not religious but different balances in the space of domain specificity and abstraction. nothing more.
Java has REPEATEDLY been shown to be faster than native code in several numerical transforms. This is due to the JVM’s ability to dynamically optimize the code, where as C++ gets one shot to optimize and has no idea what’s actually been going on at runtime.
That said this just shows the whole fallacy of C#’s approach to generics. Java has done a much better job with them in Tiger if you ask me. While it’s nice generics work with reflection in .NET, the performance toll is too large to be ignored.
Hmm, I was under the impression that Java Generics sucked, and .NET’s implementation ruled the world – can someone enlighten me? Preferably articles, not fanboyish statments.
C++ templates are an ugly hack, period. And C# and other .NET languages have much more advanced features than C++, at least in the 2.0 version. Take a look at the strange things you have to do to get real closures in C++ (Boost::Lambda). In C# 2.0 they come with the language as anonymous delegates.
.NET generics is the nicest generics system I know of. C++ templates are a pure compile time feature that produces horrible code bloat. And java “generics” are a joke. They are just for better type-safety and cleaner syntax, but do nothing at all about the problem of excessive boxing/unboxing even for trivial cases.
“That said this just shows the whole fallacy of C#’s approach to generics. Java has done a much better job with them in Tiger if you ask me. While it’s nice generics work with reflection in .NET, the performance toll is too large to be ignored.”
You do not know what you are talking about. In Java, generics is just a feature to eliminate casting and to enhance type safety. Deep down it is all objects. So if you have a List<int> in Java, it is internally represented by an object[], so you will have boxing/unboxing every time you access or modify the list. In .NET a List<int> uses an int[] internally, so no boxing or unboxing whatsoever.
The performance of .NET generic code will be 5 to 10 times higher than the performance of java generic code. If you don’t believe me, I can post a benchmark that you can try on your own machine.
Java has REPEATEDLY been shown to be faster than native code in several numerical transforms. This is due to the JVM’s ability to dynamically optimize the code, where as C++ gets one shot to optimize and has no idea what’s actually been going on at runtime.
Wrong?
http://fails.org/benchmark.html
The JVM’s main runtime optimization feature to be touted is the inlining of virtual methods. If virtual methods are being abused in performance critical C++ code I’d say it’s more an indication of poor programming than a substantial language drawback.
I work in scientific computing and we would never consider using Java. Our model is written in Fortran 90 which is, and will likely remain, the primary language utilized in scientific computing.
“C++ templates are an ugly hack, period.”
We agree to disagree. I think of C++ templates as a very powerful tool, not as a hack. I would also tend to think that implicit constraints are an advantage. If you have to use constraints, you can have them. Say you want something like IComparable. You use it like this (I can’t use the usual template syntax here at OSNews. Substitute the » and « with > and <):
template « typename T »
class Subject : private Comparable «T»
{};
and implement Comparable as follows:
template « typename T »
struct Comparable {
static void constraint(T a, T b) {
(void)(a < b);
(void)(a == b);
(void)(a > b);
}
Comparable() {
void (*p)(T,T) = constraint;
}
};
Look at http://www.artima.com/cppsource/wishlist.html if you want to read more. But anyway, I prefer implicit constraints.
“C++ templates are a pure compile time feature that produces horrible code bloat.”
You have a common misconception here. First, I think it must be quite embarrassing for you to dismiss C++ for “code bloat” when you use .NET, which is quite a memory hog. In comparison, C++ is *very* efficient. Concerning your code bloat argument: early compilers surely produced multiple instantiations for different argument parameterizations. Modern compilers tend to optimize quite a few cases away.
Most computers burn away their time just sitting there doing nothing 99% of their lives. So, let’s say that you really have a process that takes a few hours each time to run and getting that 30% boost can make a difference. Does the extra five minutes it takes to write a specific data-typed version justify the time it took to figure out how to write a generics version that could be used for just a few elemental data types?
Yes, if the 30% boost means you can process every frame in a continuous data stream, instead of every other frame. Also, maybe your computer is idle 99% of its life; in scientific computing it is no exception having to buy CPU power by the minute. Those super computers are so hideously expensive that they make sure they are working all the time.
Generics just seem like a good way to get both less maintainable and more difficult to understand…
That depends on how generics are implemented. I would agree with you having read the lengths you need to go to to get decent performance out of .NET generics, but if you consider C++ templates, how is
template<typename T>
T add(const T& a, const T&b)
{
return a + b;
}
hard to read?
Aaaargh!!! I hate the HTML parsing on this forum. The preview button shows , < and > correctly but doesn’t understand the UBB tags, and when the post shows up it’s the other way around. Sigh.
“We agree on disagreement. I think of C++ templates as a very powerful tool, not as a hack.”
Of course C++ templates are powerful. But so is the C preprocessor. A feature can be both very powerful and very hackish.
“I would also tend to think that implicit constraints are an advantage. If you have to use constraints you can have them.”
I am not sure whether implicit constraints are better or worse than explicit constraints. On the one hand, implicit constraints are easier to use for the class library author.
But on the other hand, they make it very easy to add too many constraints, they go against the grain of a strongly typed language, and they produce very unhelpful error messages when something goes wrong.
But my main problem with C++ templates and C++ in general is that everything is done during compile time. There is almost no runtime type information available, so you have to graft strange extensions like the Qt moc on top of it all for many applications. RTTI is very limited.
In .NET, you have a large amount of runtime type information, and you can easily produce dynamic methods and such. For example the regular expression engine of .NET works by emitting MSIL code for each pattern, which then gets compiled to native code by the JIT. The result is very high performance regular expressions. Try something like this in C++.
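As a concrete illustration of that path, RegexOptions.Compiled is the switch in the .NET BCL that requests MSIL emission for a pattern (the pattern here is just an example):

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    static void Main()
    {
        // RegexOptions.Compiled makes the engine emit MSIL for the
        // pattern's matcher, which the JIT then compiles to native code.
        Regex digits = new Regex(@"^\d+$", RegexOptions.Compiled);
        Console.WriteLine(digits.IsMatch("12345")); // True
        Console.WriteLine(digits.IsMatch("12a45")); // False
    }
}
```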
Another thing to keep in mind is that a highly optimizing JIT will produce faster code than a static compiler, and automatic memory management with gc is often faster than manual memory management. So in the long run managed languages using an intermediate representation and a JIT will be both faster and easier to use than nonmanaged languages.
There are many benchmarks where Java and .NET outperform C++ even now.
But my main problem with C++ templates and C++ in general is that everything is done during compile time.
That is an advantage. Make the compiler work for you, instead of only finding out at run time and having to throw an exception in the user’s face.
“That is an advantage. Make the compiler work for you, instead of only finding out at run time and having to throw an exception in the user’s face.”
You should take a look at the .NET generics approach before dismissing it. Type safety is enforced at compile time using type parameter constraints, so you won’t “throw any exceptions in the user’s face”.
But the specialization and instantiation of the generic types is done at runtime, which makes the whole approach much more flexible. And a good JIT will make them perform just as well as the C++ equivalent.
Generics do not allow user partial specialization, type parameter inheritance or non-type parameters (in fact *only* ref classes or val classes). They present a very limited form of generic programming (something they have in common with Java’s approach).
if you want fast calculations, why don’t you simply use a fast library?
I mean, even scripting languages have them (python’s Numarray or ruby’s NArray come to mind); is there no chance to have them in C#?
Does MS really want us to do fast numerical calculations by ourselves, without external libraries, or is this just a kind of strange premature optimization for a problem that never really arises?
The article was nice to read anyway, thank you.
.NET generics were not meant to be some kind of compile-time mini language which can be abused to calculate factorials and such. They exist solely to make it possible to write types and methods with generic type parameters for e.g. collections. And they do this fairly well and with no noticeable performance overhead (as opposed to the java version which has horrible performance).
Besides, C++ templates are very limited in their own way: for example, you cannot specialize a template for a type you do not know at compile time. And AFAIK C++ templates do not support type inference at all.
There are workarounds for emulating non-type parameters in .NET generics. And you can also do quite weird stuff with them using nested type parameters. Maybe I will write another article about this topic.
I have no idea what MS wants us to do. They probably don’t know yet either, since .NET Whidbey (2.0) is not even out yet.
Of course you can use the established numerical libraries like LAPACK, BLAS etc. from .NET. I was just curious about the .NET generics implementation, its advantages and limitations.
As it turns out, generics in .NET are very fast and you can find usable workarounds for most limitations.
C++ templates are actually even Turing complete, but I agree that it is of no practical use most of the time. In reference to your other point–the “abuse” of templates–templates allow for a very flexible kind of programming style based on policies and traits which is impossible to achieve in C#. But then it is as you said–C# generics were meant to provide typesafe collections and simple algorithms (half the power, half the price).
Not sure what you could mean by type inference in this context. Often you can omit template parameters which are then deduced by the compiler, but in general you have to provide them. But where does C# use type inference?
Could you explain to me what you meant with: “you can not specialize a template for a type you do not know at compile time”? Thank you!
For example, if you dynamically load a type T from an assembly (shared library/dll) at runtime and want to dynamically create a List<T> to store instances of this type. This is not possible in C++. But of course dynamically loading a type from a dll is impossible in C++ anyway, so that might not be a problem for the stuff C++ is usually used for.
Another thing about C++ templates is that you have to have the source code to use a template. I guess there are various proprietary precompiled header formats, but you can not have a shared library or dll containing a template library. There is no stl.dll. Instead stl implementations are distributed as hundreds of .h files containing the source code.
The result is less flexibility, no code sharing between applications and much longer compile times. If there was something like the stl for .NET you could swap the stl implementation of your program without recompiling by just replacing the stl.dll in your programs directory with a newer version.
Thank you for your reply. Just wanted to let you know that there will be a STL.NET for C++/CLI in Visual C++ 2005 which should overcome the source code only limitation.
C++ is quite a static language which does not allow for everything that is possible in relatively more dynamic languages such as Java or C#. I can live with this static nature–in fact I’ve never been a friend of dynamic languages anyway, but that’s another issue.
You should try a dynamic language. Stuff like this http://www.lambda-computing.com/products/utils/parser would never be possible with a static language.
Being able to run tests on a library binary without recompiling it (like NUnit does) is also very nice.
It is not that I don’t know dynamic languages (I know Python), but my main background is Ada95 and C++ in industrial settings with tight time constraints (*cough* realtime *cough*). I always found the runtime behavior of static languages much more predictable, but if I had to write complex parsers or theorem provers I would not hesitate to use a functional language (e.g. Haskell). But that’s not where my focus currently is.
I am not going to suggest dynamic languages for hard realtime applications for you. That would be dangerous at the current state of technology.
But the problem of the garbage collector regularly causing long (several-millisecond) pauses can be solved in .NET by using structs, which do not use the garbage collector at all.
A .NET program that is careful to allocate all objects it needs at startup and that avoids temporary object creation by using structs will have very predictable runtime behavior. Since .NET uses a one-time JIT, there is no dynamic recompilation happening like there would be with Java HotSpot.
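A small sketch of that struct-based style (the Vec type and the collection counting are illustrative; exact GC behavior varies by runtime):

```csharp
using System;

struct Vec
{
    public double X, Y;
    public Vec(double x, double y) { X = x; Y = y; }
}

static class GcDemo
{
    // Fill a preallocated array of structs and report how many gen-0
    // collections the loop itself triggered. The elements live inline
    // in the array, so the loop performs no per-element heap allocation.
    public static int FillAndCountCollections(Vec[] buffer)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] = new Vec(i, -i);   // no allocation, no GC pressure
        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        Console.WriteLine("collections triggered: "
            + FillAndCountCollections(new Vec[100000]));
    }
}
```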
I found .NET good enough for “soft” real-time stuff like games or audio/animation stuff where you do not want any jerks due to garbage collection, but nobody gets killed if there is an unexpected 50ms break once every few hours.
YMMV, of course.