Nvidia and partners are offering new “personal supercomputers” for under $10,000. Nvidia, working with several partners, has developed the Tesla Personal Supercomputer, powered by graphics processing units based on Nvidia’s Cuda parallel computing architecture. Computers using the Tesla C1060 GPU will have 250 times the processing power of a typical PC workstation, enabling researchers to run complicated simulations, experiments and number crunching without sharing a supercomputing cluster.
More technical specifications are available on their website.
Nvidia’s personal supercomputer has three or four Tesla C1060 cards, which are essentially high-end GPUs; each card has 240 streaming processor cores and 4GB of dedicated high-speed memory. Each machine also includes a quad-core AMD or Intel processor.
Supported Platforms:
- Microsoft® Windows® XP 64-bit and 32-bit (64-bit recommended)
- Linux® 64-bit and 32-bit (64-bit recommended)
- Red Hat Enterprise Linux 4 and 5
- SUSE 10.1, 10.2 and 10.3
Development Environment:
The development tools include a C language environment complete with a compiler, debugger, profiler and an emulation mode for debugging. Standard numerical libraries (FFT and BLAS) for high-performance computing are also included.
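For the curious, here is a minimal sketch of what that toolchain looks like in practice. The file name, kernel and build commands below are illustrative only; in particular, the device-emulation flag varied across early toolkit releases, so check your version’s nvcc documentation.

```
// hello.cu -- a minimal CUDA C source file, just to illustrate the toolchain.
//
// Build for the GPU:           nvcc -O2 -o hello hello.cu
// Build for CPU emulation:     nvcc -deviceemu -o hello hello.cu
// (emulation flag as in the early toolkits; consult nvcc --help for yours)

#include <cstdio>

// __global__ marks a kernel: code that runs on the GPU.
__global__ void hello_kernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;   // each thread writes its own index
}

int main()
{
    int host[32], *dev;
    cudaMalloc((void **)&dev, sizeof(host));
    hello_kernel<<<1, 32>>>(dev);     // launch 1 block of 32 threads
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("thread 31 wrote %d\n", host[31]);
    return 0;
}
```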
Here I was hoping there was some bizarre explanation for the name.
Fixed the title. Thanks.
For what operations? How big and how fast is the on-card memory on a C1060? What programming models does the C1060 support?
While I have no doubt it’ll do vector math much faster than a general-purpose CPU, it won’t help much if you’re processing a large data set, as the PCIe bus will become the (very small) bottleneck.
GPUs really shine in huge SIMD problems where you have a very large dataset and need to perform the same operation on each element. Examples would be simulations, visualization, medical imaging, etc.
While the PCIe bus is usually the limiting factor, it can be dealt with: usually by transferring very large chunks of data at once (hundreds of megabytes to several gigabytes), performing the computation on the GPU, and transferring the results back; rinse, repeat. Even with the bandwidth limitations, the computational gains are so great that the end result is usually orders of magnitude faster.
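Roughly, that pattern looks like this. Just a sketch: the chunk size and the trivial “square each element” kernel stand in for whatever the real computation would be.

```
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for the real computation: square every element of the chunk.
__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i];
}

// Process a large host array in big chunks: copy a chunk over the PCIe bus,
// crunch it on the GPU, copy the results back, repeat.
void process(float *host, size_t total, size_t chunk)
{
    float *dev;
    cudaMalloc((void **)&dev, chunk * sizeof(float));

    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        cudaMemcpy(dev, host + off, n * sizeof(float), cudaMemcpyHostToDevice);

        int threads = 256;
        int blocks  = (int)((n + threads - 1) / threads);
        square<<<blocks, threads>>>(dev, (int)n);

        cudaMemcpy(host + off, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);
}

int main()
{
    const size_t N = 1 << 22;                 // ~4M floats, processed in 1M-element chunks
    float *a = new float[N];
    for (size_t i = 0; i < N; ++i) a[i] = (float)i;
    process(a, N, 1 << 20);
    printf("a[2] = %f\n", a[2]);              // expect 4.0
    delete[] a;
    return 0;
}
```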
4GB, 512-bit GDDR3, 800MHz, 102 GB/sec.
Since at its heart it’s just a GPU, the programming model is shader-based. GLSL or HLSL could both be used (the OpenGL and DirectX shading languages). However, NVidia’s CUDA toolkit is also available (and is the preferred method); it’s essentially an extension to C designed with a kernel-type processing model in mind (GPU kernel, not OS kernel).
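To make the kernel idea a bit more concrete, here’s a small block-level sum reduction of the sort you write naturally in CUDA C but couldn’t express as directly in a shader language. It’s a generic textbook pattern, not code lifted from the SDK.

```
#include <cstdio>

#define BLOCK 256

// One block sums BLOCK elements of `in` into one entry of `partial`.
// __shared__ memory is the fast on-chip scratchpad shared by a block's threads.
__global__ void block_sum(const float *in, float *partial, int n)
{
    __shared__ float buf[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            buf[threadIdx.x] += buf[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        partial[blockIdx.x] = buf[0];
}

int main()
{
    const int n = 1 << 20;
    const int blocks = n / BLOCK;

    float *h_in = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in, *d_partial;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK>>>(d_in, d_partial, n);

    float *h_partial = new float[blocks];
    cudaMemcpy(h_partial, d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    double total = 0.0;
    for (int i = 0; i < blocks; ++i) total += h_partial[i];   // finish the sum on the CPU
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(d_in); cudaFree(d_partial);
    delete[] h_in; delete[] h_partial;
    return 0;
}
```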
I know, but I’d be interested in seeing benchmarks of high-level operations (i.e. how fast can it reduce an n*n matrix compared to a CPU?).
Yes, that’s why I was interested in how much on-board memory it has.
Ah, so you can’t take your existing Fortran and recompile chunks of it for the C1060?
Interestingly, the CUDA SDK comes with a BLAS library implemented on the GPU. They also have an FFT library as well.
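For what it’s worth, calling the bundled BLAS looks roughly like this. This sketch uses the original single-call CUBLAS interface shipped with the early toolkits, so exact function signatures may differ in your version.

```
#include <cstdio>
#include "cublas.h"   // legacy CUBLAS interface; link with -lcublas

int main()
{
    const int n = 512;                       // multiply two n x n matrices
    float *A = new float[n * n], *B = new float[n * n], *C = new float[n * n];
    for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    cublasInit();

    float *dA, *dB, *dC;
    cublasAlloc(n * n, sizeof(float), (void **)&dA);
    cublasAlloc(n * n, sizeof(float), (void **)&dB);
    cublasAlloc(n * n, sizeof(float), (void **)&dC);

    // Matrices are column-major, as in Fortran BLAS.
    cublasSetMatrix(n, n, sizeof(float), A, n, dA, n);
    cublasSetMatrix(n, n, sizeof(float), B, n, dB, n);

    // C = 1.0 * A * B + 0.0 * C, computed entirely on the GPU.
    cublasSgemm('N', 'N', n, n, n, 1.0f, dA, n, dB, n, 0.0f, dC, n);

    cublasGetMatrix(n, n, sizeof(float), dC, n, C, n);
    printf("C[0] = %f (expected %d)\n", C[0], 2 * n);

    cublasFree(dA); cublasFree(dB); cublasFree(dC);
    cublasShutdown();
    delete[] A; delete[] B; delete[] C;
    return 0;
}
```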
It’s not as simple as a recompile, no. And really, you wouldn’t want it to be. When using a GPU to accelerate processing, it’s not just another processor: it has a very different memory model and a very different processing model. To really take advantage of the GPU architecture, the code needs to be structured with that in mind.
Say, for instance, you have 500 matrices of size 500×500 and you need to use them to solve some A*x=b equations. On the CPU, you would loop through all 500, solving one at a time.
While this will work on the GPU, it’s not an efficient way to use it. On the GPU, you would copy all 500 to the GPU memory, run a single solver on all 500 simultaneously, and then copy the results back.
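Something like the sketch below, with a trivial scaling kernel standing in for a real solver; the point is the launch shape (one block per matrix, all shipped over in one copy), not the math.

```
#include <cstdio>

#define N   500    // matrices per batch
#define DIM 500    // each matrix is DIM x DIM

// One block per matrix: a stand-in for a real per-matrix solver.
// Here each block just scales its matrix in place; a real solver would
// factor it and back-substitute against its right-hand side.
__global__ void scale_each(float *mats, float s)
{
    float *m = mats + (size_t)blockIdx.x * DIM * DIM;   // this block's matrix
    for (int i = threadIdx.x; i < DIM * DIM; i += blockDim.x)
        m[i] *= s;
}

int main()
{
    size_t count = (size_t)N * DIM * DIM;
    size_t bytes = count * sizeof(float);
    float *h = new float[count];
    for (size_t i = 0; i < count; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // ship all 500 matrices at once

    scale_each<<<N, 256>>>(d, 2.0f);                   // all 500 processed in one launch

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);                       // expect 2.0
    cudaFree(d);
    delete[] h;
    return 0;
}
```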
Specialized hardware generally requires specialized programming to fully exploit it.
Which, in a round-about way, brings me to the point: while these cards look very nice and clearly have a role to play in specialised applications such as real-time medical imaging, they are not a “drop in” replacement for a proper cluster. If you write your code to use one of these cards you will find yourself tied to nVidia, with perhaps no opportunity to run your code on a faster machine in the future should the need arise.
If you write your code using, say, MPI in Fortran, you can pretty much expect your code to run five or ten years from now, even if it’s running on a totally different cluster.
CUDA is a programming model, mostly based on super-threading, data streaming and data parallelism.
It is being ported to the CPU, and later (via Apple’s OpenCL, which is mostly CUDA-based) to ATI’s GPUs (although the ATI parts have poorer programmability).
Basically, once you map your algorithm to CUDA, you should be able to run it on either the CPU or GPU in the near future.
Alas, if you have already developed your code on OpenMP and it works for you... as they say, if it ain’t broken…
However, where the CUDA boards shine is price per flop and power per flop. So they are very, very, very attractive.
Finally a computer that might be able to play Crysis on full settings.
Cuda is much like C actually, and very easy to handle imo. The truly hard part is designing your algorithms to use the heavily distributed computational power, and memory access/control can be tricky (but that’s true of anything heh ^_^). Recursion also isn’t supported in device code, which rules out recursive programming and a handful of the usual algorithms.
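For example, anything you’d naturally write recursively has to be turned into a plain loop (or an explicit stack) before it can live in a device function. A toy illustration, nothing more:

```
#include <cstdio>

__device__ unsigned long long factorial(int n)
{
    // Recursion isn't supported in device code on this hardware,
    // so f(n) = n * f(n-1) has to become an ordinary loop.
    unsigned long long r = 1;
    for (int i = 2; i <= n; ++i)
        r *= i;
    return r;
}

__global__ void factorials(unsigned long long *out, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        out[i] = factorial(i);
}

int main()
{
    const int count = 16;
    unsigned long long host[count], *dev;
    cudaMalloc((void **)&dev, sizeof(host));
    factorials<<<1, count>>>(dev, count);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    printf("10! = %llu\n", host[10]);   // expect 3628800
    cudaFree(dev);
    return 0;
}
```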
But honestly the benefits are so great on some applications it’s almost crazy. Check it out: almost everybody has a GeForce 8+ somewhere, Cuda is available on both Linux and Windows, Matlab has plugins for it too iirc, and it’s so easy to set up that one shouldn’t deprive himself of such resources.
Does it have a graphics card?
Like if I want to use it with another OS (say, Haiku :p)…
Free to download, yes. Open source, no. The hardware supports a certain version of the CUDA API, so I’m assuming it would also require the Nvidia drivers. This would in turn limit support to Windows XP/Vista x86/x64, Linux x86/x64, Solaris x86/x64, OSX x86/x64, and FreeBSD x86. However, AFAIK, only the Windows, Linux, and OSX drivers support the CUDA API.
“Vista Capable”?
Combine this news with the following:
http://news.bbc.co.uk/1/hi/technology/6425975.stm
and you may just have the beginnings of a much more devolved understanding of the Academy. The increase in open access journals and datasets and the fact that in the UK at least, on average 80% of PhDs do not make it into formal university research jobs, could provide the basis for an accelerated evolution of tertiary education, in which the dichotomy between Town and Gown ultimately dissolves.
I hope so: those pundits that bewail the furtherance of knowledge by those beyond a self-maintaining elite (‘The Cult of the Amateur’ by Andrew Keen, for example) in my very humble opinion need to be put firmly in their place. The printing press did not mean literacy for the few only, and neither should that ever-unfolding digital broadsheet, the Internet.
Well, how about a 72-core MIPS workstation:
http://sicortex.com/products/deskside_development_system
now that is something close to a supercomputer on your desk.
With that said, the above posts are correct: Nvidia’s stream processors do outperform traditional CPUs on certain data-parallel tasks. The key to achieving that is coding specifically for them.
Crysis should finally be able to run at 60fps?
Once again, the consumer is the big loser: in the same way the customer had to choose between SLI and Crossfire when buying a motherboard, the user must now choose between programming APIs for these cards.
I believe that Apple may be making an API passthrough, but it would still be sad if the only cross-vendor GPU API between Nvidia and ATI is itself proprietary.
Open source people: motivate and create an open standard for number-crunching on a video card, or be left out in the cold, having to make difficult decisions.
We are seeing a major revolution in computers, what I feel is the biggest change in computers since the CD-ROM.
A suggestion for Nvidia: Get 45nm parts, and pull the trigger on Intel.
A warning to Intel: be very afraid, and get that 6-Banger out.
Once again, we have someone posting from their parents’ basement giving directions to a whole industry on what they have to do.
Armchair quarterbacking is sooooo much easier than actually doing.
How is the capacity of utilizing a few gigaflops currently present in a lot of desktops a “bad thing for customers?”