System76, purveyor of Linux computers, distributions, and now also desktop environments, has just unveiled its latest top-end workstation, but this time, it’s not an x86 machine. The company has been working with Ampere to build a workstation based around Ampere’s Altra ARM processors: the Thelio Astra. Phoronix, fine purveyor of Linux-focused benchmarks, was lucky enough to benchmark one, and has more information on the new workstation.
System76 designed the Thelio Astra in collaboration with Ampere Computing. The System76 Thelio Astra makes use of Ampere Altra processors up to the Ampere Altra Max 128-core ARMv8 processor that in turn supports 8-channel DDR4 ECC memory. The Thelio Astra can be configured with up to 512GB of system memory, choice of Ampere Altra processors, up to NVIDIA RTX 6000 Ada Generation graphics, dual 10 Gigabit Ethernet, and up to 16TB of PCIe 4.0 NVMe SSD storage. System76 designed the Thelio Astra ARM64 workstation to be complemented by NVIDIA graphics given the pervasiveness of NVIDIA GPUs/accelerators for artificial intelligence and machine learning workloads.
The Astra is contained within System76’s custom-designed, in-house-manufactured Thelio chassis. Pricing on the System76 Thelio Astra will start out at $3,299 USD with the 64-core Ampere Altra Q64-22 processor, 2 x 32GB of ECC DDR4-3200 memory, 500GB NVMe SSD, and NVIDIA A400 graphics card.
↫ Michael Larabel
This pricing is actually remarkably favourable considering the hardware you’re getting. System76 and its employees have been dropping hints for a while now that they were working on an ARM variant of their Thelio workstation, and knowing the prices others are asking, I definitely expected the base price to hit $5000, so this is a pleasant surprise. With the Altra processors getting a tiny bit long in the tooth, you do notice some oddities here, specifically the DDR4 RAM instead of more modern DDR5, as well as the lack of PCIe 5.0.
The problem is that while the Altra has a successor in the AmpereOne processor, its availability is quite limited, and most of them probably end up in datacentres and expensive servers for big tech companies. This newer variant does come with DDR5 and PCIe 5.0 support, but doesn’t yet have a lower core count version, so even if it were readily available it might simply push the price too far up. Regardless, the Altra is still a ridiculously powerful processor, and at anywhere between 64 and 128 cores, it’s got power to spare.
The Thelio Astra will be available come 12 November, and while I would perform a considerable number of eyebrow-raising acts to get my hands on one, it’s unlikely System76 will ship one over for a review. Edit: here’s an excellent and detailed reply to our Mastodon account from an owner of an Ampere Altra workstation, highlighting some of the challenges related to your choice of GPU. Required reading if you’re interested in a machine like this.
Most users wouldn’t know what to do with this kind of system, haha.
Obviously it’s not aimed at regular consumers anyway.
Yeah, they could have bumped up the RAM speed, but I don’t know if these CPUs support overclocked “XMP” DDR RAM.
“Astra will start out at $3,299 USD with the 64-core Ampere Altra Q64-22 processor, 2 x 32GB of ECC DDR4-3200 memory”
A dual-channel configuration doesn’t seem ideal; I’d want to fill all 8 channels. I am curious about the NUMA details. Regular SMP software can be bottlenecked by shared memory constraints at high core counts. This architecture works best with a large number of independent tasks running in parallel. I can think of a lot more server applications than end-user applications. Things like hosting, very large compile jobs, maybe databases.
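(As an aside: if anyone does get one of these on their desk, the NUMA layout is easy to inspect on Linux straight from sysfs. A minimal sketch, roughly what numactl --hardware summarizes, assuming a Python interpreter is handy:)

from pathlib import Path

# Print each NUMA node and the CPUs attached to it, read from sysfs.
nodes = Path("/sys/devices/system/node").glob("node[0-9]*")
for node in sorted(nodes, key=lambda p: int(p.name[4:])):
    cpus = (node / "cpulist").read_text().strip()
    print(f"{node.name}: CPUs {cpus}")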
What sort of computing tasks would a desktop system need 512GB of RAM for?
Shifu,
512GB / 128 cores is just 4GB per core. This ratio doesn’t seem outrageous. Such high-core-count architectures seem more fitting for data center use cases though. It’s still hard to imagine needing this many cores for “desktop” applications.
Hypothetically someone could write desktop software to take advantage of all these cores, but it almost feels like finding the problem that matches this solution, haha. I don’t know that this setup would be any good at running Blender, for example, but it’s worth noting that features like geometry nodes and cloth/water physics simulations in Blender are not GPU accelerated (unfortunately). So hypothetically something like Blender might be a good desktop application for such a high number of cores… but that would need to be benchmarked, since a lot of SMP applications incur scaling bottlenecks at high core counts.
I can think of many data center applications, but it’s not clear to me why a data center would want hardware in this form factor. Maybe a company only needs a single unit and doesn’t need rack-mount servers.
I guess it’s a mystery. Probably no-one will ever spec one that high. Thanks for taking the time to reply!
Alfman,
We already have applications for that amount of RAM for end users. Specifically, the new large language models (LLMs) have already gone over 300 billion parameters (at half precision that is over 600GB).
Hmm…, it is now 405B: https://ai.meta.com/blog/meta-llama-3-1/
I’m not sure an ARM CPU can run it fast enough, but a coupled NPU should be able to utilize this.
(Sorry had to jump in).
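(To put rough numbers on that, here is a minimal back-of-the-envelope sketch, counting weights only and ignoring activations and KV cache; the precision choices are just common examples.)

# Approximate size of LLM weights for the parameter counts mentioned above.
def weights_size_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9

for params in (300, 405):
    for label, bpp in (("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)):
        print(f"{params}B @ {label}: ~{weights_size_gb(params, bpp):.0f} GB")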
sukru,
Yes, LLMs need lots of RAM, but I suspect this particular system is already obsolete for such applications. You have a lot of cores, but both the CPU and RAM are older gen.
Yeah, I think the use of dedicated silicon is pretty much mandatory these days. Generic CPUs aren’t as performant, even with 128 cores. This CPU can sport lots of RAM, but I think it would be a severe bottleneck for NN workloads. RAM speed is critical for NN operations with an NPU/GPU.
Incidentally, I have been playing with Llama 3.1 70B… it’s like a 300 baud modem running on my i9-11900K and Ryzen 7 CPUs (I have to make do with older stuff). The GPU version, which is much faster, is not able to run from system RAM though (as far as I know). So if I want to use acceleration (which, yes, I do), I need to use a model that fits in my GPU’s dedicated memory. The 8B model runs in virtually real time on the GPU. If I switch back to the CPU to run this smaller model (CUDA_VISIBLE_DEVICES=””), the same model goes back to crawling.
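For anyone wanting to reproduce that CPU fallback, a minimal sketch assuming a PyTorch-based runtime (other CUDA-backed stacks have their own switches):

import os

# Hide all CUDA devices before the framework initializes; this is the
# CUDA_VISIBLE_DEVICES="" trick mentioned above, just set from Python.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

# With no visible devices, CUDA is reported as unavailable and inference
# falls back to the (much slower) CPU path.
print(torch.cuda.is_available())  # -> False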
Fortunately for us, phoronix.com has benchmarked both the Ampere Altra and my CPUs, so we know exactly how these compare. Here I’ll look at the “Mobile Neural Network, model inception” tests (same NN models, but worth noting the software got updated between tests). …
https://www.phoronix.com/review/system76-thelio-astra/4
https://www.phoronix.com/review/intel-rkl-linux/18
Next we can look at OneDNN Recurrent Neural Network Training…
Suffice it to say, it’s not a great system for neural nets even though the specs at face value might seem to lend themselves to this application.
Not a desktop system, a workstation. I understand most people don’t have experience with them, but it’s basically a server in a different form factor that sits local to you and has slightly less insanely loud fans. My first company didn’t have rack-mounted servers, but a home-built rack of workstations because… I don’t know why. It was a dumb choice then, too.
Can it run Crysis?
The short answer is: yes, it can. Obviously not as well or as simply as on an AMD64-based machine, but it runs.
https://www.jeffgeerling.com/blog/2023/ampere-altra-max-windows-gpus-and-gaming
I wrote it for the meme, but of course someone tried it! Thank you @Morgan for the link, you made my day.
Hold on… Thom missed something HUGE in the summary. It’s not just any workstation or desktop. Oh no.
It’s “The first official desktop for streamlined autonomous vehicle development, powered by Ampere processing.”
So I guess there are enough engineers working on autonomous vehicles to make a computer worth marketing to them? That’s crazy. Anyone know what a real autonomous vehicle engineer actually runs, or maybe ran before this thing?
Bill Shooter of Bul,
I imagine autonomous vehicle engineers are working with neural nets like everyone else. Marketing can be application specific, but I don’t see why the hardware would need to be.
High-core-count CPUs are good at server applications, but I predicted long ago that GPGPU would replace high core counts for parallel computing. Superscalar CPU pipelines scale up poorly by comparison. IMHO the main reason to run a CPU implementation is that a more performant GPU/ASIC implementation hasn’t been built yet; in such cases, 128-core CPUs could be helpful.
I’d compare this to crypto mining, which started out running on CPUs, but GPUs blew CPUs out of the water, and then ASIC miners went on to replace GPU miners. Even if you had a 10,000-core CPU cluster, it wouldn’t be worth using it to mine crypto, since the technology isn’t competitive with specialized hardware. The transistors and energy required to run superscalar pipelines in generic CPUs carry an opportunity cost that more specialized hardware can put to better use.