Terra Soft Solutions announced today that it has ported InfiniBand support to Linux PPC64. This is an important step that will please all those institutes running Linux on Apple Xserve clusters.
While we couldn’t be happier with the hardware we’ve purchased (after seeing Dell clusters in our server room with 30% of their nodes down due to hardware problems), we’ve been having nonstop problems with OS X. The latest is that during runs of our model, networking completely flakes out on certain systems (to the point that we can’t ping those systems or access any services) until we deconfigure and reconfigure the network interfaces via serial console. This is the latest in a slew of problems I’ve been having with 10.3 Server, and more and more it’s seeming like Linux is the way to go.
I’m just curious whether there are InfiniBand drivers available for Mac OS X. And what about 32-bit LinuxPPC?
Thanks and best regards,
Anton
Yes, there are (drivers for OSX). See Virginia Tech.
We are running a cluster of one hundred Xserves for scientific simulations, running Panther Server without any problems after more than 8 months operating 24/7. The cluster was very easy to set up and configure, and it is extremely easy to maintain. We are also testing Xgrid, and it looks really great; we are looking forward to Tiger, which will integrate Xgrid.
Although Linux is also a great solution for clustering, it doesn’t offer the “out of the box” solution that Panther does, and it needs much more time and people to set up and maintain.
We are very happy with the solution that we have. I guess that your situation is an exception; maybe you have a specific problem somewhere. I don’t even think that it is related to the operating system….
Anyway, good luck!!!!!
Hmm... I can’t seem to find any links to concrete products for sale using InfiniBand. How much are they? Any PCIe cards yet? How much does a switch cost?
Check out http://www.voltaire.com. They have some pretty hefty IBA switches there.
Man I hope the open-source IBA driver suite works out (http://infiniband.sourceforge.net/), because current IBA drivers are ____HUGE____, buggy, and are a pain in the ass to install.
http://www.openib.org/ for 2.6 kernels. I hope one of these two projects has some code that migrates to the kernel source tree.
Perhaps you didn’t notice on the Yellow Dog site that their product Y-HPC includes MPICH and the Maui and Torque schedulers, which together are equivalent to Xgrid (if not better, being a de facto standard in HPC). Furthermore, they also ship SystemImager, which lets you install only one node and clone it to all the other nodes in your cluster. More out of the box than this …
Cheers
If you are using InfiniBand with your OS X cluster, you may want to check out this company:
http://www.small-tree.com/index.html
We are running a cluster of one hundred Xserves for scientific simulations, running Panther Server without any problems after more than 8 months operating 24/7. The cluster was very easy to set up and configure, and it is extremely easy to maintain.
I’ve experienced the exact opposite. Deploying Panther server has been quite a headache. So much is dependent on everything placed in the initial configuration being correct the first time. For example, I wasn’t able to get KDC to start, and I discovered the reason was that I had configured the first network port for Internet connectivity and the second for the internal network. Doing this will break KDC as KDC expects it will provide Kerberos service to the network on the first jack and not the second. A lot of this information is hardcoded into the system and can only be corrected by blowing away the NetInfo, OpenLDAP, and Password Server configuration and removing the .AppleSetupDone file and restarting the configuration from scratch.
We are also testing Xgrid, and it looks really great; we are looking forward to Tiger, which will integrate Xgrid.
I’d love to use Xgrid, but for MPI applications it’s tied to MacMPI, which is not only a poor performer, but is buggy and sits on top of the Carbon APIs as it was originally designed for Mac OS 8.1.
Although Linux is also a great solution for clustering, it doesn’t offer the “out of the box” solution that Panther does, and it needs much more time and people to set up and maintain.
I can deploy a Linux cluster in less than a week, and I’ve been struggling with OS X for over two months now. Creating a system image for NetBoot is a nightmarish headache. Getting Open Directory working has been a horrible nightmare full of bizarre caveats. For example, if the 501 account you create has a password longer than 8 characters, Open Directory will silently fail and fall back on Crypt Passwords for new accounts you create, which, of course, will only function on the local system and not be shared via LDAP.
We are very happy with the solution that we have. I guess that your situation is an exception; maybe you have a specific problem somewhere. I don’t even think that it is related to the operating system….
Given that everything non-Apple we are using (LAM MPI and our modelling application) runs in process context, a complete failure of networking on the system would indicate something inherently wrong with OS X. Something running in process context should not be able to disable system networking entirely.
I do love the admin tools… they’re some of the best I’ve ever seen. But Panther Server is still buggy and immature. I look forward to Tiger as well and hope that it fixes the problems I’ve been encountering.
What is Apple’s position on your situation?
Can we assume you purchased an enterprise support contract?
I spent probably at least a total of 10 hours on the phone with Apple Enterprise Server Support. They really can’t offer me anything on our current problem except to tell us to pay for on-site support.
Interesting problems. You should push Apple harder, because I have yet to see issues on the scale you are describing with OS X Server. (I do agree with your assessment of Open Directory, as it can be very problematic.) How about getting an Apple Systems Engineer out to look at it? I have set up several systems like Hakim’s listed above and have yet to run into as many serious issues as you. Drop me a note offline if you have questions.