In a recent conference presentation, Google's Craig Nevill-Manning notes that cheap, fast hardware is the key to Google's success. The software, he says, is written to assume that the hardware will fail and to work around that. That way, the company can use commodity PC hardware and not worry when it fails; they simply replace it.
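As a rough sketch of what "assume the hardware will fail" can look like in practice (my own illustration in Python; the names Replica, send_query, and query_with_failover are invented for the example, not Google's actual interfaces): a query router simply retries against another replica when a node dies, so a dead machine costs capacity rather than correctness.

    import random

    class Replica:
        def __init__(self, host, alive=True):
            self.host = host
            self.alive = alive

    def send_query(node, request):
        # Stand-in for a real network call to an index server.
        if not node.alive:
            raise ConnectionError(node.host)
        return "results for %r from %s" % (request, node.host)

    def query_with_failover(replicas, request):
        candidates = list(replicas)
        random.shuffle(candidates)          # spread load across the cluster
        for node in candidates:
            try:
                return send_query(node, request)
            except ConnectionError:
                continue                    # dead node: just try the next one
        raise RuntimeError("all replicas failed")

    # One dead machine costs capacity, not correctness.
    cluster = [Replica("pc1", alive=False), Replica("pc2"), Replica("pc3")]
    print(query_with_failover(cluster, "nevill-manning"))

The point of the design is that failure handling lives in the software, so the hardware underneath can be as cheap and unreliable as you like.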
I wonder if Google's experience translates to other situations or environments that require large volumes of computation. Is it faster and cheaper to cluster many low-cost PCs together for performance, or to use fewer, but more expensive (and faster) high-end models? Could a single motherboard with multiple low-cost CPUs perform better in some situations than a single high-end chip? (Of course, the software would have to take advantage of the multiple-CPU configuration.)
I think the point is fairly simple. If you will need to support clusters (and variants) anyway, whether you use the latest and greatest hardware or not, it is cheaper to use 30 low-end systems rather than 10 high-end systems (or whatever the numbers are). I don't see this as anything particularly surprising.
Of course, if you won't need the cluster-type infrastructure, then it gets more complicated (code complexity, the network can become a bottleneck, etc.), but that isn't Google's case.
Well, if you know what you’re doing, you can get away with this kind of strategy, but when it gets to be a huge operation it will fall apart. The manpower will always be short of the requirements.
Google is a “pretty large operation”, and does not seem to be falling apart.
>> but when it gets to be a huge operation it will fall apart.
If it's Google, it's pretty huge. From the article:
"One full day of Google use on a server is the equivalent of 40 machine years," Nevill-Manning said.
More than 5,000 servers (Google) is by no means a small operation.
"Well, if you know what you're doing, you can get away with this kind of strategy, but when it gets to be a huge operation it will fall apart. The manpower will always be short of the requirements."
The truth about Google is that the vast majority of the software they use was developed in-house, from scratch, under the assumption that it would be parallelized across several nodes of commodity hardware. That has been true ever since Google started as BackRub, a collection of Java and Python scripts, which were scrapped due to scalability issues; the rewrite became Google.
Unfortunately, many use Google as evidence that you don't need a high-performance central database server for a database-backed web site. This can be the case, but keep in mind that Google today is the culmination of 9 years of work. The complete lack of centralization within their system has been a programming requirement since day one. Google should *NOT* be looked at as evidence that anyone can pull the same thing off. Also, keep in mind that the database requirements of Google are very different from those of typical database-driven web sites. Users do not directly modify the database; they only query it. For read-only operations, it *is* possible to parallelize across several lower-powered systems instead of having a central database server. However, if users are directly modifying the database, the computational costs of replication will destroy any advantage gained through parallelism.
Bottom line: For a typical database driven web site, 5 dual 3GHz Xeon systems are not going to equal one quad 1.28GHz UltraSPARC IIIi V440.
"I wonder if Google's experience translates to other situations or environments that require large volumes of computation. Is it faster and cheaper to cluster many low-cost PCs together for performance, or use fewer, but more expensive (and faster) high-end models?"
It depends on whether the software you intend to use is designed to be parallelized. Having multiple cheap and fast commodity systems in a cluster is wonderful for MPI- or PVM-enabled programs, or for multiple web/application servers which talk to a database backend.
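To give a flavor of what an MPI-enabled program looks like (a toy sketch using the mpi4py Python bindings, which I'm assuming are installed; it isn't tied to any particular cluster): each rank works on its own slice of the data and the partial results are combined, which is exactly the kind of workload that scales across cheap nodes.

    # Toy MPI sketch. Run with something like "mpirun -np 4 python sum.py".
    # Each rank sums its own slice of the data; rank 0 combines the results.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()      # which process am I?
    size = comm.Get_size()      # how many processes in total?

    data = range(1000000)
    partial = sum(x for x in data if x % size == rank)  # my slice only

    total = comm.reduce(partial, op=MPI.SUM, root=0)    # combine on rank 0
    if rank == 0:
        print("total:", total)

Each extra cheap node shrinks everyone's slice, which is why this class of program benefits from "30 low-end systems" over "10 high-end" ones.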
Unfortunately, as I mentioned before, cheap and fast commodity hardware is rarely a solution for a high-volume database-backed web site. As I mentioned in my previous post, 5 dual 3GHz Xeon systems running Linux and Oracle 9i with replication will have merely a fraction of the database performance of a single V440 running Oracle 9i for operations which require replication (and yes, the specific numbers given in this example should lead you to believe this was verified empirically).
So, basically it comes down to the nature of the web site. If you are rarely modifying the contents of your database, multiple cheap and fast nodes will have significantly better performance than a central server. However, if you are constantly modifying the contents of your database, then more powerful central servers are the way to go.
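A crude way to see the asymmetry (a Python sketch of my own; Node, read, and write are invented names, and real replication is of course far more involved): a read can be served by any single replica, but a write has to be applied to every replica, so adding nodes speeds reads up while making writes more expensive.

    import random

    class Node:
        def __init__(self, name):
            self.name = name
            self.rows = {}          # trivial stand-in for a real table

        def read(self, key):
            return self.rows.get(key)

        def write(self, key, value):
            self.rows[key] = value

    replicas = [Node("db%d" % i) for i in range(5)]

    def read(key):
        # Any one replica can answer a read: cost is 1 node, so N nodes
        # give roughly N times the read throughput.
        return random.choice(replicas).read(key)

    def write(key, value):
        # A write must reach every replica to keep them consistent: cost
        # is N nodes, so adding nodes makes writes *more* expensive.
        for node in replicas:
            node.write(key, value)

    write("item", 42)
    print(read("item"))             # any replica can serve this

That, in miniature, is why a read-mostly workload wins on a cluster and a write-heavy one wins on a single big box.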
Bascule said:
“….If you are rarely modifying the contents of your database, multiple cheap and fast nodes will have significantly better performance than a central server. However, if you are constantly modifying the contents of your database, then more powerful central servers are the way to go.”
But Google constantly updates its database. I think Google can tolerate a relatively high failure rate because the individual data points are not mission-critical (how would you even know if there were a slight inaccuracy in your search results?), and because due to the nature of their system data is constantly refreshed anyway. Cheap, transient data need only rely on cheap hardware.
“So, basically it comes down to the nature of the web site. If you are rarely modifying the contents of your database, multiple cheap and fast nodes will have significantly better performance than a central server. However, if you are constantly modifying the contents of your database, then more powerful central servers are the way to go.”
Which explains the difference between Google (massive Lintel server clustering) and eBay (just a couple of huge SunFire servers, last I heard).
"But Google constantly updates its database. I think Google can tolerate a relatively high failure rate because the individual data points are not mission-critical (how would you even know if there were a slight inaccuracy in your search results?), and because due to the nature of their system data is constantly refreshed anyway. Cheap, transient data need only rely on cheap hardware."
Well, “constantly updates its database” comes with a few caveats:
* The updates are periodic; Google doesn't have a constant stream of database modifications coming from its userbase (i.e., pages don't show up in Google the instant you submit them). This allows for much better resource utilization during updates.
* The Google database is highly decentralized. Because of this, the resource usage of replication on individual nodes is kept to a minimum; i.e., the entire database need not be replicated across every node (see the sketch below).
Google has created an extremely specialized, high-performance database with redundancy and decentralization as two of the key design goals. Unfortunately, this approach does not work for most database-driven web sites (e.g., eBay, as mentioned earlier), which take a constant stream of database modifications from their userbase and thus need a traditional RDBMS which can be queried with SQL or otherwise.
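To make the decentralization point concrete (a toy Python sketch; Google's real index layout is not public in this detail, and every name here is invented): if the index is partitioned by key across nodes, each node holds and replicates only its own shard, and a query consults the one shard that owns the key rather than a copy of the whole database.

    from hashlib import md5

    NUM_SHARDS = 4
    shards = [dict() for _ in range(NUM_SHARDS)]   # shard -> {word: [doc ids]}

    def shard_for(word):
        # Hashing the key decides which single node owns it.
        return int(md5(word.encode()).hexdigest(), 16) % NUM_SHARDS

    def index(doc_id, text):
        for word in text.split():
            shards[shard_for(word)].setdefault(word, []).append(doc_id)

    def search(word):
        # Only the shard that owns this word is consulted; no node
        # ever needs a copy of the whole database.
        return shards[shard_for(word)].get(word, [])

    index(1, "cheap commodity hardware")
    index(2, "commodity clusters scale")
    print(search("commodity"))      # -> [1, 2]

Because each node carries only its own slice, replicating a node means copying one shard, not the whole database, which keeps the replication cost per node low.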
“For a typical database driven web site, 5 dual 3GHz Xeon systems are not going to equal one quad 1.28GHz UltraSPARC IIIi V440.”
My first thought was that a single dual-Xeon 3GHz (or quad?) system would outperform a quad 1.28GHz UltraSPARC III if they have almost the same I/O subsystem.
The nazi system has been tried and done with. For reference, Sabreman's post is humorous and witty, and should be treated as such. WTF.
Google to Steve Jobs... Hit the road, Mac!!
-sabreman
I nearly laughed my bag off
Damn, I thought this was an article bashing dongles!
"My first thought was that a single dual-Xeon 3GHz (or quad?) system would outperform a quad 1.28GHz UltraSPARC III if they have almost the same I/O subsystem."
For Oracle performance, the V440 will beat the Xeons hands down.
>>Well, if you know what you’re doing, you can get away with this kind of strategy, but when it gets to be a huge operation it will fall apart. The manpower will always be short of the requirements. <<
Google is not exactly a tiny operation. Yahoo is a huge operation, and I think Yahoo runs mostly on FreeBSD.
I think Google runs mostly on Linux, and Yahoo runs mostly on FreeBSD. BTW: did anybody notice that Red Hat is up *another* 20% today, while SUNW drops another 4%? Oh well, I hope SUNW enjoys their relationship with SCOX.