“As it invites the world to play in a mysterious sandbox it likes to call ‘Caffeine’, Google is testing more than just a ‘next-generation’ search infrastructure. It’s testing at least a portion of a revamped software architecture that will likely underpin all of its online applications for years to come. Speaking with The Reg, ueber-Googler Matt Cutts confirms that the company’s new Caffeine search infrastructure is built atop a complete overhaul of the company’s custom-built Google File System, a project two years in the making. At least informally, Google refers to this file system redux as GFS2.”
Of course, this is strongly related to Google’s earlier talk about “instantaneous” loading and making the Web faster, also through collaboration:
http://www.youtube.com/watch?v=IWWBnJEsUtU
But could it be a coincidence that the focus on “ultra-low latency” with GFS2 comes at the same time as Google Wave (see wave.google.com and fast-forward to 10:30 and 35:45) and a possible upcoming (and highly necessary) improvement to simultaneous editing in Google Docs?
(Current simultaneous editing in Google Docs too frequently results in a rather aggressive system-invoked undo of your work when someone else has been editing a nearby section of the doc. Sometimes this erases up to 15 minutes of writing.)
All in all, I continue to be amazed by both Google’s productivity and Google’s reasoning, which somehow never seems to go astray. What I am curious about is how Google manages to move everything to GFS2 without having to temporarily shut down some of its services (most notably Search).
GFS may be advanced, but it’s just a filesystem.
Search in particular is quite static: the indexes of the web are regenerated every few hours, I think, so you just bring up other servers that use the new system and point visitors at them.
You can run search on the old and new system at the same time.
Then turn off or disable the servers running the older system, update them, and let them join the new one.
I think the biggest problem for search is that you need twice as much disk space and memory during the migration, but they have far more data in the form of videos and the like anyway, and they already update the index regularly; maybe they run the old and new systems side by side during those updates too. I wouldn’t be surprised. A rough sketch of that side-by-side idea follows below.
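To picture the side-by-side setup: a query router that sends a configurable fraction of traffic to a hypothetical GFS2-backed cluster while the rest keeps hitting the old one. The cluster names and functions below are made up for illustration, not anything Google has described.

```python
import random

# Hypothetical names, purely for illustration.
OLD_CLUSTER = "search-gfs.example.internal"
NEW_CLUSTER = "search-gfs2.example.internal"

def pick_backend(new_traffic_fraction):
    """Route one query; raise the fraction as confidence in the new system grows."""
    return NEW_CLUSTER if random.random() < new_traffic_fraction else OLD_CLUSTER

def query_index(query, new_traffic_fraction=0.05):
    backend = pick_backend(new_traffic_fraction)
    # In reality this would be an RPC to the chosen cluster; here we just
    # report where the query would have gone.
    return f"{query!r} -> {backend}"

if __name__ == "__main__":
    for q in ["gfs2 caffeine", "google wave latency"]:
        print(query_index(q, new_traffic_fraction=0.25))
```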
The other services are much harder to move around.
Then again, you have to remember that they have whole datacenters running as separate working entities, with only the data being copied between them, so you could also do the above trick on a per-datacenter basis: update one, move users over to it, and then update another.
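As a rough illustration of that per-datacenter trick (every function below is a placeholder, not real Google tooling): drain one datacenter, upgrade it, sanity-check it, bring users back, and only then move on to the next.

```python
# Placeholder datacenter names and steps, just to show the rolling pattern.
DATACENTERS = ["dc-east", "dc-west", "dc-eu"]

def drain(dc):
    print(f"shifting user traffic away from {dc}")

def upgrade_storage(dc):
    print(f"upgrading storage cells in {dc} to the new filesystem")

def healthy(dc):
    print(f"running smoke tests against {dc}")
    return True

def enable(dc):
    print(f"pointing users back at {dc}")

def rolling_upgrade():
    for dc in DATACENTERS:
        drain(dc)
        upgrade_storage(dc)
        if not healthy(dc):
            raise RuntimeError(f"{dc} failed validation; halt the rollout")
        enable(dc)

if __name__ == "__main__":
    rolling_upgrade()
```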
Actually, I just noticed the GFS2 sandbox already runs alongside the current Search, so as soon as they give the new FS a “go”, they will probably extend that setup to the other servers during the infrastructure update.
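That sandbox-next-to-production approach amounts to a dark launch: mirror queries to the new stack, compare results quietly, and keep serving users from the old one. A toy sketch, with stand-in functions that are not actual Google interfaces:

```python
def search_old(query):
    return [f"old-result-for-{query}"]

def search_new(query):
    return [f"new-result-for-{query}"]

def handle_query(query):
    live = search_old(query)     # what the user actually gets today
    shadow = search_new(query)   # new-backend sandbox, evaluated silently
    if live != shadow:
        print(f"DIFF for {query!r}: {live} vs {shadow}")  # log for later analysis
    return live

if __name__ == "__main__":
    handle_query("gfs2 caffeine")
```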