Interestingly, the entirety of Google’s codebase – from search and maps to YouTube and Google Docs – resides in a monolithic source code repository available to and used by 95% of Google engineers, roughly 25,000 users.
“Without being able to prove it,” a Google engineering manager said, “I’d guess that this is probably the largest single repository in use anywhere in the world.”
All told, Google’s services comprise 2 billion lines of code which, taken together, weigh in at 86 terabytes.
Fascinating.
This likely includes the commit history which means every version of each file.
Unless their code has, on average, 43000 bytes per line that is…
…and the average developer (or person with access) has written about 80,000 lines of code.
How this article passed basic fact checking … let me guess, nobody bothered because… well, big numbers are hard.
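For what it’s worth, here is the parent’s back-of-envelope arithmetic spelled out, using only the figures quoted from the article (86 TB, 2 billion lines, 25,000 users):

```python
# Sanity-checking the article's numbers.
TOTAL_BYTES = 86 * 10**12   # 86 terabytes, as claimed
TOTAL_LINES = 2 * 10**9     # 2 billion lines of code
ENGINEERS = 25_000          # "about 25,000 users"

# If 86 TB were source alone, each line would average 43,000 bytes...
bytes_per_line = TOTAL_BYTES // TOTAL_LINES

# ...and each engineer would have written 80,000 lines on average.
lines_per_dev = TOTAL_LINES // ENGINEERS

print(bytes_per_line, lines_per_dev)  # → 43000 80000
```

43,000 bytes per line is absurd for source code, which is why the 86 TB figure almost certainly includes full version history rather than just the latest snapshot.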
Somehow I doubt Google developers have 86 TB Git checkouts on their machines, so this is unlikely to be a “monolithic repository” in the usual sense.
They use Perforce, not Git. Either way, you can just check out HEAD (--depth 1 on Git) rather than the whole 86 TB of historical stuff.
Perforce – if I recall correctly – also allows you to check out individual folders, so although they have ‘one repository’, it’s logically subdivided, much as you can have one ‘organization’ on GitHub with dozens of individual repositories (we have 50+ at our company).
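For reference, this is roughly what a Perforce client spec looks like: the View field maps only the depot folders you want into your local workspace, so you never sync the rest of the repository (the paths below are made up for illustration):

```
Client: my-workspace
Root:   /home/me/workspace
View:
    //depot/search/... //my-workspace/search/...
    //depot/maps/...   //my-workspace/maps/...
```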
What’s way more bizarre to me is that, the way their repository is organized, a single commit can (and often does) prevent dozens of projects from being deployed.
The poor bastards. I’ll try to remember to give a hug to the next Google engineer that I meet.
Apparently, the numbers were quite different 5 years ago: http://www.perforce.com/blog/110607/how-do-they-do-it-googles-one-s…
Doubt it.
For example, right now they are talking about merging Chromium and Blink into one repository (next week):
https://groups.google.com/a/chromium.org/d/topic/blink-dev/FB5dGFtq4…
I’m halfway through watching the video (a good place to start is https://youtu.be/W71BTkUbdqE?t=16m20s) and there seems to be a lot of straw-man reasoning about the ad-hoc multi-repo approach. Google themselves created a multi-repo sync tool: https://source.android.com/source/using-repo.html
See also: https://codingkilledthecat.wordpress.com/2012/04/28/why-your-company…
It seems to me that the correct way to handle this stuff is with good tooling for synchronising between different repos, not monolithic monsters.
There are more details in this presentation:
https://www.youtube.com/watch?t=7&v=W71BTkUbdqE
Google really does use a monolithic code base for most of its stuff. Externally shared projects like Chromium and Android are separate, but the main code is in one place.
We see a “cloud”-based view of the latest version of the repository. While it is possible to have branches, that is rare; any real code change will be made to the “head”. The presentation goes into more detail, but basically we have a system called TAP that makes sure you do not break anybody else’s code.
It is really nice and works well; let me give a concrete example.
Recently I was working on a tool that uses a library to connect to a service. I would benefit from recent changes to the service, and instead of waiting for the library owner or the service owner to adapt to them, I put together a patch, got it reviewed, checked it in, and got my use case done.
However, it inadvertently broke somebody else’s code, and we missed that during regular tests. While I was figuring out what had gone wrong, a (fifth?) party came in and sent the fix.
Everything was better. The new service features were adopted, bugs were fixed (even some older things got optimized), tests were updated, and we went on our way.
Edited 2015-09-21 05:34 UTC
and maybe start work on Friday .. 🙂