Interestingly, the entirety of Google’s codebase – from search and maps to YouTube and Google Docs – resides in a monolithic source code repository available to and used by 95% of Google engineers, roughly 25,000 users.
“Without being able to prove it,” a Google engineering manager said, “I’d guess that this is probably the largest single repository in use anywhere in the world.”
All told, Google’s services comprise 2 billion lines of code which, taken together, weigh in at 86 terabytes.
Fascinating.
This likely includes the commit history which means every version of each file.
Unless their code has, on average, 43000 bytes per line that is…
…and the average developer (or person with access) has written about 80,000 lines of code.
How this article passed basic fact checking … let me guess, nobody bothered because… well, big numbers are hard.
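For what it’s worth, here is the parent’s back-of-envelope arithmetic spelled out, using only the figures quoted from the article (86 TB, 2 billion lines, 25,000 users):

```python
# Sanity-checking the article's numbers.
TOTAL_BYTES = 86 * 10**12   # 86 terabytes, as claimed
TOTAL_LINES = 2 * 10**9     # 2 billion lines of code
ENGINEERS = 25_000          # "about 25,000 users"

# If 86 TB were source alone, each line would average 43,000 bytes...
bytes_per_line = TOTAL_BYTES // TOTAL_LINES

# ...and each engineer would have written 80,000 lines on average.
lines_per_dev = TOTAL_LINES // ENGINEERS

print(bytes_per_line, lines_per_dev)  # → 43000 80000
```

43,000 bytes per line is absurd for source code, which is why the 86 TB figure almost certainly includes full version history rather than just the latest snapshot.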
Somehow I doubt Google developers have 86 TB Git checkouts on their machines, so this is unlikely to be a “monolithic repository” in the usual sense.
They use Perforce, not Git. Either way, you can just check out HEAD (--depth 1 on Git) rather than the whole 86 TB of historical stuff.
Perforce – if I recall correctly – also allows you to check out individual folders, so although they have ‘one repository’, it’s logically subdivided, much as you can have one ‘organization’ on GitHub with dozens of individual repositories (we have 50+ at our company).
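For reference, this is roughly what a Perforce client spec looks like: the View field maps only the depot folders you want into your local workspace, so you never sync the rest of the repository (the paths below are made up for illustration):

```
Client: my-workspace
Root:   /home/me/workspace
View:
    //depot/search/... //my-workspace/search/...
    //depot/maps/...   //my-workspace/maps/...
```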
What’s way more bizarre to me is that, the way their repository is organized, a single commit can (and often does) prevent dozens of projects from being deployed.
The poor bastards. I’ll try to remember to give a hug to the next Google engineer that I meet.
Apparently, the numbers were quite different 5 years ago: http://www.perforce.com/blog/110607/how-do-they-do-it-googles-one-s…
Doubt it.
For example, right now they are talking about merging Chromium and Blink into one repository (next week):
https://groups.google.com/a/chromium.org/d/topic/blink-dev/FB5dGFtq4…
I’m halfway through watching the video (a good place to start is https://youtu.be/W71BTkUbdqE?t=16m20s) and there seems to be a lot of straw-man reasoning about the ad-hoc multi-repo approach. Google themselves created a multi-repo sync tool: https://source.android.com/source/using-repo.html
See also: https://codingkilledthecat.wordpress.com/2012/04/28/why-your-company…
It seems to me that the correct way to handle this stuff is with good tooling for synchronising between different repos, not monolithic monsters.
There are more details in this presentation:
https://www.youtube.com/watch?t=7&v=W71BTkUbdqE
Google really does use a monolithic code base for most of its stuff. Externally shared projects like Chromium and Android are separate, but the main code is in one place.
We see a “cloud”-based view of the latest version of the repository. While it is possible to have branches, that is rare; any real code change will be made to the “head”. The presentation goes into more detail, but basically we have a system called TAP that makes sure you do not break anybody else’s code.
It is really nice and works well; let me give a concrete example.
Recently I was working on a tool that uses a library to connect to a service. I would benefit from recent changes to the service, and instead of waiting for the library owner or the service owner to adapt to them, I put together a patch, got it reviewed, checked it in, and got my use case done.
However, it inadvertently broke somebody else’s code, and we missed that during regular tests. While I was figuring out what had gone wrong, a (fifth?) party came in and sent the fix.
Everything was better. The new service features were adopted, bugs were fixed (even some older things got optimized), tests were updated, and we went on our way.
Edited 2015-09-21 05:34 UTC
and maybe start work on Friday .. 🙂