Why Google Stores Billions of Lines of Code in a Single Repository

Early Google employees decided to work with a
shared codebase managed through a centralized
source control system. This approach has served
Google well for more than 16 years, and today the vast
majority of Google’s software assets continue to be
stored in a single, shared repository. Meanwhile, the
number of Google software developers has steadily
increased, and the size of the Google codebase
has grown exponentially (see Figure 1). As a result,
the technology used to host the codebase has also
evolved significantly.

Source: http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext

How Developers Search for Code: A Case Study

With the advent of large code repositories and sophisticated
search capabilities, code search is increasingly becoming a
key software development activity. In this work, we shed
some light on how developers search for code through a case
study performed at Google, using a combination of survey
and log-analysis methodologies. Our study provides insights
into what developers are doing and trying to learn when they search,
the scope of their searches, the properties of their queries, and what a
typical search session entails in different contexts. Our
results indicate that programmers search for code very frequently,
conducting an average of five search sessions, comprising
12 queries in total, each workday. The search queries are often
targeted at a particular code location, and programmers are
typically looking for code with which they are somewhat familiar.
Further, programmers are generally seeking answers
to questions about how to use an API, what code does, why
something is failing, or where code is located.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43835.pdf
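The session and query figures above come from analyzing search logs. As a purely illustrative sketch (not the study's actual pipeline), the Java code below groups one developer's query timestamps into sessions, starting a new session whenever the pause since the previous query exceeds a threshold. The class `QueryLog`, its methods, and the 5-minute gap used in the demo are hypothetical assumptions; the paper's own session definition is not reproduced here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits one developer's query log into search sessions. */
final class QueryLog {
  private QueryLog() {}

  /**
   * Groups query timestamps (assumed sorted in ascending order) into sessions. A new
   * session starts whenever the gap since the previous query is longer than maxGap.
   */
  static List<List<Instant>> groupIntoSessions(List<Instant> sortedQueries, Duration maxGap) {
    List<List<Instant>> sessions = new ArrayList<>();
    List<Instant> current = new ArrayList<>();
    Instant previous = null;
    for (Instant t : sortedQueries) {
      if (previous != null && Duration.between(previous, t).compareTo(maxGap) > 0) {
        sessions.add(current); // the pause was too long: close the running session
        current = new ArrayList<>();
      }
      current.add(t);
      previous = t;
    }
    if (!current.isEmpty()) {
      sessions.add(current); // close the last session
    }
    return sessions;
  }

  /** Mean number of queries per session, or 0 when there are no sessions. */
  static double averageQueriesPerSession(List<List<Instant>> sessions) {
    if (sessions.isEmpty()) {
      return 0.0;
    }
    int total = 0;
    for (List<Instant> session : sessions) {
      total += session.size();
    }
    return (double) total / sessions.size();
  }

  public static void main(String[] args) {
    Instant start = Instant.parse("2016-07-01T09:00:00Z");
    // Three queries within a few minutes, an hour-long pause, then two more queries.
    List<Instant> log = List.of(
        start,
        start.plusSeconds(60),
        start.plusSeconds(150),
        start.plusSeconds(3600),
        start.plusSeconds(3700));
    List<List<Instant>> sessions = groupIntoSessions(log, Duration.ofMinutes(5));
    System.out.println(sessions.size() + " sessions, "
        + averageQueriesPerSession(sessions) + " queries per session");
  }
}
```

Running `main` should report 2 sessions averaging 2.5 queries each, because the hour-long pause splits the five queries into groups of three and two. A gap threshold is the simplest way to segment a log, but the chosen threshold directly changes the session counts, so a real analysis has to justify it.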

Large-Scale Automated Refactoring Using ClangMR

Maintaining large codebases can be a challenging
endeavour. As new libraries, APIs and standards are introduced,
old code is migrated to use them. To provide as clean and
succinct an interface as possible for developers, old APIs are
ideally removed as new ones are introduced. In practice, this
is difficult, because finding and transforming code automatically
in a semantically correct way is hard, particularly as
the size of a codebase increases.

In this paper, we present a real-world implementation of a
system to refactor large C++ codebases efficiently. A combination
of the Clang compiler framework and the MapReduce parallel
processor, ClangMR enables code maintainers to easily and
correctly transform large collections of code. We describe the
motivation behind such a tool and its implementation, and then
present our experiences using it in a recent API update across
Google’s C++ codebase.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41342.pdf
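ClangMR pairs Clang-based matching with a MapReduce-style split of the work. The toy Java program below imitates only the shape of such a pipeline in a single process, under loudly stated assumptions: a regular expression stands in for Clang's AST matchers (a major simplification, since text matching cannot see types and will also hit comments and string literals), the rename of a hypothetical `OldApi` to `NewApiV2` is invented for the example, and the class `ToyMapReduceRefactor` is not part of ClangMR. The map step emits edits per file; the reduce step groups them by file, drops exact duplicates (in the demo, a header scanned twice, as if included by two files, yields identical edits), and applies the remaining edits back to front so earlier offsets stay valid. It uses a Java 16+ record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-process imitation of a map/reduce refactoring pipeline. */
final class ToyMapReduceRefactor {
  private ToyMapReduceRefactor() {}

  /** One text edit: replace [offset, offset + length) of a file with the replacement. */
  record Edit(String file, int offset, int length, String replacement) {}

  // Stand-in for an AST matcher: calls to a hypothetical OldApi(...) function.
  private static final Pattern OLD_CALL = Pattern.compile("\\bOldApi\\(");
  private static final String NEW_NAME = "NewApiV2";

  /** Map step: scan one file and emit an edit for every match. */
  static List<Edit> map(String file, String contents) {
    List<Edit> edits = new ArrayList<>();
    Matcher m = OLD_CALL.matcher(contents);
    while (m.find()) {
      // Replace only the identifier; the opening parenthesis is left untouched.
      edits.add(new Edit(file, m.start(), "OldApi".length(), NEW_NAME));
    }
    return edits;
  }

  /** Reduce step: drop duplicate edits for one file, then apply them back to front. */
  static String reduce(String contents, List<Edit> editsForFile) {
    List<Edit> unique = new ArrayList<>(new LinkedHashSet<>(editsForFile));
    // Applying the highest offset first keeps the offsets of earlier edits valid.
    unique.sort(Comparator.comparingInt(Edit::offset).reversed());
    StringBuilder sb = new StringBuilder(contents);
    for (Edit e : unique) {
      sb.replace(e.offset(), e.offset() + e.length(), e.replacement());
    }
    return sb.toString();
  }

  /** Runs map over every scanned file, groups edits by file ("shuffle"), then reduces. */
  static Map<String, String> run(Map<String, String> files, List<String> scanOrder) {
    Map<String, List<Edit>> byFile = new TreeMap<>();
    for (String name : scanOrder) {
      for (Edit e : map(name, files.get(name))) {
        byFile.computeIfAbsent(e.file(), k -> new ArrayList<>()).add(e);
      }
    }
    Map<String, String> result = new TreeMap<>(files);
    byFile.forEach((name, edits) -> result.put(name, reduce(files.get(name), edits)));
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> files = new TreeMap<>();
    files.put("util.h", "int x = OldApi(1);");
    files.put("a.cc", "OldApi(2); OldApi(3);");
    // util.h is scanned twice, as if two files included it; its duplicate edits are
    // merged in reduce (applying both would corrupt the name to "NewApiV2V2").
    System.out.println(run(files, List.of("a.cc", "util.h", "util.h")));
  }
}
```

After the run, `util.h` reads `int x = NewApiV2(1);` despite being scanned twice. The sketch assumes edits never overlap; a real tool would need to detect and report conflicting edits rather than apply them blindly.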

Scalable, Example-Based Refactorings with Refaster

We discuss Refaster, a tool that uses normal, compilable
before-and-after examples of Java code to specify a Java
refactoring. Refaster has been used successfully by the Java
Core Libraries Team at Google to perform a wide variety
of refactorings across Google’s massive Java codebase. Our
main contribution is showing that a large class of useful refactorings
can be expressed in pure Java, without a specialized DSL,
while keeping the tool easily accessible to average Java
developers.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41876.pdf
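To make the idea of before-and-after examples concrete, here is a plain-Java sketch of a rule that rewrites emptiness checks on strings into `String.isEmpty()`. In Refaster proper, the "before" methods would carry Error Prone's `@BeforeTemplate` annotation and the "after" method `@AfterTemplate`, and matching would happen inside the compiler on typed syntax trees; those annotations are omitted here (an assumption made so the file compiles without Error Prone), and the class name `StringIsEmptyRule` is hypothetical. What the sketch does show is the abstract's central point: every template is ordinary, compilable Java.

```java
import java.util.List;

/**
 * Hypothetical Refaster-style rule written as plain Java. In Refaster itself, the two
 * "before" methods would be annotated with @BeforeTemplate and the "after" method with
 * @AfterTemplate (from Error Prone); the annotations are left out so this file has no
 * dependencies.
 */
final class StringIsEmptyRule {
  private StringIsEmptyRule() {}

  /** Before template 1: comparing against the empty string literal. */
  static boolean equalsEmptyString(String string) {
    return string.equals("");
  }

  /** Before template 2: checking the length explicitly. */
  static boolean lengthIsZero(String string) {
    return string.length() == 0;
  }

  /** After template: the dedicated library method both patterns become. */
  static boolean isEmpty(String string) {
    return string.isEmpty();
  }

  public static void main(String[] args) {
    // Ordinary Java, so the templates can be called and compared on sample inputs.
    for (String s : List.of("", " ", "abc")) {
      boolean agree = equalsEmptyString(s) == isEmpty(s) && lengthIsZero(s) == isEmpty(s);
      System.out.println("\"" + s + "\" agree=" + agree);
    }
  }
}
```

Because the templates are normal methods, their agreement can be spot-checked simply by calling them, as `main` does; that is a sanity check on sample inputs, not a proof that the rewrite preserves behavior.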