Infrastructure developed within your organization for its own internal use can take many forms: operating systems, compilers, programming languages, version control systems, platforms for building, testing, and continuous integration, database management systems, application development frameworks, game engines, or utility libraries.
Bespoke infrastructures can also extend to methods for doing work, such as the development process, code reviews, workflows, code style rules, and testing and integration practices.
Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is to highlight several machine-learning-specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.
Maintaining large codebases can be a challenging
endeavour. As new libraries, APIs and standards are introduced,
old code is migrated to use them. To provide as clean and
succinct an interface as possible for developers, old APIs are
ideally removed as new ones are introduced. In practice, this
becomes difficult as automatically finding and transforming code
in a semantically correct way can be challenging, particularly as
the size of a codebase increases.
In this paper, we present a real-world implementation of a
system to refactor large C++ codebases efficiently. A combination
of the Clang compiler framework and the MapReduce parallel
processor, ClangMR enables code maintainers to easily and
correctly transform large collections of code. We describe the
motivation behind such a tool, its implementation and then
present our experiences using it in a recent API update with
Google’s C++ codebase.
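
One plausible way to picture the combination described above is a map phase that scans each file independently and emits edits, followed by a reduce phase that groups the edits by file and applies them. The sketch below is a toy under stated assumptions, not ClangMR itself: it swaps in a regular expression where the real tool would match against Clang's syntax tree, it runs in a single process rather than on MapReduce, and the names `old_api`, `new_api`, `map_file`, `reduce_edits`, and `refactor` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical rename: calls to old_api(...) become new_api(...).
# A regex stands in for a real Clang AST match (an assumption for brevity).
OLD_CALL = re.compile(r"\bold_api\(")
NEW_CALL = "new_api("


def map_file(path, source):
    """Map phase: scan one file and emit (path, (start, end, replacement)) edits."""
    for match in OLD_CALL.finditer(source):
        yield path, (match.start(), match.end(), NEW_CALL)


def reduce_edits(source, edits):
    """Reduce phase: deduplicate one file's edits and apply them back to front,
    so earlier offsets stay valid while later text is rewritten."""
    for start, end, replacement in sorted(set(edits), reverse=True):
        source = source[:start] + replacement + source[end:]
    return source


def refactor(codebase):
    """Run the map step over every file, group edits by path, then reduce."""
    grouped = defaultdict(list)
    for path, source in codebase.items():
        for key, edit in map_file(path, source):
            grouped[key].append(edit)
    return {
        path: reduce_edits(source, grouped.get(path, []))
        for path, source in codebase.items()
    }


codebase = {
    "a.cc": "int x = old_api(1) + old_api(2);",
    "b.cc": "int y = my_old_api(3);",
}
refactored = refactor(codebase)
print(refactored["a.cc"])
print(refactored["b.cc"])
```

Because each file is scanned on its own, the map step can in principle be spread across many workers. Keying edits by file lets a single reducer apply all changes to a file without conflicting writes.
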
We discuss Refaster, a tool that uses normal, compilable
before-and-after examples of Java code to specify a Java
refactoring. Refaster has been used successfully by the Java
Core Libraries Team at Google to perform a wide variety
of refactorings across Google’s massive Java codebase. Our
main contribution is that a large class of useful refactorings
can be expressed in pure Java, without a specialized DSL,
while keeping the tool easily accessible to average Java developers.
With a large and rapidly changing codebase,
Google software engineers are constantly paying interest on
various forms of technical debt. Google engineers also make
efforts to pay down that debt, whether through special Fixit
days, or via dedicated teams, variously known as janitors,
cultivators, or demolition experts. We describe several related
efforts to measure and pay down technical debt found in
Google’s BUILD files and associated dead code. We address
debt found in dependency specifications, unbuildable targets,
and unnecessary command line flags. These efforts often expose
other forms of technical debt that must first be managed.
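
To make debt in dependency specifications concrete, the following minimal sketch compares the dependencies a build target declares with the headers its source actually includes. It rests on several assumptions that are not from the source: the labels and file contents are made up, a header `lib/http.h` is taken to belong to the label `//lib:http`, and the helpers `used_targets`, `unused_deps`, and `missing_deps` are hypothetical. A real analysis would consult the build system rather than a regular expression.

```python
import re

# Matches quoted includes such as #include "lib/http.h" and captures "lib/http".
INCLUDE = re.compile(r'#include "([^"]+)\.h"')


def used_targets(source):
    """Map each quoted include to a label, assuming lib/http.h -> //lib:http."""
    labels = set()
    for header in INCLUDE.findall(source):
        package, _, name = header.rpartition("/")
        labels.add(f"//{package}:{name}")
    return labels


def unused_deps(declared, source):
    """Dependencies that are declared but never included: candidates for removal."""
    return sorted(set(declared) - used_targets(source))


def missing_deps(declared, source):
    """Headers that are included without a matching declared dependency."""
    return sorted(used_targets(source) - set(declared))


# Hypothetical target: three declared deps, but the source uses a different set.
declared = ["//lib:http", "//lib:json", "//lib:legacy_auth"]
source = '#include "lib/http.h"\n#include "lib/json.h"\n#include "lib/log.h"\n'

print(unused_deps(declared, source))
print(missing_deps(declared, source))
```

Reporting the two lists separately keeps the cleanup steps distinct: an unused dependency can usually just be deleted, whereas a missing one points to a target that builds only by accident, for example through a transitive dependency.
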