Private WANs are increasingly important to the operation of
enterprises, telecoms, and cloud providers. For example, B4,
Google’s private software-defined WAN, is larger and growing
faster than our connectivity to the public Internet. In this
paper, we present the five-year evolution of B4. We describe
the techniques we employed to incrementally move from
offering best-effort content-copy services to carrier-grade
availability, while concurrently scaling B4 to accommodate
100x more traffic. Our key challenge is balancing the tension
introduced by hierarchy required for scalability, the partitioning
required for availability, and the capacity asymmetry
inherent to the construction and operation of any large-scale
network. We discuss our approach to managing this tension:
i) we design a custom hierarchical network topology for both
horizontal and vertical software scaling, ii) we manage inherent
capacity asymmetry in hierarchical topologies using
a novel traffic engineering algorithm without packet encapsulation,
and iii) we re-architect switch forwarding rules
via two-stage matching/hashing to deal with asymmetric
network failures at scale.
What is this, and who’s it for?
§ Lessons learned from the trenches building distributed systems for 8+ years at Cloudera and in
open source communities.
The large-scale monitoring of computer users’ software
activities has become commonplace, e.g., for application
telemetry, error reporting, or demographic profiling. This
paper describes a principled systems architecture—Encode,
Shuffle, Analyze (ESA)—for performing such monitoring
with high utility while also protecting user privacy. The ESA
design, and its PROCHLO implementation, are informed by
our practical experiences with an existing, large deployment
of privacy-preserving software monitoring.
With ESA, the privacy of monitored users’ data is guaranteed
by its processing in a three-step pipeline. First, the data
is encoded to control scope, granularity, and randomness.
Second, the encoded data is collected in batches subject to
a randomized threshold, and blindly shuffled, to break linkability
and to ensure that individual data items get “lost in the
crowd” of the batch. Third, the anonymous, shuffled data is
analyzed by a specific analysis engine that further prevents
statistical inference attacks on analysis results.
ESA extends existing best-practice methods for sensitivedata
analytics, by using cryptography and statistical techniques
to make explicit how data is elided and reduced in
precision, how only common-enough, anonymous data is analyzed,
and how this is done for only specific, permitted purposes.
As a result, ESA remains compatible with the established
workflows of traditional database analysis.
Strong privacy guarantees, including differential privacy,
can be established at each processing step to defend
against malice or compromise at one or more of those steps.
PROCHLO develops new techniques to harden those steps,
including the Stash Shuffle, a novel scalable and efficient
oblivious-shuffling algorithm based on Intel’s SGX, and new
applications of cryptographic secret sharing and blinding.
We describe ESA and PROCHLO, as well as experiments
that validate their ability to balance utility and privacy.
Abstract. Most of today’s Internet applications generate vast amounts
of data (typically, in the form of event logs) that needs to be processed
and analyzed for detailed reporting, enhancing user experience and increasing
monetization. In this paper, we describe the architecture of
Ubiq, a geographically distributed framework for processing continuously
growing log files in real time with high scalability, high availability and
low latency. The Ubiq framework fully tolerates infrastructure degradation
and data center-level outages without any manual intervention. It
also guarantees exactly-once semantics for application pipelines to process
logs as a collection of multiple events. Ubiq has been in production
for Google’s advertising system for many years and has served as a critical
log processing framework for several dozen pipelines. Our production
deployment demonstrates linear scalability with machine resources, extremely
high availability even with underlying infrastructure failures, and
an end-to-end latency of under a minute.
In 1913, Scottish physiologist John Scott Haldane proposed the idea of bringing a caged canary into a mine to detect dangerous gases. More than 100 years later, Haldane’s canary-in-the-coal-mine approach is also applied in software testing.
In this article, the term canarying refers to a partial and time-limited deployment of a change in a service, followed by an evaluation of whether the service change is safe. The production change process may then roll forward, roll back, alert a human, or do something else. Effective canarying involves many decisions—for example, how to deploy the partial service change or choose meaningful metrics—and deserves a separate discussion.
Google has deployed a shared centralized service called CAS (Canary Analysis Service) that offers automatic (and often autoconfigured) analysis of key metrics during a production change. CAS is used to analyze new versions of binaries, configuration changes, data-set changes, and other production changes. CAS evaluates hundreds of thousands of production changes every day at Google.
Production systems at Google consist of a constellation of microservices1 that collectively issue O(1010) Remote Procedure Calls (RPCs) per second. When a Google engineer schedules a production workload2, any RPCs issued or received by that workload are protected with ALTS by default. This automatic, zero-configuration protection is provided by Google’s Application Layer Transport Security (ALTS). In addition to the automatic protections conferred on RPC’s, ALTS also facilitates easy service replication, load balancing, and rescheduling across production machines. This paper describes ALTS and explores its deployment over Google’s production infrastructure.
Google’s employees are spread across the globe, and with job functions
ranging from software engineers to financial analysts, they require a broad
spectrum of technology to get their jobs done. As a result, we manage a fleet
of nearly a quarter-million computers (workstations and laptops) across four
operating systems (macOS, Windows, Linux, and Chrome OS).
Our colleagues often ask how we’re able to manage such a diverse fleet. Do
we have access to unlimited resources? Impose draconian security policies
on users? Shift the maintenance burden to our support staff?
The truth is that the bigger we get, the more we look for ways to increase
efficiency without sacrificing security or user productivity. We scale our
engineering teams by relying on reviewable, repeatable, and automated
backend processes and minimizing GUI-based configuration tools. Using and
developing open-source software saves money and provides us with a level
of flexibility that’s often missing from proprietary software and closed
systems. And we strike a careful balance between user uptime and security
by giving users freedom to get their work done while preventing them from
doing harm, like installing malware or exposing Google data.
This paper describes some of the tools and systems that we use to image,
manage, and secure our varied inventory of workstations and laptops . Some
tools were built by third parties—sometimes with our own modifications to
make them work for us. We also created several tools to meet our own
enterprise needs, often open sourcing them later for wider use. By sharing
this information, we hope to help others navigate some of the challenges
we’ve faced—and ultimately overcame—throughout our enterprise fleet