Sherlock’s statement is most often quoted to imply that uncommon
scenarios can all be explained away by reason and logic. This misses
the point: the quote’s power lies in eliminating the impossible
before engaging in such reasoning. The present authors seek to expose
a similar misapplication of methodology as it exists throughout information
security and offer a framework by which to elevate the common Watson.
The large-scale monitoring of computer users’ software
activities has become commonplace, e.g., for application
telemetry, error reporting, or demographic profiling. This
paper describes a principled systems architecture—Encode,
Shuffle, Analyze (ESA)—for performing such monitoring
with high utility while also protecting user privacy. The ESA
design, and its PROCHLO implementation, are informed by
our practical experiences with an existing, large deployment
of privacy-preserving software monitoring.
With ESA, the privacy of monitored users’ data is guaranteed
by its processing in a three-step pipeline. First, the data
is encoded to control scope, granularity, and randomness.
Second, the encoded data is collected in batches subject to
a randomized threshold, and blindly shuffled, to break linkability
and to ensure that individual data items get “lost in the
crowd” of the batch. Third, the anonymous, shuffled data is
analyzed by a specific analysis engine that further prevents
statistical inference attacks on analysis results.
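The three-step pipeline above can be sketched in miniature. This is an illustrative toy, not the PROCHLO implementation: the bucket count, randomization rate, batch threshold, and minimum count are all invented parameters standing in for ESA's real encoding, shuffling, and analysis controls.

```python
import random
from collections import Counter

def encode(value, buckets=64):
    """Coarsen each report to a small bucket and inject randomness
    (a stand-in for ESA's scope/granularity/randomness controls)."""
    bucket = hash(value) % buckets
    if random.random() < 0.05:          # small chance of a random response
        bucket = random.randrange(buckets)
    return bucket

def shuffle(encoded, min_batch=1000):
    """Hold reports until a randomized threshold is met, then shuffle
    the batch to break linkability between reports and reporters."""
    threshold = min_batch + random.randrange(min_batch // 10)
    if len(encoded) < threshold:
        return None                      # batch not yet big enough
    batch = list(encoded)
    random.shuffle(batch)
    return batch

def analyze(batch, min_count=20):
    """Release only values common enough to hide any individual."""
    counts = Counter(batch)
    return {v: c for v, c in counts.items() if c >= min_count}
```

Each stage only weakens what the next stage can learn: the analyzer never sees raw values, arrival order, or rare items.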
ESA extends existing best-practice methods for sensitive-data
analytics, by using cryptography and statistical techniques
to make explicit how data is elided and reduced in
precision, how only common-enough, anonymous data is analyzed,
and how this is done for only specific, permitted purposes.
As a result, ESA remains compatible with the established
workflows of traditional database analysis.
Strong privacy guarantees, including differential privacy,
can be established at each processing step to defend
against malice or compromise at one or more of those steps.
PROCHLO develops new techniques to harden those steps,
including the Stash Shuffle, a novel scalable and efficient
oblivious-shuffling algorithm based on Intel’s SGX, and new
applications of cryptographic secret sharing and blinding.
We describe ESA and PROCHLO, as well as experiments
that validate their ability to balance utility and privacy.
Most of today’s Internet applications generate vast amounts
of data (typically, in the form of event logs) that need to be processed
and analyzed for detailed reporting, enhancing user experience, and increasing
monetization. In this paper, we describe the architecture of
Ubiq, a geographically distributed framework for processing continuously
growing log files in real time with high scalability, high availability and
low latency. The Ubiq framework fully tolerates infrastructure degradation
and data-center-level outages without any manual intervention. It
also guarantees exactly-once semantics for application pipelines to process
logs as a collection of multiple events. Ubiq has been in production
for Google’s advertising system for many years and has served as a critical
log processing framework for several dozen pipelines. Our production
deployment demonstrates linear scalability with machine resources, extremely
high availability even with underlying infrastructure failures, and
an end-to-end latency of under a minute.
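The exactly-once guarantee mentioned above can be illustrated with a toy deduplicating pipeline. This is not Ubiq's actual mechanism (the paper describes a replicated, geo-distributed design); the class name and fields here are hypothetical, showing only why redelivered events must not be double-counted.

```python
class ExactlyOncePipeline:
    """Toy illustration of exactly-once semantics: each event carries a
    unique ID, and the pipeline records processed IDs so that redelivery
    (e.g., after an infrastructure failure) does not double-count."""

    def __init__(self):
        self.processed_ids = set()   # in a real system this state is replicated
        self.total = 0

    def process(self, event_id, value):
        if event_id in self.processed_ids:
            return False             # duplicate delivery: skip, preserving totals
        self.processed_ids.add(event_id)
        self.total += value
        return True
```

At-least-once delivery plus idempotent application of each event yields the exactly-once result the application observes.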
The OAuth 2.0 protocol is one of the most widely deployed authorization/single sign-on (SSO) protocols
and also serves as the foundation for the new SSO standard OpenID Connect. Despite the popularity
of OAuth, analysis efforts so far have mostly targeted finding bugs in specific implementations and
were based on formal models that abstract away many web features or lack a formal treatment altogether.
In this paper, we carry out the first extensive formal analysis of the OAuth 2.0 standard in an expressive
web model. Our analysis aims at establishing strong authorization, authentication, and session
integrity guarantees, for which we provide formal definitions. In our formal analysis, all four OAuth
grant types (authorization code grant, implicit grant, resource owner password credentials grant, and
the client credentials grant) are covered; they may even run simultaneously in the same and in different
relying parties and identity providers, with malicious relying parties, identity providers, and browsers
considered as well. Our modeling and analysis of the OAuth 2.0 standard assumes that security
recommendations and best practices are followed in order to avoid obvious and known attacks.
When proving the security of OAuth in our model, we discovered four attacks which break the security
of OAuth. The vulnerabilities can be exploited in practice and are present also in OpenID Connect.
We propose fixes for the identified vulnerabilities, and then, for the first time, actually prove the
security of OAuth in an expressive web model. In particular, we show that the fixed version of OAuth
(with security recommendations and best practices in place) provides the authorization, authentication,
and session integrity properties we specify.
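To make the session-integrity property concrete, here is a minimal sketch of the relying-party side of the authorization code grant, using the `state` parameter (one of the best practices the analysis assumes). The endpoint URL and parameter handling are illustrative, not taken from any particular implementation.

```python
import secrets
from urllib.parse import urlencode, parse_qs, urlparse

def build_authorization_request(client_id, redirect_uri):
    """Step 1 of the authorization code grant: the relying party sends
    the browser to the identity provider, binding the request to the
    user's session with a fresh, unguessable `state` value."""
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"https://idp.example/authorize?{query}", state

def check_redirect(callback_url, expected_state):
    """Step 2: on the redirect back, reject any response whose `state`
    does not match the session; otherwise an attacker could splice
    their own authorization code into the victim's session."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible session-integrity attack")
    return params["code"][0]
```

The `state` check is exactly the kind of binding between a browser session and a protocol run that the formal session-integrity definitions capture.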
Browsers do their best to enforce a hard security boundary on an origin-by-origin basis. To vastly
oversimplify, applications hosted at distinct origins must not be able to read each other’s data or
take action on each other’s behalf in the absence of explicit cooperation. Generally speaking,
browsers have done a reasonably good job at this; bugs crop up from time to time, but they’re
well-understood to be bugs by browser vendors and developers, and they’re addressed promptly.
The web platform, however, is designed to encourage both cross-origin communication and
inclusion. These design decisions weaken the borders that browsers place around origins, creating
opportunities for side-channel attacks (pixel perfect, resource timing, etc.) and server-side
confusion about the provenance of requests (CSRF, cross-site search). Spectre and related attacks
based on speculative execution make the problem worse by allowing attackers to read more
memory than they’re supposed to, which may contain sensitive cross-origin responses fetched by
documents in the same process. Spectre is a powerful attack technique, but it should be seen as a
(large) iterative improvement over the platform’s existing side-channels.
This document reviews the known classes of cross-origin information leakage, and uses this
categorization to evaluate some of the mitigations that have recently been proposed (CORB,
From-Origin, Sec-Metadata / Sec-Site, SameSite cookies and Cross-Origin-Isolate). We attempt to
survey their applicability to each class of attack, and to evaluate developers’ ability to deploy them
properly in real-world applications. Ideally, we’ll be able to settle on mitigation techniques which
are both widely deployable, and broadly scoped.
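As a concrete example of the request-metadata style of mitigation surveyed here, a server can reject cross-site subresource requests before doing any sensitive work. The sketch below is in the spirit of the Sec-Metadata / Sec-Site proposal and uses the header names that later shipped as the Fetch Metadata request headers (`Sec-Fetch-Site`, `Sec-Fetch-Mode`, `Sec-Fetch-Dest`); the exact policy is illustrative.

```python
def is_request_allowed(headers):
    """Resource-isolation policy sketch: drop cross-site subresource
    requests server-side, mitigating CSRF and cross-site search."""
    site = headers.get("Sec-Fetch-Site", "")
    if site in ("same-origin", "same-site", "none", ""):
        # "" keeps older browsers working; "none" is user-initiated
        return True
    # Cross-site requests: allow only top-level document navigations.
    mode = headers.get("Sec-Fetch-Mode", "")
    dest = headers.get("Sec-Fetch-Dest", "")
    return mode == "navigate" and dest == "document"
```

Because the browser, not the attacker, sets these headers, such a policy is deployable without changes to existing same-site application code.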
This paper presents our design and experience with Andromeda,
Google Cloud Platform’s network virtualization
stack. Our production deployment poses several challenging
requirements, including performance isolation among
customer virtual networks, scalability, rapid provisioning
of large numbers of virtual hosts, bandwidth and latency
largely indistinguishable from the underlying hardware,
and high feature velocity combined with high availability.
Andromeda is designed around a flexible hierarchy of
flow processing paths. Flows are mapped to a programming
path dynamically based on feature and performance
requirements. We introduce the Hoverboard programming
model, which uses gateways for the long tail of low bandwidth
flows, and enables the control plane to program
network connectivity for tens of thousands of VMs in
seconds. The on-host dataplane is based around a high-performance
OS-bypass software packet processing path.
CPU-intensive per-packet operations with higher latency
targets are executed on coprocessor threads. This architecture
allows Andromeda to decouple feature growth from
fast path performance, as many features can be implemented
solely on the coprocessor path. We demonstrate
that the Andromeda datapath achieves performance that is
competitive with hardware while maintaining the flexibility
and velocity of a software-based architecture.
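The hierarchy of flow processing paths can be caricatured as a placement decision per flow. The threshold, field names, and three-way split below are invented for illustration; they are not Andromeda's actual policy, only the shape of it: the long tail of low-bandwidth flows stays on shared gateways, latency-tolerant heavy features run on coprocessor threads, and only hot flows occupy the on-host fast path.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    bandwidth_mbps: float
    needs_heavy_features: bool  # CPU-intensive per-packet work, looser latency targets

def choose_path(flow, fast_path_threshold_mbps=100.0):
    """Toy hierarchical flow placement (threshold is hypothetical)."""
    if flow.bandwidth_mbps < fast_path_threshold_mbps:
        return "gateway"        # Hoverboard-style shared gateway for the long tail
    if flow.needs_heavy_features:
        return "coprocessor"    # heavy per-packet features off the fast path
    return "fast-path"          # OS-bypass on-host path for hot, simple flows
```

The key property this models is the decoupling the abstract describes: new features land on the gateway or coprocessor paths without slowing the fast path.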
Miscreants register thousands of new domains every day to launch
Internet-scale attacks, such as spam, phishing, and drive-by downloads.
Quickly and accurately determining a domain’s reputation
(association with malicious activity) provides a powerful tool for mitigating
threats and protecting users. Yet, existing domain reputation
systems work by observing domain use (e.g., lookup patterns, content
hosted)—often too late to prevent miscreants from reaping benefits of
the attacks that they launch.
As a complement to these systems, we explore the extent to which
features evident at domain registration indicate a domain’s subsequent
use for malicious activity. We develop PREDATOR, an approach that
uses only time-of-registration features to establish domain reputation.
We base its design on the intuition that miscreants need to obtain
many domains to ensure profitability and attack agility, leading to
abnormal registration behaviors (e.g., burst registrations, textually
similar names). We evaluate PREDATOR using registration logs of
second-level .com and .net domains over five months. PREDATOR
achieves a 70% detection rate with a false positive rate of 0.35%, thus
making it an effective—and early—first line of defense against the
misuse of DNS domains. It predicts malicious domains when they
are registered, which is typically days or weeks earlier than existing