The rfmt code formatter incorporates a new algorithm that optimizes code layout with respect to an intuitive notion of layout cost. This note describes the foundations of the algorithm, and the programming abstractions used to facilitate its use with a variety of languages and code layout policies.
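The idea of optimizing layout against a cost can be illustrated with a toy sketch. This is not rfmt's actual cost model or algorithm: the margin, the per-character overflow charge, and the per-line-break charge below are all assumptions for illustration.

```python
# Toy illustration of choosing a code layout by minimizing a cost function.
# NOT rfmt's cost model: MARGIN, OVERFLOW_COST, and NEWLINE_COST are
# assumed values chosen only to make the example concrete.

MARGIN = 20          # assumed right margin
OVERFLOW_COST = 10   # assumed cost per character past the margin
NEWLINE_COST = 1     # assumed cost per line break

def cost(lines):
    """Total layout cost of a candidate rendering (a list of lines)."""
    overflow = sum(max(0, len(line) - MARGIN) for line in lines)
    return OVERFLOW_COST * overflow + NEWLINE_COST * (len(lines) - 1)

def best_layout(candidates):
    """Pick the candidate rendering with minimal cost."""
    return min(candidates, key=cost)

# Two renderings of the same call: one overflows, one uses line breaks.
horizontal = ["f(alpha, beta, gamma)"]           # 21 chars: 1 past margin
vertical   = ["f(alpha,", "  beta,", "  gamma)"] # fits, but 2 line breaks

print(best_layout([horizontal, vertical]))  # the vertical layout wins
```

Under this cost model the horizontal rendering costs 10 (one overflowing character) and the vertical one costs 2 (two breaks), so the minimizer picks the vertical layout; rfmt's real algorithm searches a much larger space of candidate layouts.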
Debugging high-throughput, low-latency C/C++ systems in production is hard. At Google we developed XRay, a function call tracing system that allows Google engineers to get accurate function call traces with negligible overhead when off and moderate overhead when on, suitable for services deployed in production. XRay enables efficient function call entry/exit logging with high-accuracy timestamps, and can be dynamically enabled and disabled. This white paper describes the XRay tracing system and its implementation. It also describes future plans for open-sourcing XRay and for engaging open-source communities.
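The entry/exit logging and runtime toggling described above can be sketched by analogy in Python. This is not XRay's implementation (XRay patches instrumentation sleds in compiled machine code); the decorator, flag, and log format here are assumptions that only mirror the behavior.

```python
import time

# Analogy sketch of toggleable function entry/exit tracing.
# NOT XRay's mechanism: XRay rewrites machine code in place; here a
# decorator plus a global flag stand in for the patchable sleds.

TRACING_ENABLED = False
TRACE_LOG = []  # (event, function_name, timestamp_ns) records

def traced(fn):
    def wrapper(*args, **kwargs):
        if not TRACING_ENABLED:              # cheap path when tracing is off
            return fn(*args, **kwargs)
        TRACE_LOG.append(("enter", fn.__name__, time.monotonic_ns()))
        try:
            return fn(*args, **kwargs)
        finally:
            TRACE_LOG.append(("exit", fn.__name__, time.monotonic_ns()))
    return wrapper

@traced
def handle_request():
    return "ok"

handle_request()          # tracing off: nothing is logged
TRACING_ENABLED = True
handle_request()          # tracing on: entry and exit are recorded
print([(event, name) for event, name, _ in TRACE_LOG])
# [('enter', 'handle_request'), ('exit', 'handle_request')]
```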
AutoFDO is a system to simplify real-world deployment of feedback-directed optimization (FDO). The system works by sampling hardware performance monitors on production machines and using those profiles to guide optimization. Profile data is stale by design, and we have implemented compiler features to deliver stable speedup across releases. The resulting performance has a geometric mean improvement of 10.5%. The system is deployed to hundreds of binaries at Google, and it is extremely easy to enable; users need only add some flags to their release build. To date, AutoFDO has increased the number of FDO users at Google by 8X and has doubled the number of cycles spent in FDO-optimized binaries. Over half of CPU cycles used are now spent in some flavor of FDO-optimized binaries.
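The sampling-based workflow described above can be sketched at a high level: aggregate raw samples into a profile, then rank functions by hotness to focus optimization. This is a simplified stand-in, not AutoFDO's pipeline; the sample format, coverage threshold, and decision rule are assumptions.

```python
from collections import Counter

# Sketch of turning raw PC samples into an optimization hint.
# NOT AutoFDO's profile format or heuristics: the (function, line)
# sample tuples and the 90% coverage rule are illustrative assumptions.

def build_profile(samples):
    """Aggregate raw (function, source_line) samples into per-function counts."""
    return Counter(fn for fn, _line in samples)

def hot_functions(profile, fraction=0.9):
    """Functions covering `fraction` of all samples, hottest first."""
    total = sum(profile.values())
    hot, covered = [], 0
    for fn, count in profile.most_common():
        if covered >= fraction * total:
            break
        hot.append(fn)
        covered += count
    return hot

# Hypothetical sample stream: 700 hits in matmul, 250 in parse, 50 in log.
samples = [("matmul", 12)] * 700 + [("parse", 3)] * 250 + [("log", 8)] * 50
profile = build_profile(samples)
print(hot_functions(profile))   # ['matmul', 'parse']
```

A compiler consuming such a profile could then prioritize inlining and layout for the hot set; because production samples drift between releases, the real system adds the stability features the abstract mentions.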
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous “parameter server” designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production; we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
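The core idea of the abstract, a dataflow graph whose nodes represent both computation and mutable shared state, can be sketched with a toy interpreter. This is not TensorFlow's API; the `Variable`, `Node`, and `assign_add` names below are assumptions chosen only to mirror the concepts.

```python
# Toy dataflow sketch: nodes compute, variables hold mutable state,
# and some operations mutate that state. NOT TensorFlow's API.

class Variable:
    """A stateful node holding a mutable value (loosely, a TF variable)."""
    def __init__(self, value):
        self.value = value

class Node:
    """A computation node: an operation applied to inputs."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def _fetch(self, i):
        if isinstance(i, Node):
            return i.run()
        if isinstance(i, Variable):
            return i.value
        return i

    def run(self):
        return self.op(*[self._fetch(i) for i in self.inputs])

def assign_add(var, delta):
    """A state-mutating operation, analogous to a TF assign-add op."""
    var.value += delta
    return var.value

w = Variable(2.0)
x = 3.0
y = Node(lambda a, b: a * b, w, x)        # y = w * x
update = Node(lambda: assign_add(w, 0.5)) # a training-style update

print(y.run())       # 6.0
print(update.run())  # 2.5
print(y.run())       # 7.5: the mutated state feeds later executions
```

Because state lives in the graph rather than in a separate parameter-server component, an update rule is just another node, which is the flexibility the abstract contrasts with earlier designs.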
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November 2015 and are available at
The story of gtags
Flayer is a tool for dynamically exposing application innards for security testing and analysis. It is implemented on the dynamic binary instrumentation framework Valgrind and its memory error detection plugin, Memcheck. This paper focuses on the implementation of Flayer, its supporting libraries, and their application to software security.
Flayer provides tainted, or marked, data flow analysis and instrumentation mechanisms for arbitrarily altering that flow. Flayer improves upon prior taint-tracing tools with bit-level precision: taint propagation calculations are performed for each value-creating memory or register operation. These calculations are embedded in the target application's running code using dynamic instrumentation. The same technique allows the user to control the outcome of conditional jumps and to step over function calls.
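Bit-precise taint propagation, where each value carries a per-bit taint mask updated by every value-creating operation, can be sketched in a simplified model. This is not Flayer's Valgrind/Memcheck-based implementation; the `Tainted` wrapper and mask conventions below are assumptions.

```python
# Sketch of bit-precision taint tracking: each value carries a bitmask
# marking which bits are attacker-controlled, and every value-creating
# operation recomputes the mask. NOT Flayer's actual implementation.

class Tainted:
    def __init__(self, value, taint=0):
        self.value = value   # concrete integer value
        self.taint = taint   # bitmask: set bits are attacker-controlled

    def __or__(self, other):
        # OR can propagate a set bit from either operand, so taint unions.
        return Tainted(self.value | other.value, self.taint | other.taint)

    def __lshift__(self, n):
        # Shifting moves value bits and their taint bits together.
        return Tainted(self.value << n, self.taint << n)

user_byte = Tainted(0x41, taint=0xFF)    # fully attacker-controlled byte
constant  = Tainted(0x0100, taint=0x00)  # untainted program constant

word = (user_byte << 0) | constant
print(hex(word.value), hex(word.taint))  # 0x141 0xff
```

Only the low eight bits of `word` end up tainted, which is exactly the distinction a whole-value taint tracker would lose; a real tool models every arithmetic, memory, and register operation this way via dynamic instrumentation.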
Flayer's functionality provides a robust foundation for the implementation of security tools and techniques. In particular, this paper presents an effective fault-injection testing technique and an automation library, LibFlayer. Alongside these contributions, it explores techniques for vulnerability patch analysis and guided source code auditing. Flayer finds errors in real software: in the past year, its use has yielded the expedient discovery of flaws in security-critical software including OpenSSH and OpenSSL.