If you’re familiar with the articles about Google’s BeyondCorp network
security model published in ;login: [1-3] over the past two years, you
may be thinking, “That all sounds good, but how does my organization
move from where we are today to a similar model? What do I need to do?
And what’s the potential impact on my company and my employees?” This
article discusses how we moved from our legacy network to the BeyondCorp model—changing the fundamentals of network access—without reducing the company’s productivity.
What’s remarkable about April 7th, 2014 isn’t what happened that day. It’s what didn’t.
That was the day the Heartbleed bug was revealed, and people around the globe scrambled to patch their systems against this zero-day issue, which came with already-proven exploits. In other public cloud platforms, customers were impacted by rolling restarts due to a requirement to reboot VMs. At Google, we quickly rolled out the fix to all our servers, including those that host Google Compute Engine. And none of you, our customers, noticed. Here’s why.
We introduced transparent maintenance for Google Compute Engine in December 2013, and since then we’ve kept customer VMs up and running as we rolled out software updates, fixed hardware problems, and recovered from some unexpected issues that have arisen. Through a combination of datacenter topology innovations and live migration technology, we now move our customers running VMs out of the way of planned hardware and software maintenance events, so we can keep the infrastructure protected and reliable—without your VMs, applications or workloads noticing that anything happened.
Traditional security models use a binary, all-or-nothing access model where access is granted solely on the basis of machine, user, and service membership into an authentication authority, such as active directory or LDAP.
Google is taking a different approach and using tiered access as one tool to address these challenges. In contrast to traditional models, tiered access provides more granular control. The level of access given to a single user or a single device may change over time based on device measurements allowing security to set access policy that considers deviations from intended device state.
At Google, the Technical Infrastructure organization manages access for the devices used by more than 61,000 employees while protecting against sophisticated adversaries. Below we outline the model that Google has adopted and continues to evolve as it’s rolled out. The first phase of roll-out has enabled access from mobile devices, while subsequent phases will expand enrollment to cover the entire fleet of Google devices.
Although anti-virus software has significantly evolved over
the last decade, classic signature matching based on byte
patterns is still a prevalent concept for identifying security
threats. Anti-virus signatures are a simple and fast detection
mechanism that can complement more sophisticated analysis
strategies. However, if signatures are not designed with care,
they can turn from a defensive mechanism into an instrument
of attack. In this paper, we present a novel method for
automatically deriving signatures from anti-virus software
and discuss how the extracted signatures can be used to
attack sensible data with the aid of the virus scanner itself.
To this end, we study the practicability of our approach
using four commercial products and exemplary demonstrate
anti-virus assisted attacks in three different scenarios.
Standard machine learning approaches require centralizing the training data on one machine or in a datacenter. And Google has built one of the most secure and robust cloud infrastructures for processing this data to make our services better. Now for models trained from user interaction with mobile devices, we’re introducing an additional approach: Federated Learning.
Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices (like the Mobile Vision API and On-Device Smart Reply) by bringing model training to the device as well.
Most current web browsers employ a monolithic architecture
that combines “the user” and “the web” into a single
protection domain. An attacker who exploits an arbitrary
code execution vulnerability in such a browser can steal sensitive
files or install malware. In this paper, we present the
security architecture of Chromium, the open-source browser
upon which Google Chrome is built. Chromium has two
modules in separate protection domains: a browser kernel,
which interacts with the operating system, and a rendering
engine, which runs with restricted privileges in a sandbox.
This architecture helps mitigate high-severity attacks without
sacrificing compatibility with existing web sites. We
define a threat model for browser exploits and evaluate how
the architecture would have mitigated past vulnerabilities.
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC–called a Tensor Processing Unit (TPU)–deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS), and a large (28MiB) software-managed on-chip memory. The TPU’s deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, …) that help average throughput more than guaranteed latency. The lack of of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters’ NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X – 30X faster than its contemporary GPU or CPU with TOPS/Watt about 30X – 80X higher. Moreover, using the GPUs GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.