If you’re familiar with the articles about Google’s BeyondCorp network
security model published in ;login: [1-3] over the past two years, you
may be thinking, “That all sounds good, but how does my organization
move from where we are today to a similar model? What do I need to do?
And what’s the potential impact on my company and my employees?” This
article discusses how we moved from our legacy network to the BeyondCorp model—changing the fundamentals of network access—without reducing the company’s productivity.
What’s remarkable about April 7th, 2014 isn’t what happened that day. It’s what didn’t.
That was the day the Heartbleed bug was revealed, and people around the globe scrambled to patch their systems against this zero-day issue, which came with already-proven exploits. In other public cloud platforms, customers were impacted by rolling restarts due to a requirement to reboot VMs. At Google, we quickly rolled out the fix to all our servers, including those that host Google Compute Engine. And none of you, our customers, noticed. Here’s why.
We introduced transparent maintenance for Google Compute Engine in December 2013, and since then we’ve kept customer VMs up and running as we rolled out software updates, fixed hardware problems, and recovered from some unexpected issues that have arisen. Through a combination of datacenter topology innovations and live migration technology, we now move our customers running VMs out of the way of planned hardware and software maintenance events, so we can keep the infrastructure protected and reliable—without your VMs, applications or workloads noticing that anything happened.
Standard machine learning approaches require centralizing the training data on one machine or in a datacenter. And Google has built one of the most secure and robust cloud infrastructures for processing this data to make our services better. Now for models trained from user interaction with mobile devices, we’re introducing an additional approach: Federated Learning.
Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices (like the Mobile Vision API and On-Device Smart Reply) by bringing model training to the device as well.
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC–called a Tensor Processing Unit (TPU)–deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS), and a large (28MiB) software-managed on-chip memory. The TPU’s deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, …) that help average throughput more than guaranteed latency. The lack of of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters’ NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X – 30X faster than its contemporary GPU or CPU with TOPS/Watt about 30X – 80X higher. Moreover, using the GPUs GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
Today, we gave a keynote presentation at the Open Networking Summit, where we shared details about Espresso, Google’s peering edge architecture—the latest offering in our Software Defined Networking (SDN) strategy. Espresso has been in production for over two years and routes 20 percent of our total traffic to the internet—and growing. It’s changing the way traffic is directed at the peering edge, delivering unprecedented scale, flexibility and efficiency.
Notes on “Lessons Learned from Securing Google and Google Cloud” talk by Neils Provos
- Defense in Depth at scale by default
- Protect identities by default
- Protect data across full lifecycle by default
- Protect resources by default
- Trust through transparency
- Automate best practices and prevent common mistakes at scale
- Share innovation to raise the bar, support and invest in the security community.
- Address common cases programmatically
- Empower customers to fulfill their security responsibilities
- Trust and security can be the accelerant
I listened to a podcast and cut out the chit-chat, so you don’t have to:
Titan is a tiny security co-processing chip used for encryption, authentication of hardware, authentication of services.
Every piece of hardware in google’s infrastructure can be individually identified and cryptographically verified, and any service using it mutually authenticates to that hardware. This includes servers, networking cards, switches: everything. The Titan chip is one of the ways to accomplish that.
The chip certifies that hardware is in a trusted good state. If this verification fails, the hardware will not boot, and will be replaced.
Every time a new bios is pushed, Titan checks that the code is authentic Google code before allowing it to be installed. It then checks each time that code is booted that it is authentic, before allowing boot to continue.
‘similar in theory to the u2f security keys, everything should have identity, hardware and software. Everything’s identity is checked all the time.’
Suggestions that it plays important role in hardware level data encryption, key management systems, etc.
Each chip is fused with a unique identifier. Done sequentially, so can verify it’s part of inventory sequence.
Three main functions: RNG, crypto engine, and monotonic counter. First two are self-explanatory. Monotonic counter to protect against replay attacks, and make logs tamper evident.
Sits between ROM and RAM, to provide signature valididation of the first 8KB of BIOS on installation and boot up.
Produced entirely within google. Design and process to ensure provenance. Have used other vendor’s security coprocessors in the past, but want to ensure they understand/know the whole truth.
Google folks unaware of any other cloud that uses TPMs, etc to verify every piece of hardware and software running on it.