Amin Vahdat keynotes ONS again…
If you’re familiar with the articles about Google’s BeyondCorp network
security model published in ;login: [1-3] over the past two years, you
may be thinking, “That all sounds good, but how does my organization
move from where we are today to a similar model? What do I need to do?
And what’s the potential impact on my company and my employees?” This
article discusses how we moved from our legacy network to the BeyondCorp model—changing the fundamentals of network access—without reducing the company’s productivity.
What’s remarkable about April 7th, 2014 isn’t what happened that day. It’s what didn’t.
That was the day the Heartbleed bug was revealed, and people around the globe scrambled to patch their systems against this zero-day issue, which came with already-proven exploits. In other public cloud platforms, customers were impacted by rolling restarts due to a requirement to reboot VMs. At Google, we quickly rolled out the fix to all our servers, including those that host Google Compute Engine. And none of you, our customers, noticed. Here’s why.
We introduced transparent maintenance for Google Compute Engine in December 2013, and since then we’ve kept customer VMs up and running as we rolled out software updates, fixed hardware problems, and recovered from some unexpected issues that have arisen. Through a combination of datacenter topology innovations and live migration technology, we now move our customers running VMs out of the way of planned hardware and software maintenance events, so we can keep the infrastructure protected and reliable—without your VMs, applications or workloads noticing that anything happened.
Standard machine learning approaches require centralizing the training data on one machine or in a datacenter. And Google has built one of the most secure and robust cloud infrastructures for processing this data to make our services better. Now for models trained from user interaction with mobile devices, we’re introducing an additional approach: Federated Learning.
Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices (like the Mobile Vision API and On-Device Smart Reply) by bringing model training to the device as well.
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC–called a Tensor Processing Unit (TPU)–deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS), and a large (28MiB) software-managed on-chip memory. The TPU’s deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, …) that help average throughput more than guaranteed latency. The lack of of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters’ NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X – 30X faster than its contemporary GPU or CPU with TOPS/Watt about 30X – 80X higher. Moreover, using the GPUs GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
Today, we gave a keynote presentation at the Open Networking Summit, where we shared details about Espresso, Google’s peering edge architecture—the latest offering in our Software Defined Networking (SDN) strategy. Espresso has been in production for over two years and routes 20 percent of our total traffic to the internet—and growing. It’s changing the way traffic is directed at the peering edge, delivering unprecedented scale, flexibility and efficiency.
Notes on “Lessons Learned from Securing Google and Google Cloud” talk by Neils Provos
- Defense in Depth at scale by default
- Protect identities by default
- Protect data across full lifecycle by default
- Protect resources by default
- Trust through transparency
- Automate best practices and prevent common mistakes at scale
- Share innovation to raise the bar, support and invest in the security community.
- Address common cases programmatically
- Empower customers to fulfill their security responsibilities
- Trust and security can be the accelerant