In the late 1980s and early 1990s, object-oriented programming revolutionized software development, popularizing the approach of building applications as collections of modular components. Today we are seeing a similar revolution in distributed system development, with the increasing popularity of microservice architectures built from containerized software components. Containers are particularly well-suited as the fundamental “object” in distributed systems by virtue of the walls they erect at the container boundary. As this architectural style matures, we are seeing the emergence of design patterns, much as we did for object-oriented programs, and for the same reason: thinking in terms of objects (or containers) abstracts away the low-level details of code, eventually revealing higher-level patterns that are common to a variety of applications and algorithms.
This paper describes three types of design patterns that we have observed emerging in container-based distributed systems: single-container patterns for container management, single-node patterns of closely cooperating containers, and multi-node patterns for distributed algorithms. Like object-oriented patterns before them, these patterns for distributed computation encode best practices, simplify development, and make the systems where they are used more reliable.
What’s remarkable about April 7th, 2014 isn’t what happened that day. It’s what didn’t.
That was the day the Heartbleed bug was revealed, and people around the globe scrambled to patch their systems against this zero-day issue, which came with already-proven exploits. On other public cloud platforms, customers were affected by rolling restarts because VMs had to be rebooted. At Google, we quickly rolled out the fix to all our servers, including those that host Google Compute Engine. And none of you, our customers, noticed. Here’s why.
We introduced transparent maintenance for Google Compute Engine in December 2013, and since then we’ve kept customer VMs up and running as we rolled out software updates, fixed hardware problems, and recovered from unexpected issues. Through a combination of datacenter topology innovations and live migration technology, we now move our customers’ running VMs out of the way of planned hardware and software maintenance events, so we can keep the infrastructure protected and reliable without your VMs, applications, or workloads noticing that anything happened.
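The post doesn’t describe the migration algorithm itself. A common general technique for live migration is iterative pre-copy: copy the guest’s memory while it keeps running, re-send the pages it dirties in the meantime, and pause it only briefly to copy the last few pages. The toy simulation below illustrates that general idea under those assumptions; it is not Google’s implementation, and `Vm`, `live_migrate`, `shrinking_workload`, and the round/threshold parameters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Vm:
    """Toy guest (hypothetical): memory maps page number -> page contents."""
    memory: dict[int, bytes]
    dirty: set[int] = field(default_factory=set)

    def write(self, page: int, data: bytes) -> None:
        # Every guest write marks the page dirty so the migrator re-sends it.
        self.memory[page] = data
        self.dirty.add(page)


def live_migrate(vm: Vm, workload: Callable[[Vm, int], None],
                 max_rounds: int = 5, stop_threshold: int = 2):
    """Iterative pre-copy: copy while the guest runs, then a short pause."""
    target: dict[int, bytes] = {}
    to_copy = set(vm.memory)  # round 1 sends every page
    vm.dirty.clear()
    rounds = 0
    while True:
        for page in to_copy:
            target[page] = vm.memory[page]
        rounds += 1
        workload(vm, rounds)  # the guest keeps running and dirtying pages
        to_copy = set(vm.dirty)
        vm.dirty.clear()
        if len(to_copy) <= stop_threshold or rounds >= max_rounds:
            break
    # Blackout: the guest is paused here, so no writes happen while the last
    # dirty pages are copied; afterwards it resumes on the target host.
    for page in to_copy:
        target[page] = vm.memory[page]
    return target, rounds, len(to_copy)


def shrinking_workload(vm: Vm, round_no: int) -> None:
    # Writes 4, then 2, then 1 page(s): a guest whose dirty set converges.
    for page in range(8 >> round_no):
        vm.write(page, f"round-{round_no}".encode())


vm = Vm(memory={i: b"init" for i in range(16)})
target, rounds, blackout_pages = live_migrate(vm, shrinking_workload)
print(rounds, blackout_pages, target == vm.memory)  # prints: 2 2 True
```

The design trade-off is visible in the parameters: a lower `stop_threshold` shortens the pause but may need more copy rounds, and `max_rounds` bounds the work for guests that dirty memory faster than it can be copied.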
Notes on the “Lessons Learned from Securing Google and Google Cloud” talk by Niels Provos
- Defense in Depth at scale by default
- Protect identities by default
- Protect data across full lifecycle by default
- Protect resources by default
- Trust through transparency
- Automate best practices and prevent common mistakes at scale
- Share innovation to raise the bar; support and invest in the security community.
- Address common cases programmatically
- Empower customers to fulfill their security responsibilities
- Trust and security can be the accelerant
Today, we’re putting our core web services behind the protections provided by U2F and Google’s account takeover and anomaly detection systems. This provides not only phishing resistance through the authentication proxy but also authorization through IAM roles assigned to the user’s Google account.
What you’ll need:
- A Google account.
- A U2F security key (e.g., a YubiKey) enrolled and enforced for the users and groups that will access the application.
- An hour or so.
- A global cloud that has been operating at billions of rps for decades. (Beyond the scope of this article.)
(notes from Next ’17)
Types of identities
| | Google Account | Service Account | G Suite Domain | Google Group |
|---|---|---|---|---|
| Represents | Employee or user | Application component | All members of the specified domain | All members of the group |
| Log in to Console? | Yes | No | No | No |
| Notes | | An instance can run as a service account. | | |
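To make the identity types concrete: in an IAM policy, each of the four appears as a member string with a prefix (`user:`, `serviceAccount:`, `domain:`, `group:`), and each role binding lists the members it applies to. The sketch below is a deliberately simplified check of whether a caller holds a role such as `roles/iap.httpsResourceAccessor` (the role that grants access to an IAP-protected web app). It assumes the caller’s group memberships are already known and ignores role inheritance through the resource hierarchy and IAM Conditions; `Identity`, `member_strings`, `has_role`, and the example accounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Hypothetical caller description; `kind` is "user" or "serviceAccount"."""
    email: str
    kind: str = "user"
    groups: frozenset[str] = frozenset()


def member_strings(who: Identity) -> set[str]:
    """IAM-style member strings that could refer to this caller."""
    members = {f"{who.kind}:{who.email}"}
    members |= {f"group:{g}" for g in who.groups}
    if who.kind == "user":
        # A domain: member covers every user account in that domain.
        members.add("domain:" + who.email.split("@", 1)[1])
    return members


def has_role(policy: dict, who: Identity, role: str) -> bool:
    """True if any binding grants `role` to one of the caller's member strings."""
    candidates = member_strings(who)
    return any(
        binding["role"] == role and not candidates.isdisjoint(binding["members"])
        for binding in policy.get("bindings", [])
    )


policy = {
    "bindings": [
        {"role": "roles/iap.httpsResourceAccessor",
         "members": ["group:eng@example.com", "user:carol@example.com"]},
        {"role": "roles/viewer", "members": ["domain:example.com"]},
    ]
}

alice = Identity("alice@example.com", groups=frozenset({"eng@example.com"}))
bob = Identity("bob@example.com")
app = Identity("app@my-project.iam.gserviceaccount.com", kind="serviceAccount")

print(has_role(policy, alice, "roles/iap.httpsResourceAccessor"))  # True
print(has_role(policy, bob, "roles/iap.httpsResourceAccessor"))    # False
print(has_role(policy, bob, "roles/viewer"))                       # True
print(has_role(policy, app, "roles/viewer"))                       # False
```

Granting access to a group or domain rather than to individual users keeps the policy short, which is one reason the table lists groups and domains as identities in their own right.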
I listened to a podcast and cut out the chit-chat, so you don’t have to:
Titan is a tiny security coprocessor chip used for encryption, hardware authentication, and service authentication.
Every piece of hardware in Google’s infrastructure can be individually identified and cryptographically verified, and any service using it mutually authenticates to that hardware. This includes servers, network cards, switches: everything. The Titan chip is one of the ways to accomplish that.
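The notes don’t say which protocol the mutual authentication uses. As an illustration of the general idea only, the sketch below runs a generic two-way challenge-response: each side sends a fresh random challenge and checks the peer’s keyed answer. It assumes a shared per-device secret and HMAC-SHA256 purely to stay within the Python standard library; real hardware identity schemes typically rely on asymmetric keys and certificates, and `answer` and `mutual_authenticate` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def answer(key: bytes, challenge: bytes, role: bytes) -> bytes:
    # The role label keeps one side's answer from being replayed as the other's.
    return hmac.new(key, role + b"|" + challenge, hashlib.sha256).digest()


def mutual_authenticate(service_key: bytes, device_key: bytes) -> bool:
    """Both sides issue a fresh random challenge and check the peer's answer."""
    to_device = secrets.token_bytes(16)   # service -> device
    to_service = secrets.token_bytes(16)  # device -> service

    device_reply = answer(device_key, to_device, b"device")
    service_reply = answer(service_key, to_service, b"service")

    # Each side recomputes the expected answer with its own copy of the key.
    service_trusts_device = hmac.compare_digest(
        device_reply, answer(service_key, to_device, b"device"))
    device_trusts_service = hmac.compare_digest(
        service_reply, answer(device_key, to_service, b"service"))
    return service_trusts_device and device_trusts_service


shared = secrets.token_bytes(32)
print(mutual_authenticate(shared, shared))                   # True
print(mutual_authenticate(shared, secrets.token_bytes(32)))  # False
```

Fresh random challenges on both sides mean a recorded answer is useless later, and the check succeeds only if each party proves it holds the expected key.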
The chip certifies that hardware is in a trusted good state. If this verification fails, the hardware will not boot, and will be replaced.
Every time a new BIOS is pushed, Titan checks that the code is authentic Google code before allowing it to be installed. Then, each time that code boots, it checks again that the code is authentic before allowing boot to continue.
‘Similar in theory to the U2F security keys: everything should have an identity, hardware and software. Everything’s identity is checked all the time.’
There were suggestions that it plays an important role in hardware-level data encryption, key management systems, etc.
Each chip is fused with a unique identifier. Identifiers are assigned sequentially, so Google can verify that a chip is part of its inventory sequence.
Three main functions: a random number generator (RNG), a crypto engine, and a monotonic counter. The first two are self-explanatory. The monotonic counter protects against replay attacks and makes logs tamper-evident.
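The notes say what the counter is for but not how it is used. One common way a monotonic counter provides both properties is sketched below: messages carry a counter value, so stale (replayed) ones are rejected, and log entries are stamped with the counter and MAC-chained to their predecessors, so edits, reordering, or truncation become detectable. This is a generic software illustration, not Titan’s design; `MonotonicCounter`, `TamperEvidentLog`, `accept_fresh`, the HMAC key, and the 8-byte counter encoding are all assumptions.

```python
import hashlib
import hmac


class MonotonicCounter:
    """A counter that can only move forward (stands in for the hardware one)."""

    def __init__(self) -> None:
        self._value = 0

    @property
    def value(self) -> int:
        return self._value

    def increment(self) -> int:
        self._value += 1
        return self._value


class TamperEvidentLog:
    """Each entry is stamped with the counter and MAC-chained to the previous one."""

    def __init__(self, key: bytes, counter: MonotonicCounter) -> None:
        self._key = key
        self._counter = counter
        self.entries: list[tuple[int, str, bytes]] = []

    def _mac(self, prev: bytes, count: int, message: str) -> bytes:
        data = prev + count.to_bytes(8, "big") + message.encode()
        return hmac.new(self._key, data, hashlib.sha256).digest()

    def append(self, message: str) -> None:
        count = self._counter.increment()
        prev = self.entries[-1][2] if self.entries else bytes(32)
        self.entries.append((count, message, self._mac(prev, count, message)))

    def verify(self) -> bool:
        prev, last = bytes(32), 0
        for count, message, tag in self.entries:
            if count <= last:  # counter values must strictly increase
                return False
            if not hmac.compare_digest(tag, self._mac(prev, count, message)):
                return False  # an entry was edited, removed, or reordered
            prev, last = tag, count
        # The counter can't be rolled back, so dropping the newest entries shows up here.
        return last == self._counter.value


def accept_fresh(count: int, last_seen: int) -> bool:
    """Replay guard: only accept a message stamped with a newer counter value."""
    return count > last_seen


log = TamperEvidentLog(b"hypothetical-device-key", MonotonicCounter())
for event in ["boot ok", "bios update", "boot ok"]:
    log.append(event)
ok_before = log.verify()            # True

log.entries.pop()                   # an attacker drops the newest entry
ok_after_truncation = log.verify()  # False: the counter is at 3, the log ends at 2
```

The MAC chain alone catches edits in the middle of the log; it is the counter, which only moves forward, that exposes a log whose newest entries have been cut off.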
It sits between ROM and RAM to provide signature validation of the first 8 KB of the BIOS at installation and at boot.
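Titan validates cryptographic signatures. To stay within the Python standard library, the sketch below substitutes a pinned SHA-256 digest allowlist for signature verification, so treat it as an illustration of the install-time and boot-time checks described above, not of Titan’s cryptography. Like the notes, it covers only the first 8 KB; changes past that region are not detected by this sketch. `FirmwareGate`, `boot_region_digest`, and the `"halt"` result are hypothetical.

```python
import hashlib
from typing import Optional

BOOT_REGION = 8 * 1024  # the first 8 KB of the BIOS, per the notes


def boot_region_digest(image: bytes) -> str:
    return hashlib.sha256(image[:BOOT_REGION]).hexdigest()


class FirmwareGate:
    """Refuses to install, or to boot, an image whose boot region isn't trusted."""

    def __init__(self, trusted_digests: set[str]) -> None:
        self.trusted = set(trusted_digests)
        self.installed: Optional[bytes] = None

    def install(self, image: bytes) -> None:
        # Install-time check: reject code that isn't on the trusted list.
        if boot_region_digest(image) not in self.trusted:
            raise PermissionError("untrusted firmware: refusing to install")
        self.installed = image

    def boot(self) -> str:
        # Boot-time check: re-verify, since flash contents could change after install.
        if self.installed is None or boot_region_digest(self.installed) not in self.trusted:
            return "halt"  # per the notes, the machine doesn't boot and gets replaced
        return "boot"


good = bytes(range(256)) * 40            # a 10 KB stand-in image
gate = FirmwareGate({boot_region_digest(good)})
gate.install(good)
print(gate.boot())                       # boot

gate.installed = b"\xff" + good[1:]      # simulate a modified boot region
print(gate.boot())                       # halt
```

Checking on every boot, not just at install, is what catches firmware that was altered after a legitimate update.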
Produced entirely within Google, with the design and process controlled to ensure provenance. Google has used other vendors’ security coprocessors in the past but wants to understand and know the whole truth about the chip.
The Google folks are unaware of any other cloud that uses TPMs, etc., to verify every piece of hardware and software running on it.
In the first post in this series, we talked about how our old event system worked and some of the lessons we learned from operating it. In the second post, we covered the design of our new event delivery system, and why we chose Cloud Pub/Sub as the transport mechanism for all events. In this third and final post, we will explain how we intend to consume all the published events with Dataflow, and what we have discovered about the performance of this approach so far.