SREcon17 Asia/Australia: SRE Your gRPC—Building Reliable Distributed Systems Illustrated with gRPC

Grainne Sheerin and Gabe Krabbe, Google

Distributed systems have sharp edges, and we have a wealth of experience cutting ourselves on them. We want to share our experience with SREs elsewhere, so they can skip making the same mistakes and join us making exciting new ones instead!

We will share practical suggestions from 14 years of failing gracefully:

– In a distributed service, every component is a frontend to another one down the stack. How can it deal with backend failures so that the service as a whole does not go down?
– In a distributed service, every component is a backend for another one up the stack. How can it be scaled and managed, avoiding overload and under-use?
– In a distributed service, latency is often the biggest uncertainty. How can it be kept predictable?
– In a distributed service, each component's contribution to availability, processing cost, and latency is hard to attribute. When things (inevitably) go wrong, which components are to blame? When they work, where are the biggest opportunities for improvement?

We will cover best and worst practices, using specific gRPC examples for illustration.
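
As a rough illustration of the first and third questions (not taken from the talk itself), here is a minimal Python sketch of one common pattern: retrying a flaky backend with jittered exponential backoff while staying inside an overall deadline. `BackendUnavailable`, `call_with_deadline`, and all parameter names are hypothetical, and the backend is a plain callable rather than a real gRPC stub.

```python
import random
import time


class BackendUnavailable(Exception):
    """Hypothetical transient error raised by a backend call."""


def call_with_deadline(backend, deadline_s, max_attempts=3, base_backoff_s=0.05,
                       clock=time.monotonic, sleep=time.sleep):
    """Call backend(remaining_s) until it succeeds, attempts run out,
    or the overall deadline passes.

    `backend` is any callable taking the remaining time budget in seconds
    (an assumption for this sketch, standing in for a real RPC).
    """
    expires_at = clock() + deadline_s
    last_error = None
    for attempt in range(max_attempts):
        remaining = expires_at - clock()
        if remaining <= 0:
            break  # no budget left, stop retrying
        try:
            # Pass the remaining budget down so the backend can give up early.
            return backend(remaining)
        except BackendUnavailable as err:
            last_error = err
        if attempt + 1 < max_attempts:
            # Exponential backoff with full jitter, capped by the remaining budget.
            backoff = random.uniform(0, base_backoff_s * (2 ** attempt))
            sleep(min(backoff, max(0.0, expires_at - clock())))
    raise TimeoutError("deadline exceeded or retries exhausted") from last_error
```

Bounding both the number of attempts and the total time keeps a failing backend from turning into unbounded latency for the caller, and the jitter avoids many clients retrying in lockstep.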

Sign up to find out more about SREcon at https://srecon.usenix.org

via YouTube https://youtu.be/eoy9z0UlaII
