Sharding is a fundamental building block of large-scale
applications, but most have their own custom, ad-hoc
implementations. Our goal is to make sharding as easily
reusable as a filesystem or lock manager. Slicer is
Google’s general purpose sharding service. It monitors
signals such as load hotspots and server health to dynamically
shard work over a set of servers. Its goals are
to maintain high availability and reduce load imbalance
while minimizing churn from moved work.
In this paper, we describe Slicer’s design and implementation.
Slicer has the consistency and global optimization
of a centralized sharder while approaching the
high availability, scalability, and low latency of systems
that make local decisions. It achieves this by separating
concerns: a reliable data plane forwards requests, and a
smart control plane makes load-balancing decisions off
the critical path. Slicer’s small but powerful API has
proven useful and easy to adopt in dozens of Google applications.
It is used to allocate resources for web service
front-ends, coalesce writes to increase storage bandwidth,
and increase the efficiency of a web cache. It
currently handles 2-7M req/s of production traffic. The
median production Slicer-managed workload uses 63%
fewer resources than it would with static sharding.