MillWheel: Fault-Tolerant Stream Processing at Internet Scale

MillWheel is a framework for building low-latency data-processing
applications that is widely used at Google. Users specify a directed
computation graph and application code for individual nodes, and
the system manages persistent state and the continuous flow of
records, all within the envelope of the framework’s fault-tolerance
This paper describes MillWheel’s programming model as well as
its implementation. The case study of a continuous anomaly detector
in use at Google serves to motivate how many of MillWheel’s
features are used. MillWheel’s programming model provides a notion
of logical time, making it simple to write time-based aggregations.
MillWheel was designed from the outset with fault tolerance
and scalability in mind. In practice, we find that MillWheel’s
unique combination of scalability, fault tolerance, and a versatile
programming model lends itself to a wide variety of problems at



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s