Dataflow: A Unified Model for Batch and Streaming Data Processing

Unbounded, unordered, global-scale datasets are increasingly common in day-to-day business (e.g. Web logs, mobile usage statistics, and sensor networks). At the same time, consumers of these datasets have evolved sophisticated requirements, such as event-time ordering and windowing by features of the data themselves. On top of that, consumers want answers *now*. This talk covers how Google has evolved its earlier work on batch and streaming systems (including MapReduce, FlumeJava, and MillWheel) into Dataflow, a new programming model that allows users to clearly trade off correctness, latency, and cost. It gives an overview of the model, a demo of the fully managed service it enables, and a discussion of some of the many use cases that got Google here.
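
To make "event-time windowing" concrete, here is a minimal sketch in plain Python. It is not the Dataflow API and is not taken from the talk; the function names, the 60-second window size, and the sample events are all assumptions made for illustration. It shows only the core idea that records are grouped by the time they occurred rather than the order in which they arrived.

```python
from collections import defaultdict

def window_start(event_time, size):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % size)

def count_by_event_time_window(events, size):
    """Count events per fixed event-time window, regardless of arrival order.

    `events` is an iterable of (event_time_seconds, payload) pairs.
    Hypothetical helper for illustration only.
    """
    counts = defaultdict(int)
    for event_time, _payload in events:
        counts[window_start(event_time, size)] += 1
    return dict(sorted(counts.items()))

# Events listed in arrival order; their event times are out of order.
events = [(125, "c"), (5, "a"), (61, "b"), (59, "d")]
result = count_by_event_time_window(events, 60)
print(result)
```

The late-arriving event at time 59 still lands in the first window. A real streaming system also has to decide when a window's result is complete enough to emit, which is where the trade-off between correctness, latency, and cost comes in.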


Great talk given at @Scale 2015

