Unbounded, unordered, global-scale datasets are increasingly common in day-to-day business (e.g. Web logs, mobile usage statistics, and sensor networks). At the same time, consumers of these datasets have evolved sophisticated requirements, such as event-time ordering and windowing by features of the data themselves. On top of that, consumers want answers *now*. This talk will cover how Google has evolved its earlier work on batch and streaming systems (including MapReduce, FlumeJava, and MillWheel) into Dataflow, a new programming model that allows users to clearly trade off correctness, latency, and cost. An overview of this model will be provided, including a demo of the fully managed service it enables, and a discussion of some of the many use cases that led Google here.
Great talk given at @Scale 2015
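To make "event-time windowing" from the abstract concrete, here is a minimal sketch in plain Python. It is not the Dataflow SDK: the event tuple shape and the function names are hypothetical, and it assumes every event is already available. That means it sidesteps watermarks and triggers, which are how the Dataflow model decides when a window's result is ready. What it does show is that events are bucketed by *when they happened*, not by the order in which they arrive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape (an assumption for illustration):
# (event_time_seconds, key, value)
Event = Tuple[int, str, int]


def window_start(event_time: int, size: int) -> int:
    """Return the start of the fixed event-time window containing event_time."""
    return event_time - (event_time % size)


def sum_per_window(events: Iterable[Event], size: int) -> Dict[Tuple[str, int], int]:
    """Sum values per (key, window start).

    Each event is assigned to a window by its own timestamp, so an event
    that arrives late still lands in the window where it occurred.
    Assumes all events are present (no watermark or trigger logic).
    """
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for event_time, key, value in events:
        totals[(key, window_start(event_time, size))] += value
    return dict(totals)


if __name__ == "__main__":
    # Arrival order differs from event-time order: the event at t=5 arrives last.
    arrivals: List[Event] = [
        (65, "user-a", 2),
        (70, "user-a", 1),
        (5, "user-a", 4),
    ]
    print(sum_per_window(arrivals, size=60))
    # {('user-a', 60): 3, ('user-a', 0): 4}
```

A real streaming system cannot wait for "all events" on an unbounded input, which is exactly where the correctness/latency/cost trade-off described in the talk comes in.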