Fauna | A Comparison of Scalable Database Isolation Levels

It is very difficult to find accurate information about the correctness and isolation levels offered by modern distributed databases, and the operational conditions required to achieve them. Developers use different terms for the same concept, the meaning of terms varies or is ambiguous, and sometimes vendors themselves do not actually know what their systems guarantee.

At Fauna, we care a lot about accurately describing which guarantees different systems actually provide. This is our effort to centralize a description of which database does what, based on publicly available information (documentation, source code, third-party analyses, and developers’ comments). For consistency’s sake, we will use the terminology from Kyle Kingsbury’s explanation on the Jepsen site. The chart is ranked by the maximum multi-partition isolation level offered.

The data is based on statements about isolation levels from vendor documentation, white papers, and developer commentary, exclusive of aspirational marketing statements. We have tried to be neutral in the characterization of the various systems’ architectural properties. Whether the system implementations uphold these guarantees is addressed elsewhere. If you haven’t already, please see FaunaDB’s own Jepsen results for confirmation that FaunaDB upholds its guarantees.
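As a reminder of why these distinctions matter in practice, here is a small self-contained sketch (in Python, not tied to any database in the chart) of write skew, an anomaly that snapshot isolation permits but serializability forbids: two concurrent transactions read the same snapshot, each checks an invariant, and each writes a disjoint row, so both commit and the invariant is broken.

```python
# Toy illustration of write skew under snapshot isolation (not any specific system).
# Rule: at least one of the two doctors must stay on call.

committed = {"alice_on_call": True, "bob_on_call": True}   # shared committed state

def run_txn(me):
    """Read from a snapshot taken at transaction start, then write only our own row."""
    snapshot = dict(committed)
    if snapshot["alice_on_call"] and snapshot["bob_on_call"]:
        return {me: False}        # per the snapshot, it looks safe to go off call
    return {}

# Both transactions start concurrently, so both see the same snapshot.
writes_1 = run_txn("alice_on_call")
writes_2 = run_txn("bob_on_call")

# Snapshot isolation aborts only on write-write conflicts; the write sets are
# disjoint, so both commits succeed.
committed.update(writes_1)
committed.update(writes_2)

print(committed)  # {'alice_on_call': False, 'bob_on_call': False} -- invariant broken
# A serializable system would instead abort one of the two transactions.
```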

Source: https://fauna.com/blog/a-comparison-of-scalable-database-isolation-levels

Designing Distributed Systems with TLA+ • Hillel Wayne

Concurrency is hard. How do you test your system when it’s spread across three services and four languages? Unit testing and type systems only take us so far. At some point we need new tools.

Enter TLA+. TLA+ is a specification language that describes your system and the properties you want. This makes it a fantastic complement to testing: not only can you check your code, you can check your design, too! TLA+ is especially effective for testing concurrency problems, like stalling, race conditions, and dropped messages.

This talk will introduce the ideas behind TLA+ and how it works, with a focus on practical examples. We’ll also show how it caught complex bugs in our systems, as well as how you can start applying it to your own work.
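To make the idea concrete, the toy Python sketch below mimics what TLA+'s model checker does for a design: enumerate every interleaving of a tiny two-process protocol and check a property over the outcomes. This is only an illustration of the approach; real specifications are written in TLA+'s own language and checked with TLC.

```python
# Exhaustively explore all interleavings of two processes doing a non-atomic
# read-increment-write on a shared counter, and check that both increments survive.
from itertools import permutations

def run(schedule):
    """Execute one interleaving. Each process has two steps: read, then write."""
    shared = 0
    local = {1: None, 2: None}
    step = {1: 0, 2: 0}
    for proc in schedule:
        if step[proc] == 0:
            local[proc] = shared          # read
        else:
            shared = local[proc] + 1      # write back the value read, plus one
        step[proc] += 1
    return shared

# Every distinct interleaving of the four steps (two per process).
violations = [s for s in set(permutations([1, 1, 2, 2])) if run(s) != 2]

print(violations)  # non-empty: the checker has found the lost-update race
```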

(This is the exact same abstract as the last TLA+ talk, despite the two having just one slide in common. Writing abstracts is not my strong suit.)

Source: https://www.hillelwayne.com/talks/distributed-systems-tlaplus/

Ubiq: A Scalable and Fault-tolerant Log Processing Infrastructure

Abstract. Most of today’s Internet applications generate vast amounts of data (typically, in the form of event logs) that needs to be processed and analyzed for detailed reporting, enhancing user experience and increasing monetization. In this paper, we describe the architecture of Ubiq, a geographically distributed framework for processing continuously growing log files in real time with high scalability, high availability and low latency. The Ubiq framework fully tolerates infrastructure degradation and data center-level outages without any manual intervention. It also guarantees exactly-once semantics for application pipelines to process logs as a collection of multiple events. Ubiq has been in production for Google’s advertising system for many years and has served as a critical log processing framework for several dozen pipelines. Our production deployment demonstrates linear scalability with machine resources, extremely high availability even with underlying infrastructure failures, and an end-to-end latency of under a minute.
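The exactly-once claim is easiest to see with a small sketch. The Python below is a hedged illustration of the general idea (not Ubiq’s actual design): a pipeline stays correct under redelivery if it commits its output together with how far into the input log it has processed, in one atomic step.

```python
# Idempotent log processing: commit the result and the input position together,
# so retried or redelivered batches have no additional effect.

state = {"committed_through": 0, "total": 0}   # stand-in for durable, replicated state

def process_batch(log, start, end):
    """Process log[start:end] and commit atomically; safe to retry."""
    if end <= state["committed_through"]:
        return                                  # duplicate delivery: already applied
    partial = sum(log[start:end])
    # In a real system, these two updates would be one atomic/transactional write.
    state["total"] += partial
    state["committed_through"] = end

log = [1, 2, 3, 4]
process_batch(log, 0, 2)
process_batch(log, 0, 2)   # redelivered after a worker failure: ignored
process_batch(log, 2, 4)
print(state)               # {'committed_through': 4, 'total': 10}
```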

Source: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45805.pdf

Design patterns for container-based distributed systems

In the late 1980s and early 1990s, object-oriented programming revolutionized software development, popularizing the approach of building applications as collections of modular components. Today we are seeing a similar revolution in distributed system development, with the increasing popularity of microservice architectures built from containerized software components. Containers [15] [22] [1] [2] are particularly well-suited as the fundamental “object” in distributed systems by virtue of the walls they erect at the container boundary. As this architectural style matures, we are seeing the emergence of design patterns, much as we did for object-oriented programs, and for the same reason: thinking in terms of objects (or containers) abstracts away the low-level details of code, eventually revealing higher-level patterns that are common to a variety of applications and algorithms.

This paper describes three types of design patterns that we have observed emerging in container-based distributed systems: single-container patterns for container management, single-node patterns of closely cooperating containers, and multi-node patterns for distributed algorithms. Like object-oriented patterns before them, these patterns for distributed computation encode best practices, simplify development, and make the systems where they are used more reliable.
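One of the multi-node patterns the paper covers is scatter/gather, where a root fans a request out to many leaf workers and merges their partial answers. The Python sketch below illustrates the shape of the pattern; the function names and data are invented for the example.

```python
# Scatter/gather: the root sends the query to every leaf in parallel, each leaf
# answers over only its own shard, and the root merges the partial results.
from concurrent.futures import ThreadPoolExecutor

def leaf(shard, query):
    """Each leaf searches only its own shard of the data."""
    return [doc for doc in shard if query in doc]

def root(shards, query):
    """Scatter the query, then gather and merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: leaf(s, query), shards)
    return sorted(doc for partial in partials for doc in partial)

shards = [["alpha", "beta"], ["gamma", "alphabet"], ["delta"]]
print(root(shards, "alpha"))   # ['alpha', 'alphabet']
```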

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45406.pdf

Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions

We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous bugs related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. Our results have implications for the design of next-generation fault-tolerant distributed and cloud storage systems.
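The methodology boils down to injecting a single local fault and observing how the system reacts. The Python sketch below is a simplified stand-in for that kind of experiment (not the authors' harness): it flips one byte in a stored block and shows how a checksum on the read path turns silent corruption into a detectable error.

```python
# Inject a single-byte corruption into an on-disk block and detect it on read.
import hashlib, os, tempfile

def write_block(path, data):
    # Store the block together with its checksum, as a careful system might.
    with open(path, "wb") as f:
        f.write(hashlib.sha256(data).digest() + data)

def read_block(path):
    with open(path, "rb") as f:
        raw = f.read()
    digest, data = raw[:32], raw[32:]
    if hashlib.sha256(data).digest() != digest:
        raise IOError("checksum mismatch: corruption detected, read another replica")
    return data

path = os.path.join(tempfile.mkdtemp(), "replica0")
write_block(path, b"hello, replicas")

# The injected fault: flip one byte of the stored block, as a faulty disk or
# file system might.
with open(path, "r+b") as f:
    f.seek(40)
    byte = f.read(1)
    f.seek(40)
    f.write(bytes([byte[0] ^ 0xFF]))

try:
    read_block(path)
except IOError as e:
    print(e)   # without such checks, the corruption would propagate silently
```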

Source: https://www.usenix.org/system/files/conference/fast17/fast17-ganesan.pdf

Flexible Network Bandwidth and Latency Provisioning in the Datacenter

Abstract. Predictably sharing the network is critical to achieving high utilization in the datacenter. Past work has focused on providing bandwidth to endpoints, but often we want to allocate resources among multi-node services. In this paper, we present Parley, which provides service-centric minimum bandwidth guarantees that can be composed hierarchically. Parley also supports service-centric weighted sharing of bandwidth in excess of these guarantees. Further, we show how to configure these policies so services can get low latencies even at high network load. We evaluate Parley on a multi-tiered oversubscribed network connecting 90 machines, each with a 10 Gb/s network interface, and demonstrate that Parley is able to meet its goals.
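The core policy in the abstract (minimum guarantees plus weighted sharing of the excess) can be sketched in a few lines. The Python below is a simplified, non-hierarchical illustration of that allocation rule, not Parley's implementation.

```python
# Give each service its minimum guarantee, then split the remaining link
# capacity in proportion to per-service weights.
def allocate(capacity, services):
    """services: {name: (guarantee_gbps, weight)}, with guarantees summing to <= capacity."""
    alloc = {name: guarantee for name, (guarantee, _) in services.items()}
    excess = capacity - sum(alloc.values())
    total_weight = sum(weight for _, weight in services.values())
    for name, (_, weight) in services.items():
        alloc[name] += excess * weight / total_weight
    return alloc

# A 10 Gb/s link shared by two services with 2 and 3 Gb/s guarantees, weights 1 and 4.
print(allocate(10, {"batch": (2, 1), "serving": (3, 4)}))
# {'batch': 3.0, 'serving': 7.0}
```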

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43871.pdf

Trumpet: Timely and Precise Triggers in Data Centers

As data centers grow larger and strive to provide tight performance and availability SLAs, their monitoring infrastructure must move from passive systems that provide aggregated inputs to human operators, to active systems that enable programmed control. In this paper, we propose Trumpet, an event monitoring system that leverages CPU resources and end-host programmability to monitor every packet and report events at millisecond timescales. Trumpet users can express many network-wide events, and the system efficiently detects these events using triggers at end-hosts. Using careful design, Trumpet can evaluate triggers by inspecting every packet at full line rate even on future generations of NICs, scale to thousands of triggers per end-host while bounding packet processing delay to a few microseconds, and report events to a controller within 10 milliseconds, even in the presence of attacks. We demonstrate these properties using an implementation of Trumpet, and also show that it allows operators to describe new network events such as detecting correlated bursts and loss, identifying the root cause of transient congestion, and detecting short-term anomalies at the scale of a data center tenant.
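The central abstraction is a trigger evaluated at the end-host: a per-packet predicate plus an aggregation over a small time window, with an event reported to the controller when a threshold is crossed. The Python sketch below illustrates that shape; the field names and API are invented for the example and are not Trumpet's.

```python
# A trigger: match packets with a predicate, aggregate matched bytes per window,
# and emit an event when the per-window total exceeds a threshold.
from collections import defaultdict

WINDOW_MS = 10

class Trigger:
    def __init__(self, predicate, threshold_bytes):
        self.predicate = predicate
        self.threshold = threshold_bytes
        self.window = defaultdict(int)          # window start (ms) -> matched bytes

    def on_packet(self, pkt, now_ms):
        """Called for every packet on the end-host's fast path."""
        if self.predicate(pkt):
            self.window[now_ms // WINDOW_MS * WINDOW_MS] += pkt["bytes"]

    def sweep(self, window_start):
        """Called once per window; returns an event for the controller, or None."""
        total = self.window.pop(window_start, 0)
        if total > self.threshold:
            return {"window_ms": window_start, "bytes": total}
        return None

# Detect a burst to 10.0.0.5 exceeding 1 KB within a 10 ms window.
t = Trigger(lambda p: p["dst"] == "10.0.0.5", threshold_bytes=1000)
for ts in range(8):
    t.on_packet({"dst": "10.0.0.5", "bytes": 200}, now_ms=ts)
print(t.sweep(0))   # {'window_ms': 0, 'bytes': 1600}
```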

Source: http://www.cs.yale.edu/homes/yu-minlan/writeup/sigcomm16.pdf

Incremental, Iterative Data Processing with Timely Dataflow

We describe the timely dataflow model for distributed computation and its implementation in the Naiad system. The model supports stateful iterative and incremental computations. It enables both low-latency stream processing and high-throughput batch processing, using a new approach to coordination that combines asynchronous and fine-grained synchronous execution. We describe two of the programming frameworks built on Naiad: GraphLINQ for parallel graph processing, and differential dataflow for nested iterative and incremental computations. We show that a general-purpose system can achieve performance that matches, and sometimes exceeds, that of specialized systems.
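The coordination idea, asynchronous record delivery combined with fine-grained notifications that a logical timestamp (epoch) is complete, can be illustrated with a toy operator. The Python below is a conceptual sketch, not Naiad's or timely dataflow's actual API.

```python
# Records arrive asynchronously tagged with an epoch; the operator is notified
# once an epoch can receive no more records, and only then emits an exact result.
from collections import defaultdict

class CountPerEpoch:
    def __init__(self):
        self.pending = defaultdict(int)

    def on_receive(self, record, epoch):
        # Asynchronous path: accumulate without any per-record coordination.
        self.pending[epoch] += 1

    def on_notify(self, epoch):
        # Synchronous path: the system guarantees this epoch is complete.
        return epoch, self.pending.pop(epoch)

op = CountPerEpoch()
for record, epoch in [("a", 0), ("b", 0), ("c", 1), ("d", 0)]:
    op.on_receive(record, epoch)
print(op.on_notify(0))   # (0, 3) -- exact count for epoch 0 while epoch 1 is still open
```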

Source: http://delivery.acm.org/10.1145/2990000/2983551/p75-murray.pdf?ip=73.135.106.24&id=2983551&acc=OA&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E5945DC2EABF3343C&CFID=887680414&CFTOKEN=71994990&__acm__=1484169807_1cb0e26f9cab8e683902618d19f62d56