How we use Colossus to improve storage efficiency
Determining whether online users are authorized to access
digital objects is central to preserving privacy. This paper presents the design, implementation, and deployment
of Zanzibar, a global system for storing and evaluating access control lists. Zanzibar provides a uniform data model
and configuration language for expressing a wide range of
access control policies from hundreds of client services at
Google, including Calendar, Cloud, Drive, Maps, Photos,
and YouTube. Its authorization decisions respect causal ordering of user actions and thus provide external consistency
amid changes to access control lists and object contents.
Zanzibar scales to trillions of access control lists and millions
of authorization requests per second to support services used
by billions of people. It has maintained 95th-percentile latency of less than 10 milliseconds and availability of greater
than 99.999% over 3 years of production use
References and links
It is very difficult to find accurate information about the correctness and isolation levels offered by modern distributed databases, and the operational conditions required to achieve them. Developers use different terms for the same thing, the meaning of terms varies or is ambiguous, and sometimes vendors themselves do not actually know.
At Fauna, we care a lot about accurately describing which guarantees different systems actually provide. This is our effort to centralize a description of which database does what, based on publicly available information (documentation, source code, third-party analyses, and developers’ comments). For consistency’s sake, we will use the terminology from Kyle Kingsbury’s explanation on the Jepsen site. The chart is ranked by the maximum multi-partition isolation level offered.
The data is based on statements about isolation levels from vendor documentation, white papers, and developer commentary, exclusive of aspirational marketing statements. We have tried to be neutral in the characterization of the various systems’ architectural properties. Whether the system implementations uphold these guarantees is addressed elsewhere. If you haven’t already, please see FaunaDB’s own Jepsen results for confirmation that FaunaDB upholds its guarantees.
Given the recent explosion of interest in streaming data and online algorithms, clustering of time series
subsequences, extracted via a sliding window, has received much attention. In this work we make a
surprising claim. Clustering of time series subsequences is meaningless. More concretely, clusters extracted
from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by
any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random.
While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has
never appeared in the literature. We can justify calling our claim surprising, since it invalidates the
contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative
examples, and a comprehensive set of experiments on reimplementations of previous work. Although the
primary contribution of our work is to draw attention to the fact that an apparent solution to an important
problem is incorrect and should no longer be used, we also introduce a novel method which, based on the
concept of time series motifs, is able to meaningfully cluster subsequences on some time series datasets