References and links
It is very difficult to find accurate information about the correctness and isolation levels offered by modern distributed databases, and the operational conditions required to achieve them. Developers use different terms for the same thing, the meaning of terms varies or is ambiguous, and sometimes vendors themselves do not actually know.
At Fauna, we care a lot about accurately describing which guarantees different systems actually provide. This is our effort to centralize a description of which database does what, based on publicly available information (documentation, source code, third-party analyses, and developers’ comments). For consistency’s sake, we will use the terminology from Kyle Kingsbury’s explanation on the Jepsen site. The chart is ranked by the maximum multi-partition isolation level offered.
The data is based on statements about isolation levels from vendor documentation, white papers, and developer commentary, exclusive of aspirational marketing statements. We have tried to be neutral in the characterization of the various systems’ architectural properties. Whether the system implementations uphold these guarantees is addressed elsewhere. If you haven’t already, please see FaunaDB’s own Jepsen results for confirmation that FaunaDB upholds its guarantees.
Given the recent explosion of interest in streaming data and online algorithms, clustering of time series
subsequences, extracted via a sliding window, has received much attention. In this work we make a
surprising claim. Clustering of time series subsequences is meaningless. More concretely, clusters extracted
from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by
any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random.
While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has
never appeared in the literature. We can justify calling our claim surprising, since it invalidates the
contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative
examples, and a comprehensive set of experiments on reimplementations of previous work. Although the
primary contribution of our work is to draw attention to the fact that an apparent solution to an important
problem is incorrect and should no longer be used, we also introduce a novel method which, based on the
concept of time series motifs, is able to meaningfully cluster subsequences on some time series datasets.
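The abstract does not spell out the constraint, so the following is only a minimal sketch of sliding-window subsequence extraction and of one easily checked property of it: when the series is much longer than the window, the point-wise mean of all extracted subsequences is nearly flat, whatever the shape of the series. Whether this is exactly the constraint the paper proves is not confirmed by the text here. The test signal, the window length, and the function names are all illustrative assumptions.

```python
import math


def sliding_window(series, w):
    """Return every length-w subsequence of series, stepping one point at a time."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


def mean_subsequence(subsequences):
    """Point-wise average of equal-length subsequences."""
    count = len(subsequences)
    return [sum(column) / count for column in zip(*subsequences)]


# Hypothetical test signal: a sum of two sine waves (not from the paper).
series = [math.sin(0.1 * t) + 0.5 * math.sin(0.37 * t) for t in range(2000)]

subs = sliding_window(series, w=32)
avg = mean_subsequence(subs)

# Compare how much the averaged subsequence varies with how much the raw series varies.
spread_of_mean = max(avg) - min(avg)
spread_of_series = max(series) - min(series)
print(f"subsequences: {len(subs)}, window: {len(avg)}")
print(f"spread of mean subsequence: {spread_of_mean:.4f}")
print(f"spread of raw series:       {spread_of_series:.4f}")
```

Each position of the averaged subsequence is the mean of almost the entire series, so the positions can differ only through the few points at either end. That is why the average carries almost no shape information.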
Abstract—Fallback authentication, the process of recovering
access to an account if the primary authenticator is forgotten
or lost, is of significant importance in real-world applications.
A variety of mechanisms are deployed, ranging from secondary
channels (such as email and SMS) and personal knowledge
questions (such as the “mother’s maiden name”) to social authentication (such as vouching-based approaches). One central
difference with primary authentication is that the elapsed time
between enrollment and authentication can be much longer,
typically in the range of years. However, few of the mechanisms
used today have been studied over such long time-spans, making
claims about their usability difficult to generalize to real-world
applications. Additionally, most past studies have considered one
or two mechanisms only, and deriving a meaningful comparison
of a relevant number of mechanisms from the individual data points is not easy. In this work-in-progress paper, we report on the
design of a usability study that we will use to study the usability
of authentication mechanisms over a more realistic time-frame of
up to 18 months, and will provide a fair comparison of the four
most widely used fallback authentication schemes. We present
results of a pre-study with 74 participants that ran over 4 weeks
and indicates that schemes based on email and SMS are more
usable. Mechanisms based on designated trustees and personal
knowledge questions, on the other hand, fall short, both in terms
of convenience and efficiency.
This paper describes a network storage system, called
Venti, intended for archival data. In this system, a
unique hash of a block’s contents acts as the block
identifier for read and write operations. This approach
enforces a write-once policy, preventing accidental or
malicious destruction of data. In addition, duplicate
copies of a block can be coalesced, reducing the
consumption of storage and simplifying the
implementation of clients. Venti is a building block for
constructing a variety of storage applications such as
logical backup, physical backup, and snapshot file systems.
We have built a prototype of the system and present
some preliminary performance results. The system uses
magnetic disks as the storage technology, resulting in
an access time for archival data that is comparable to
non-archival data. The feasibility of the write-once
model for storage is demonstrated using data from over
a decade’s use of two Plan 9 file systems.
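The content-addressing idea in the Venti summary can be illustrated with a toy in-memory store. This is a minimal sketch, not Venti's protocol or on-disk format: the class name, the method names, and the choice of SHA-256 as the hash are assumptions made for the example.

```python
import hashlib


class ContentAddressedStore:
    """Toy block store in which a block's identifier is the hash of its contents."""

    def __init__(self):
        self._blocks = {}  # hex digest -> block bytes

    def write(self, data: bytes) -> str:
        # The identifier is derived from the contents; the client does not choose it.
        # SHA-256 is an assumption for this sketch.
        block_id = hashlib.sha256(data).hexdigest()
        # Identical contents map to the same key, so duplicates coalesce.
        # Different contents hash to a different key, so a stored block is
        # never overwritten: the store is effectively write-once.
        self._blocks.setdefault(block_id, bytes(data))
        return block_id

    def read(self, block_id: str) -> bytes:
        data = self._blocks[block_id]
        # Re-hash on read so a corrupted block is detected rather than returned.
        if hashlib.sha256(data).hexdigest() != block_id:
            raise ValueError("block contents do not match identifier")
        return data

    def __len__(self) -> int:
        return len(self._blocks)


store = ContentAddressedStore()
first = store.write(b"archival block")
second = store.write(b"archival block")   # duplicate write, same identifier
other = store.write(b"a different block")

print(first == second)  # duplicates share one identifier
print(len(store))       # only two distinct blocks are stored
```

Because the identifier doubles as a checksum, a client can verify what it reads without trusting the server, and there is no operation that modifies a block in place.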
This paper is a revision of Thompson’s The Plan 9 File Server, and
describes the structure and the operation of the new 64-bit Plan 9 file
servers. Some specifics apply to the 32-bit Plan 9 file server Emelie,
whose code is also the basis for the user-level file server kfs.
In 2004, Collyer created a 64-bit version of Thompson’s 32-bit file
server, updating all file offsets, sizes and block numbers to 64 bits. In
addition, triple- and quadruple-indirect blocks were implemented. File
name components were extended from 27 to 55 bytes. This code is also
the basis for the user-level file server cwfs(4).
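To see why extra levels of indirection matter, the sketch below counts how many data blocks a file can address as indirect levels are added. It assumes a generic layout with a few direct pointers plus one pointer per indirection level. The block size, pointer size, and number of direct pointers are hypothetical values chosen for the arithmetic, not figures from the Plan 9 file server.

```python
# Hypothetical parameters, chosen only to illustrate the arithmetic.
BLOCK_SIZE = 8192        # bytes per block (assumption)
POINTER_SIZE = 8         # bytes per 64-bit block number
DIRECT_POINTERS = 6      # direct block pointers per file (assumption)

POINTERS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE


def max_file_blocks(direct, pointers_per_block, levels):
    """Data blocks addressable with `direct` direct pointers and one
    indirect pointer at each level from 1 (single) up to `levels`."""
    total = direct
    for level in range(1, levels + 1):
        # A level-k indirect block reaches pointers_per_block ** k data blocks.
        total += pointers_per_block ** level
    return total


for levels, name in [(2, "double"), (3, "triple"), (4, "quadruple")]:
    blocks = max_file_blocks(DIRECT_POINTERS, POINTERS_PER_BLOCK, levels)
    print(f"up to {name}-indirect: {blocks} blocks, {blocks * BLOCK_SIZE} bytes")
```

Each added level multiplies the reachable size by roughly the number of pointers per block, which is why wider offsets and deeper indirection go together.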