Log(Graph): A Near-Optimal High-Performance Graph Representation

Today’s graphs used in domains such as machine learning or
social network analysis may contain hundreds of billions of
edges. Yet, they are not necessarily stored efficiently:
standard graph representations such as adjacency lists
waste a significant number of bits, while graph compression
schemes such as WebGraph often require time-consuming
decompression.
To address this, we propose Log(Graph): a graph representation
that combines high compression ratios with very
low-overhead decompression to enable cheaper and faster
graph processing. The key idea is to encode a graph so that
the parts of the representation approach or match the respective
storage lower bounds. We call our approach “graph
logarithmization” because these bounds are usually logarithmic.
Our high-performance Log(Graph) implementation
based on modern bitwise operations and state-of-the-art succinct
data structures achieves high compression ratios as well
as performance. For example, compared to the tuned Graph
Algorithm Processing Benchmark Suite (GAPBS), it reduces
graph sizes by 20–35% while matching GAPBS's performance
or even delivering speedups by reducing the amount of
data transferred. It approaches the compression ratio of
the established WebGraph compression library while enabling
speedups of more than 2×. Log(Graph) can improve the design
of various graph processing engines and libraries on single
NUMA nodes as well as on distributed-memory systems.

Source: https://people.csail.mit.edu/jshun/papers/loggraph.pdf
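
A minimal sketch of the logarithmization idea in plain Python (hypothetical helper names; the paper's implementation relies on tuned bitwise operations and succinct data structures, none of which are shown here): each vertex ID is stored in ceil(log2(n)) bits rather than a full 32- or 64-bit word.

    def pack_neighbors(n, neighbors):
        # Each vertex ID of a graph with n vertices needs only
        # ceil(log2(n)) bits -- the storage lower bound that "graph
        # logarithmization" targets -- instead of a full machine word.
        w = max(1, (n - 1).bit_length())
        buf = 0
        for i, v in enumerate(neighbors):
            buf |= v << (i * w)          # fixed-width, little-endian packing
        return w, buf

    def get_neighbor(w, buf, i):
        # Decoding is one shift and one mask: the "very low-overhead
        # decompression" that keeps traversal cheap.
        return (buf >> (i * w)) & ((1 << w) - 1)

    w, buf = pack_neighbors(n=1_000_000, neighbors=[3, 17, 999_999])
    assert get_neighbor(w, buf, 2) == 999_999   # 20 bits/ID vs. 32: ~37% smaller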

Gestalt: Fast, unified fault localization for networked systems

We show that the performance of existing fault localization
algorithms differs markedly across networks, and that no
algorithm simultaneously provides high localization accuracy
and low computational overhead.
We develop a framework to explain these behaviors
by anatomizing the algorithms with respect to six
important characteristics of real networks, such as uncertain
dependencies, noise, and covering relationships. We
use this analysis to develop Gestalt, a new algorithm that
combines the best elements of existing ones and includes
a new technique to explore the space of fault hypotheses.
We run experiments on three real, diverse networks. For each,
Gestalt either has significantly higher localization accuracy
than existing algorithms or runs an order of magnitude faster.
For example, when applied to the Lync messaging system
that is used widely within corporations, Gestalt localizes
faults with the same accuracy as Sherlock, while
reducing fault localization time from days to 23 seconds.

Source: https://www.usenix.org/system/files/conference/atc14/atc14-paper-mysore.pdf
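
The hypothesis-space exploration at the heart of such algorithms can be sketched as a toy fault ranker in Python (not Gestalt's actual algorithm; real systems must also cope with uncertain dependencies, noise, and covering relationships):

    from itertools import combinations

    def score(hypothesis, deps, observations):
        # Reward failures the hypothesis explains; penalize transactions it
        # predicts as failed that were actually observed to succeed.
        predicted = set().union(*(deps[c] for c in hypothesis))
        explained = sum(1 for t, ok in observations.items()
                        if not ok and t in predicted)
        false_alarms = sum(1 for t, ok in observations.items()
                           if ok and t in predicted)
        return explained - false_alarms

    def localize(deps, observations, max_faults=2):
        # Exhaustive search over small fault sets: the hypothesis space
        # that fault localization algorithms must explore efficiently.
        hypotheses = (frozenset(h) for k in range(1, max_faults + 1)
                      for h in combinations(deps, k))
        return max(hypotheses, key=lambda h: score(h, deps, observations))

    deps = {"link1": {"t1", "t2"}, "link2": {"t2", "t3"}, "srv": {"t1", "t3"}}
    obs = {"t1": False, "t2": False, "t3": True}   # t1 and t2 failed
    print(localize(deps, obs))                     # frozenset({'link1'})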

Picasso: Lightweight Device Class Fingerprinting for Web Clients

In this work we present Picasso: a lightweight device class fingerprinting protocol that allows a server to verify the software and hardware stack of a mobile or desktop client. As an example, Picasso can distinguish traffic sent by an authentic iPhone running Safari on iOS from an emulator or desktop client spoofing the same configuration. Our fingerprinting scheme builds on unpredictable yet stable noise introduced by a client’s browser, operating system, and graphical stack when rendering HTML5 canvases. Our algorithm is resistant to replay and includes a hardware-bound proof of work that forces a client to expend a configurable amount of CPU and memory to solve challenges. We demonstrate that Picasso can distinguish 52 million Android, iOS, Windows, and OS X clients running a diversity of browsers with 100% accuracy. We discuss applications of Picasso in abuse fighting, including protecting the Play Store and other mobile app marketplaces from inorganic interactions, and identifying login attempts to user accounts from previously unseen device classes.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45581.pdf
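
The challenge-response shape of the scheme can be sketched in a few lines of Python with a stand-in render function (the real signal comes from HTML5 canvas rendering across the browser, OS, and graphics stack, which cannot be reproduced here):

    import hashlib, os, random

    def render_stub(device_class, seed):
        # Stand-in for rendering an HTML5 canvas: in Picasso the per-class
        # signal is unpredictable-but-stable noise from the browser, OS,
        # and graphics stack; here we merely simulate it with a keyed hash.
        return hashlib.sha256(f"{device_class}:{seed}".encode()).digest()

    def respond(device_class, challenge, rounds=5):
        # The client hashes the pixels of several challenge-seeded renders,
        # so a response is only valid for this particular challenge.
        rng = random.Random(challenge)
        digest = hashlib.sha256()
        for _ in range(rounds):
            digest.update(render_stub(device_class, rng.random()))
        return digest.hexdigest()

    challenge = os.urandom(16).hex()   # fresh per session: defeats replay
    expected = respond("iPhone/Safari/iOS", challenge)  # from reference device
    claimed = respond("Emulator spoofing iPhone", challenge)
    print(claimed == expected)   # False: a spoofed stack renders differently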

Post-quantum key exchange – a new hope

At IEEE Security & Privacy 2015, Bos, Costello, Naehrig, and Stebila proposed an instantiation of Peikert’s ring-learning-with-errors–based (Ring-LWE) key-exchange protocol (PQCrypto 2014), together with an implementation integrated into OpenSSL, with the affirmed goal of providing post-quantum security for TLS. In this work we revisit their instantiation and stand-alone implementation. Specifically, we propose new parameters and a better suited error distribution, analyze the scheme’s hardness against attacks by quantum computers in a conservative way, introduce a new and more efficient error-reconciliation mechanism, and propose a defense against backdoors and all-for-the-price-of-one attacks. By these measures and for the same lattice dimension, we more than double the security parameter, halve the communication overhead, and speed up computation by more than a factor of 8 in a portable C implementation and by more than a factor of 27 in an optimized implementation targeting current Intel CPUs. These speedups are achieved with comprehensive protection against timing attacks.

Source: https://eprint.iacr.org/2015/1092.pdf
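
A toy, insecure rendition of the underlying Ring-LWE exchange in Python may help fix ideas. It uses a tiny ring dimension, schoolbook multiplication instead of the NTT, and omits reconciliation, compression, and authentication; the psi_16 centered binomial sampler is the error distribution the paper proposes:

    import random

    Q, N = 12289, 8   # New Hope's modulus q; toy dimension (paper: n = 1024)

    def psi16():
        # Centered binomial sampler psi_16 from the paper: trivially
        # implementable in constant time, unlike a discrete Gaussian.
        return sum(random.getrandbits(1) - random.getrandbits(1)
                   for _ in range(16))

    def ring_mul(f, g):
        # Schoolbook multiplication in Z_q[x]/(x^N + 1); the paper uses the NTT.
        out = [0] * N
        for i, fi in enumerate(f):
            for j, gj in enumerate(g):
                sign = -1 if i + j >= N else 1   # reduce via x^N = -1
                out[(i + j) % N] = (out[(i + j) % N] + sign * fi * gj) % Q
        return out

    def add(f, g):
        return [(x + y) % Q for x, y in zip(f, g)]

    a = [random.randrange(Q) for _ in range(N)]      # public parameter
    s, e = [psi16() for _ in range(N)], [psi16() for _ in range(N)]
    s2, e2 = [psi16() for _ in range(N)], [psi16() for _ in range(N)]
    b = add(ring_mul(a, s), e)    # Alice -> Bob
    u = add(ring_mul(a, s2), e2)  # Bob -> Alice
    # Both sides now hold a*s*s' plus small noise; the paper's new
    # reconciliation mechanism extracts matching key bits from such values.
    va, vb = ring_mul(u, s), ring_mul(b, s2)
    print(max(min((x - y) % Q, (y - x) % Q)
              for x, y in zip(va, vb)))   # small relative to Q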

Stealing Machine Learning Models via Prediction APIs

Machine learning (ML) models may be deemed confidential
due to their sensitive training data, commercial
value, or use in security applications. Increasingly often,
confidential ML models are being deployed with publicly
accessible query interfaces. ML-as-a-service (“predictive
analytics”) systems are an example: Some allow
users to train models on potentially sensitive data and
charge others for access on a pay-per-query basis.
The tension between model confidentiality and public
access motivates our investigation of model extraction
attacks. In such attacks, an adversary with black-box access,
but no prior knowledge of an ML model’s parameters
or training data, aims to duplicate the functionality
of (i.e., “steal”) the model. Unlike in classical learning
theory settings, ML-as-a-service offerings may accept
partial feature vectors as inputs and include confidence
values with predictions. Given these practices, we show
simple, efficient attacks that extract target ML models
with near-perfect fidelity for popular model classes including
logistic regression, neural networks, and decision
trees. We demonstrate these attacks against the online
services of BigML and Amazon Machine Learning.
We further show that the natural countermeasure of omitting
confidence values from model outputs still admits
potentially harmful model extraction attacks. Our results
highlight the need for careful ML model deployment and new model extraction countermeasures.

Source: https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_tramer.pdf
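
For logistic regression, the equation-solving flavor of attack is strikingly simple, because the model's log-odds are linear in its inputs. A self-contained sketch against a simulated API (NumPy; function names are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5
    w_true, b_true = rng.normal(size=d), 0.7

    def predict_api(x):
        # Simulated black-box API that, like many ML-as-a-service endpoints,
        # returns a confidence value (the positive-class probability).
        return 1.0 / (1.0 + np.exp(-(w_true @ x + b_true)))

    # Equation-solving extraction: sigma^-1(p) = w.x + b is linear in x,
    # so d + 1 queries give a linear system whose solution is the model.
    X = rng.normal(size=(d + 1, d))
    logits = np.array([np.log(p / (1 - p)) for p in map(predict_api, X)])
    A = np.hstack([X, np.ones((d + 1, 1))])   # unknowns: w (d entries) and b
    sol = np.linalg.solve(A, logits)
    w_hat, b_hat = sol[:-1], sol[-1]
    print(np.allclose(w_hat, w_true), np.isclose(b_hat, b_true))  # True True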