Log(Graph): A Near-Optimal High-Performance Graph Representation

Today’s graphs used in domains such as machine learning or
social network analysis may contain hundreds of billions of
edges. Yet, they are not necessarily stored efficiently, and standard
graph representations such as adjacency lists waste a
significant number of bits while graph compression schemes
such as WebGraph often require time-consuming decompression.
To address this, we propose Log(Graph): a graph representation
that combines high compression ratios with very
low-overhead decompression to enable cheaper and faster
graph processing. The key idea is to encode a graph so that
the parts of the representation approach or match the respective
storage lower bounds. We call our approach “graph
logarithmization” because these bounds are usually logarithmic.
Our high-performance Log(Graph) implementation
based on modern bitwise operations and state-of-the-art succinct
data structures achieves high compression ratios as well
as performance. For example, compared to the tuned Graph
Algorithm Processing Benchmark Suite (GAPBS), it reduces
graph sizes by 20-35% while matching GAPBS’ performance
or even delivering speedups due to reducing amounts of
transferred data. It approaches the compression ratio of the
established WebGraph compression library while enabling
speedups of up to more than 2×. Log(Graph) can improve
the design of various graph processing engines or libraries on
single NUMA nodes as well as distributed-memory systems

Source: https://people.csail.mit.edu/jshun/papers/loggraph.pdf

Advertisements

Pregel: A System for Large-Scale Graph Processing

Many practical computing problems concern large graphs.
Standard examples include the Web graph and various social
networks. The scale of these graphs—in some cases billions
of vertices, trillions of edges—poses challenges to their
efficient processing. In this paper we present a computational
model suitable for this task. Programs are expressed
as a sequence of iterations, in each of which a vertex can
receive messages sent in the previous iteration, send messages
to other vertices, and modify its own state and that of
its outgoing edges or mutate graph topology. This vertexcentric
approach is flexible enough to express a broad set of
algorithms. The model has been designed for efficient, scalable
and fault-tolerant implementation on clusters of thousands
of commodity computers, and its implied synchronicity
makes reasoning about programs easier. Distribution related
details are hidden behind an abstract API. The result
is a framework for processing large graphs that is expressive
and easy to program.

Source: https://kowshik.github.io/JPregel/pregel_paper.pdf

Graph Cube: On Warehousing and OLAP Multidimensional Networks

We consider extending decision support facilities toward large
sophisticated networks, upon which multidimensional attributes
are associated with network entities, thereby forming
the so-called multidimensional networks. Data warehouses
and OLAP (Online Analytical Processing) technology
have proven to be effective tools for decision support on
relational data. However, they are not well-equipped to handle
the new yet important multidimensional networks. In
this paper, we introduce Graph Cube, a new data warehousing
model that supports OLAP queries effectively on large
multidimensional networks. By taking account of both attribute
aggregation and structure summarization of the networks,
Graph Cube goes beyond the traditional data cube
model involved solely with numeric value based group-by’s,
thus resulting in a more insightful and structure-enriched
aggregate network within every possible multidimensional
space. Besides traditional cuboid queries, a new class of
OLAP queries, crossboid, is introduced that is uniquely useful
in multidimensional networks and has not been studied
before. We implement Graph Cube by combining special
characteristics of multidimensional networks with the existing
well-studied data cube techniques. We perform extensive
experimental studies on a series of real world data sets and
Graph Cube is shown to be a powerful and efficient tool for
decision support on large multidimensional networks.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37657.pdf