Today’s graphs used in domains such as machine learning or
social network analysis may contain hundreds of billions of
edges. Yet, they are not necessarily stored efficiently, and standard
graph representations such as adjacency lists waste a
significant number of bits while graph compression schemes
such as WebGraph often require time-consuming decompression.
To address this, we propose Log(Graph): a graph representation
that combines high compression ratios with very
low-overhead decompression to enable cheaper and faster
graph processing. The key idea is to encode a graph so that
the parts of the representation approach or match the respective
storage lower bounds. We call our approach “graph
logarithmization” because these bounds are usually logarithmic.
Our high-performance Log(Graph) implementation
based on modern bitwise operations and state-of-the-art succinct
data structures achieves high compression ratios as well
as performance. For example, compared to the tuned Graph
Algorithm Processing Benchmark Suite (GAPBS), it reduces
graph sizes by 20-35% while matching GAPBS’ performance
or even delivering speedups due to reducing amounts of
transferred data. It approaches the compression ratio of the
established WebGraph compression library while enabling
speedups of up to more than 2×. Log(Graph) can improve
the design of various graph processing engines or libraries on
single NUMA nodes as well as distributed-memory systems
Any security capability is inherently only as secure as the other systems
it trusts. The BeyondCorp project helped Google clearly define
and make access decisions around the platforms we trust, shifting
our security strategy from protecting services to protecting trusted platforms.
Previous BeyondCorp articles discussed the tooling Google uses to
confidently ascertain the provenance of a device, but we have not yet covered
the mechanics behind how we trust these devices.
Secure authentication and authorization within Facebook’s infrastructure play important
roles in protecting people using Facebook’s services. Enforcing security while maintaining a
flexible and performant infrastructure can be challenging at Facebook’s scale, especially in the
presence of varying layers of trust among our servers. Providing authentication and encryption
on a per-connection basis is certainly necessary, but also insufficient for securing more complex
flows involving multiple services or intermediaries at lower levels of trust.
To handle these more complicated scenarios, we have developed two token-based mechanisms
for authentication. The first type is based on certificates and allows for flexible verification due
to its public-key nature. The second type, known as “crypto auth tokens”, is symmetric-key
based, and hence more restrictive, but also much more scalable to a high volume of requests.
Crypto auth tokens rely on pseudorandom functions to generate independently-distributed keys
for distinct identities.
Finally, we provide (mock) examples which illustrate how both of our token primitives can be
used to authenticate real-world flows within our infrastructure, and how a token-based approach
to authentication can be used to handle security more broadly in other infrastructures which
have strict performance requirements and where relying on TLS alone is not enough.
We present Sincronia, a near-optimal network design for coflows that can be implemented on top on any transport layer (for flows) that supports priority scheduling. Sincronia achieves this using a key technical result — we show that given a “right” ordering of coflows, any per-flow rate allocation mechanism achieves average coflow completion time within 4X of the optimal as long as (co)flows are prioritized with respect to the ordering.
Sincronia uses a simple greedy mechanism to periodically order all unfinished coflows; each host sets priorities for its flows using corresponding coflow order and offloads the flow scheduling and rate allocation to the underlying priority-enabled transport layer. We evaluate Sincronia over a real testbed comprising 16-servers and commodity switches, and using simulations across a variety of workloads. Evaluation results suggest that Sincronia not only admits a practical, near-optimal design but also improves upon state-of-the-art network designs for coflows (sometimes by as much as 8X).
Private WANs are increasingly important to the operation of
enterprises, telecoms, and cloud providers. For example, B4,
Google’s private software-defined WAN, is larger and growing
faster than our connectivity to the public Internet. In this
paper, we present the five-year evolution of B4. We describe
the techniques we employed to incrementally move from
offering best-effort content-copy services to carrier-grade
availability, while concurrently scaling B4 to accommodate
100x more traffic. Our key challenge is balancing the tension
introduced by hierarchy required for scalability, the partitioning
required for availability, and the capacity asymmetry
inherent to the construction and operation of any large-scale
network. We discuss our approach to managing this tension:
i) we design a custom hierarchical network topology for both
horizontal and vertical software scaling, ii) we manage inherent
capacity asymmetry in hierarchical topologies using
a novel traffic engineering algorithm without packet encapsulation,
and iii) we re-architect switch forwarding rules
via two-stage matching/hashing to deal with asymmetric
network failures at scale.