In a distributed system networks are of paramount importance. This
paper describes the implementation, design philosophy, and organization
of network support in Plan 9. Topics include network requirements for
distributed systems, our kernel implementation, network naming, user
interfaces, and performance. We also observe that much of this organization is relevant to current systems.
We present Sincronia, a near-optimal network design for coflows that can be implemented on top on any transport layer (for flows) that supports priority scheduling. Sincronia achieves this using a key technical result — we show that given a “right” ordering of coflows, any per-flow rate allocation mechanism achieves average coflow completion time within 4X of the optimal as long as (co)flows are prioritized with respect to the ordering.
Sincronia uses a simple greedy mechanism to periodically order all unfinished coflows; each host sets priorities for its flows using corresponding coflow order and offloads the flow scheduling and rate allocation to the underlying priority-enabled transport layer. We evaluate Sincronia over a real testbed comprising 16-servers and commodity switches, and using simulations across a variety of workloads. Evaluation results suggest that Sincronia not only admits a practical, near-optimal design but also improves upon state-of-the-art network designs for coflows (sometimes by as much as 8X).
Private WANs are increasingly important to the operation of
enterprises, telecoms, and cloud providers. For example, B4,
Google’s private software-defined WAN, is larger and growing
faster than our connectivity to the public Internet. In this
paper, we present the five-year evolution of B4. We describe
the techniques we employed to incrementally move from
offering best-effort content-copy services to carrier-grade
availability, while concurrently scaling B4 to accommodate
100x more traffic. Our key challenge is balancing the tension
introduced by hierarchy required for scalability, the partitioning
required for availability, and the capacity asymmetry
inherent to the construction and operation of any large-scale
network. We discuss our approach to managing this tension:
i) we design a custom hierarchical network topology for both
horizontal and vertical software scaling, ii) we manage inherent
capacity asymmetry in hierarchical topologies using
a novel traffic engineering algorithm without packet encapsulation,
and iii) we re-architect switch forwarding rules
via two-stage matching/hashing to deal with asymmetric
network failures at scale.
This paper presents our design and experience with Andromeda,
Google Cloud Platform’s network virtualization
stack. Our production deployment poses several challenging
requirements, including performance isolation among
customer virtual networks, scalability, rapid provisioning
of large numbers of virtual hosts, bandwidth and latency
largely indistinguishable from the underlying hardware,
and high feature velocity combined with high availability.
Andromeda is designed around a flexible hierarchy of
flow processing paths. Flows are mapped to a programming
path dynamically based on feature and performance
requirements. We introduce the Hoverboard programming
model, which uses gateways for the long tail of low bandwidth
flows, and enables the control plane to program
network connectivity for tens of thousands of VMs in
seconds. The on-host dataplane is based around a highperformance
OS bypass software packet processing path.
CPU-intensive per packet operations with higher latency
targets are executed on coprocessor threads. This architecture
allows Andromeda to decouple feature growth from
fast path performance, as many features can be implemented
solely on the coprocessor path. We demonstrate
that the Andromeda datapath achieves performance that is
competitive with hardware while maintaining the flexibility
and velocity of a software-based architecture.
We present the design of Espresso, Google’s SDN-based Internet
peering edge routing infrastructure. This architecture grew out of a
need to exponentially scale the Internet edge cost-effectively and to
enable application-aware routing at Internet-peering scale. Espresso
utilizes commodity switches and host-based routing/packet process-
ing to implement a novel fine-grained traffic engineering capability.
Overall, Espresso provides Google a scalable peering edge that is
programmable, reliable, and integrated with global traffic systems.
Espresso also greatly accelerated deployment of new networking
features at our peering edge. Espresso has been in production for
two years and serves over 22% of Google’s total traffic to the Inter-
Amin Vahdat keynotes ONS again…
Today, we gave a keynote presentation at the Open Networking Summit, where we shared details about Espresso, Google’s peering edge architecture—the latest offering in our Software Defined Networking (SDN) strategy. Espresso has been in production for over two years and routes 20 percent of our total traffic to the internet—and growing. It’s changing the way traffic is directed at the peering edge, delivering unprecedented scale, flexibility and efficiency.