Today, we gave a keynote presentation at the Open Networking Summit, where we shared details about Espresso, Google’s peering edge architecture—the latest offering in our Software Defined Networking (SDN) strategy. Espresso has been in production for over two years and routes 20 percent of our total traffic to the internet—and growing. It’s changing the way traffic is directed at the peering edge, delivering unprecedented scale, flexibility and efficiency.
Today’s network control and management traffic is limited by
its reliance on existing data networks. Fate sharing in this context
is highly undesirable, since control traffic has very different availability
and traffic delivery requirements. In this paper, we explore
the feasibility of building a dedicated wireless facilities network for
data centers. We propose Angora, a low-latency facilities network
using low-cost, 60GHz beamforming radios that provides robust
paths decoupled from the wired network, and flexibility to adapt to
workloads and network dynamics. We describe our solutions to address
challenges in link coordination, link interference and network
failures. Our testbed measurements and simulation results show
that Angora enables a large number of low-latency control paths to
run concurrently, while providing low-latency end-to-end message
delivery with high tolerance for radio and rack failures.
Google’s B4 wide area network was first revealed several years ago. The outside observer might have thought, “Google’s B4 is finished. I wonder what they’re going to do next.” Turns out, once any network is in production @scale, there’s a continued need to make it better. Subhasree Mandal covered the reality of how Google iterated multiple times on different parts of B4 to improve its performance, availability, and scalability. Several of the challenges and solutions that Subhasree detailed were definitely at the intersection of networking and distributed systems. B4 was covered in a SIGCOMM 2013 paper from Google.
Computer networks lack a general control paradigm,
as traditional networks do not provide any network-wide
management abstractions. As a result, each new
function (such as routing) must provide its own state
distribution, element discovery, and failure recovery
mechanisms. We believe this lack of a common control
platform has significantly hindered the development of
flexible, reliable and feature-rich network control planes.
To address this, we present Onix, a platform on top of
which a network control plane can be implemented as a
distributed system. Control planes written within Onix
operate on a global view of the network, and use basic
state distribution primitives provided by the platform.
Thus Onix provides a general API for control plane
implementations, while allowing them to make their own
trade-offs among consistency, durability, and scalability.
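The core idea can be sketched in a few lines: control logic operates against a shared, network-wide view and reacts to state changes the platform distributes, rather than running its own discovery and state-distribution machinery. This is a minimal illustration, not Onix’s actual API; the class and method names here are hypothetical.

```python
# Hypothetical sketch of a control app written against a platform-maintained
# global network view (in the spirit of Onix's NIB, not its real API).
class GlobalView:
    """Platform-owned, network-wide view of switches and links."""

    def __init__(self):
        self.links = set()    # (src, dst) pairs currently up
        self.listeners = []   # control apps subscribed to view changes

    def register(self, callback):
        # A control app subscribes instead of implementing its own
        # element discovery and state distribution.
        self.listeners.append(callback)

    def add_link(self, src, dst):
        self.links.add((src, dst))
        for cb in self.listeners:
            cb("link_up", (src, dst))


# A routing app simply reacts to topology events delivered by the platform.
events = []
view = GlobalView()
view.register(lambda kind, link: events.append((kind, link)))
view.add_link("s1", "s2")
```

The trade-offs the abstract mentions (consistency, durability, scalability) would live behind the view: how quickly and how reliably `add_link` events propagate to each app is a platform policy, not app code.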
Since the publication of OpenFlow: Enabling Innovation in Campus Networks in 2008, there has been a lot of published work and experience with SDN and OpenFlow in large networks and in datacenters, including at Google. In this article we will discuss an open source SDN controller, FAUCET. FAUCET was created to bring the benefits of SDN to a typical enterprise network and has been deployed in various settings, including the Open Networking Foundation, which runs an instance of FAUCET as their office network. FAUCET delivers high forwarding performance using switch hardware, while enabling operators to add features to their networks and deploy them quickly, in many cases without needing to change (or even reboot) hardware – and interoperates with neighboring non-SDN network devices.
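FAUCET is driven by a declarative YAML configuration rather than controller code. As a flavor of what that looks like, here is a minimal illustrative config in FAUCET’s format; the switch name, datapath ID, and port assignments are made up for this example.

```yaml
# Illustrative FAUCET configuration (names and ports are hypothetical).
vlans:
  office:
    vid: 100
    description: "office user VLAN"
dps:
  sw1:
    dp_id: 0x1
    hardware: "Open vSwitch"
    interfaces:
      1:
        description: "host port"
        native_vlan: office
      2:
        description: "uplink to non-SDN network"
        native_vlan: office
```

Because the configuration is data, operators can add features or change network behavior by editing and reloading this file, in many cases without touching the switch hardware.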
The design space for large, multipath datacenter networks is
large and complex, and no one design fits all purposes. Network
architects must trade off many criteria to design cost-effective,
reliable, and maintainable networks, and typically
cannot explore much of the design space. We present Condor,
our approach to enabling a rapid, efficient design cycle.
Condor allows architects to express their requirements as constraints
via a Topology Description Language (TDL), rather
than having to directly specify network structures. Condor
then uses constraint-based synthesis to rapidly generate candidate
topologies, which can be analyzed against multiple
criteria. We show that TDL supports concise descriptions
of topologies such as fat-trees, BCube, and DCell; that we
can generate known and novel variants of fat-trees with simple
changes to a TDL file; and that we can synthesize large
topologies in tens of seconds. We also show that Condor
supports the daunting task of designing multi-phase network
expansions that can be carried out on live networks.
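To make the fat-tree example above concrete, the structure TDL describes declaratively can be captured procedurally: a k-ary fat-tree’s switch and host counts follow directly from k. This is a small illustrative sketch, not Condor or TDL itself.

```python
# Sketch: switch/host counts of a standard k-ary fat-tree, the kind of
# topology Condor's TDL expresses as constraints rather than code.
def fat_tree_sizes(k):
    """Return (core, aggregation, edge, host) counts for a k-ary fat-tree."""
    assert k % 2 == 0, "k must be even"
    core = (k // 2) ** 2      # (k/2)^2 core switches
    agg = k * (k // 2)        # k pods, each with k/2 aggregation switches
    edge = k * (k // 2)       # k pods, each with k/2 edge switches
    hosts = k ** 3 // 4       # each edge switch serves k/2 hosts
    return core, agg, edge, hosts
```

A constraint-based tool like Condor explores variants of such structures (and of BCube, DCell, and others) by solving for topologies that satisfy the architect’s stated requirements, instead of enumerating them by hand as above.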
We present our approach for overcoming the cost, operational
complexity, and limited scale endemic to datacenter
networks a decade ago. Three themes unify
the five generations of datacenter networks detailed in
this paper. First, multi-stage Clos topologies built from
commodity switch silicon can support cost-effective deployment
of building-scale networks. Second, much of
the general, but complex, decentralized network routing
and management protocols supporting arbitrary
deployment scenarios were overkill for single-operator,
pre-planned datacenter networks. We built a centralized
control mechanism based on a global configuration
pushed to all datacenter switches. Third, modular
hardware design coupled with simple, robust software
allowed our design to also support inter-cluster
and wide-area networks. Our datacenter networks run
at dozens of sites across the planet, scaling in capacity
by 100x over ten years to more than 1Pbps of bisection bandwidth.
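For intuition on the bisection-bandwidth figure: in a nonblocking Clos fabric, any half of the hosts can send to the other half at full line rate, so aggregate bisection bandwidth scales with host count and link speed, divided by any oversubscription ratio. A back-of-the-envelope sketch (illustrative numbers only, not Jupiter’s actual parameters):

```python
# Back-of-the-envelope bisection bandwidth of a Clos fabric.
def bisection_bw_gbps(num_hosts, host_link_gbps, oversubscription=1.0):
    """Aggregate bisection bandwidth in Gb/s.

    Nonblocking (oversubscription=1.0): half the hosts can transmit
    to the other half simultaneously at full link rate.
    """
    return num_hosts * host_link_gbps / (2 * oversubscription)
```

For example, 100,000 hosts at 40 Gb/s with no oversubscription gives 2,000,000 Gb/s, i.e. 2 Pbps, the same order of magnitude as the 1+ Pbps cited in the abstract.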