Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering

We present the design of Espresso, Google’s SDN-based Internet
peering edge routing infrastructure. This architecture grew out of a
need to exponentially scale the Internet edge cost-effectively and to
enable application-aware routing at Internet-peering scale. Espresso
utilizes commodity switches and host-based routing/packet processing
to implement a novel fine-grained traffic engineering capability.
Overall, Espresso provides Google a scalable peering edge that is
programmable, reliable, and integrated with global traffic systems.
Espresso also greatly accelerated deployment of new networking
features at our peering edge. Espresso has been in production for
two years and serves over 22% of Google’s total traffic to the Internet.

Source: http://delivery.acm.org/10.1145/3100000/3098854/p432-Yap.pdf?ip=71.127.43.118&id=3098854&acc=OA&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E5945DC2EABF3343C&CFID=822652791&CFTOKEN=84906944&__acm__=1508966541_3307496633bb16bac9c9b6dbf3cd6d11

Espresso makes Google cloud faster, more available and cost effective by extending SDN to the public internet

Today, we gave a keynote presentation at the Open Networking Summit, where we shared details about Espresso, Google’s peering edge architecture—the latest offering in our Software Defined Networking (SDN) strategy. Espresso has been in production for over two years and routes 20 percent of our total traffic to the internet—and growing. It’s changing the way traffic is directed at the peering edge, delivering unprecedented scale, flexibility and efficiency.

Source: https://www.blog.google/topics/google-cloud/making-google-cloud-faster-more-available-and-cost-effective-extending-sdn-public-internet-espresso/

Cutting the Cord: a Robust Wireless Facilities Network for Data Centers

Today’s network control and management traffic are limited by
their reliance on existing data networks. Fate sharing in this context
is highly undesirable, since control traffic has very different availability
and traffic delivery requirements. In this paper, we explore
the feasibility of building a dedicated wireless facilities network for
data centers. We propose Angora, a low-latency facilities network
using low-cost, 60GHz beamforming radios that provides robust
paths decoupled from the wired network, and flexibility to adapt to
workloads and network dynamics. We describe our solutions to address
challenges in link coordination, link interference and network
failures. Our testbed measurements and simulation results show
that Angora enables a large number of low-latency control paths to
run concurrently, while providing low latency end-to-end message
delivery with high tolerance for radio and rack failures.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43860.pdf

Flexible Network Bandwidth and Latency Provisioning in the Datacenter

Abstract
Predictably sharing the network is critical to achieving
high utilization in the datacenter. Past work has focussed
on providing bandwidth to endpoints, but often
we want to allocate resources among multi-node services.
In this paper, we present Parley, which provides
service-centric minimum bandwidth guarantees, which
can be composed hierarchically. Parley also supports
service-centric weighted sharing of bandwidth in excess
of these guarantees. Further, we show how to configure
these policies so services can get low latencies even at
high network load. We evaluate Parley on a multi-tiered
oversubscribed network connecting 90 machines, each
with a 10Gb/s network interface, and demonstrate that
Parley is able to meet its goals.
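
To make the policy described in the abstract concrete, here is a minimal sketch of minimum guarantees plus weighted sharing of the excess. It is not Parley's actual algorithm or API: the `Service` class, the `allocate` function, and all of the numbers are hypothetical, and the sketch ignores demand, latency, and enforcement entirely.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    guarantee_gbps: float  # minimum bandwidth guarantee (hypothetical value)
    weight: float          # relative share of bandwidth beyond the guarantees


def allocate(capacity_gbps, services):
    """Give each service its guarantee, then split the excess by weight."""
    total_guarantee = sum(s.guarantee_gbps for s in services)
    if total_guarantee > capacity_gbps:
        raise ValueError("guarantees exceed available capacity")
    excess = capacity_gbps - total_guarantee
    total_weight = sum(s.weight for s in services)
    allocation = {}
    for s in services:
        share = excess * s.weight / total_weight if total_weight else 0.0
        allocation[s.name] = s.guarantee_gbps + share
    return allocation


# Top level: two services share a 10 Gb/s link.
top = allocate(10.0, [Service("search", 4.0, 1.0), Service("storage", 2.0, 3.0)])

# Hierarchical composition: the sub-services of "storage" divide
# whatever "storage" received at the level above.
nested = allocate(top["storage"], [Service("backup", 1.0, 1.0), Service("serving", 2.0, 1.0)])
```

With these made-up inputs, the 4 Gb/s left over after the guarantees is split 1:3, so each top-level service ends up with 5 Gb/s, and the storage share is then divided again among its own sub-services by the same rule.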

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43871.pdf

Gestalt: Fast, unified fault localization for networked systems

Abstract
We show that the performance of existing
fault localization algorithms differs markedly for different
networks; and no algorithm simultaneously provides
high localization accuracy and low computational overhead.
We develop a framework to explain these behaviors
by anatomizing the algorithms with respect to six
important characteristics of real networks, such as uncertain
dependencies, noise, and covering relationships. We
use this analysis to develop Gestalt, a new algorithm that
combines the best elements of existing ones and includes
a new technique to explore the space of fault hypotheses.
We run experiments on three real, diverse networks. For
each, Gestalt has either significantly higher localization
accuracy or an order of magnitude lower running time.
For example, when applied to the Lync messaging system
that is used widely within corporations, Gestalt localizes
faults with the same accuracy as Sherlock, while
reducing fault localization time from days to 23 seconds.

Source: https://www.usenix.org/system/files/conference/atc14/atc14-paper-mysore.pdf