Profiling a warehouse-scale computer

With the increasing prevalence of warehouse-scale computing (WSC)
and cloud computing, understanding the interactions of server
applications with the underlying microarchitecture becomes
ever more important in order to extract maximum performance
out of server hardware. To aid such understanding, this paper
presents a detailed microarchitectural analysis of live datacenter
jobs, measured on more than 20,000 Google machines
over a three-year period, and comprising thousands of different
applications.
We first find that WSC workloads are extremely diverse,
creating a need for architectures that can tolerate application
variability without performance loss. However, some
patterns emerge, offering opportunities for co-optimization
of hardware and software. For example, we identify common
building blocks in the lower levels of the software stack.
This “datacenter tax” can comprise nearly 30% of cycles
across jobs running in the fleet, which makes its constituents
prime candidates for hardware specialization in future server
systems-on-chips. We also uncover opportunities for classic
microarchitectural optimizations for server processors, especially
in the cache hierarchy. Typical workloads place significant
stress on instruction caches and prefer memory latency
over bandwidth. They also stall cores often, but compute heavily
in bursts. These observations motivate several interesting
directions for future warehouse-scale computers.

Source: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44271.pdf