Datacenter background

25 Top Papers

25 years is a very long time in technology, and warehouse-scale computing has gone from being a niche optimization specialized for Google workloads to being commonplace. This evolution is reflected in a vibrant collection of academic research papers.

Picking the top 25 for this chapter was challenging. To help with our selection, we focused on papers from Google. While this reflects our familiarity with work at Google (and is a nod to the outsized impact Google has had in inventing and scaling warehouse-scale computing), there is a lot of excellent work from outside Google, as demonstrated in the numerous references in this book.

We then asked prominent authors to each rank their top papers, and combined these rankings to identify the top 50 papers. Finally (with great difficulty), we narrowed the list down to the 25 you are going to read about next. Our goal was to give you, the readers, a sense of the top papers in the area, but also to provide some breadth across different areas of WSCs.

In this chapter, we also experimented with AI by using Gemini to read through our top paper list and provide summaries. We combined that with our own notes and wrote the final summaries you see below. Each paper includes a short paragraph on what the paper is about and then a short (subjective) comment on why we picked it for this top-25 list. We hope you enjoy reading these excellent papers and have as much fun as we had in compiling this list.


A Selection of 25 Top Papers for 25 Years of WSC

Selected Papers

01

The Anatomy of Search: Google's Early Architecture

The Anatomy of a Large-Scale Hypertextual Web Search Engine, Sergey Brin and Lawrence Page, Computer Networks, 30 (1998), pp. 107-117

This paper marked the beginning of modern web search, demonstrating that the web's hyperlink graph can be used to measure page authority (captured by the PageRank algorithm) and thereby greatly improve ranking. The Google prototype also revealed the significant descriptive power of anchor text, successfully using these external labels to improve accuracy. We all take ranking for granted, but this paper showed how to achieve search relevance superior to content-only methods. Last but not least, the prototype demonstrated one of the key WSC approaches by sharding its web index across inexpensive commodity hardware.
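
As a toy illustration of the PageRank idea, here is a power-iteration sketch over a three-page link graph (the function and graph are our own illustrative construction, not code from the paper):

```python
# Toy power-iteration PageRank (illustrative only).
# graph maps each page to the list of pages it links to.
def pagerank(graph, damping=0.85, iters=50):
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u, outs in graph.items():
            share = rank[u] / len(outs) if outs else rank[u] / n
            targets = outs if outs else nodes  # dangling page: spread evenly
            for v in targets:
                new[v] += damping * share
        rank = new
    return rank

# "a" is linked from both "b" and "c", so it ends up with the highest rank.
ranks = pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
```

Each iteration redistributes rank along links, so a page is important when important pages link to it; the damping factor models a surfer who occasionally jumps to a random page.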

Why we picked this paper:

"It's the classic search engine paper. And it got rejected by the major conference in the field."

02

Web Search for a Planet: The Google Cluster Architecture

Web Search for a Planet: The Google Cluster Architecture, Luiz André Barroso, Jeffrey Dean, Urs Hölzle, IEEE Micro, 23 (2003), pp. 22-28

By 2003, Google's architecture had evolved quite a bit from the 1998 prototype that maxed out at two queries per second, and the conceptual ideas of the original prototype had been proven to work in practice. Clusters of 15,000 commodity PCs used sharding to provide horizontal scaling and redundancy. Some of the more frequent hardware failures were now handled automatically. The servers themselves were managed in a relatively manual way because Borg and Linux containers had not been conceived yet. Last but not least, the paper is one of the first ones to discuss power and energy as important system attributes.

Why we picked this paper:

"It's a short but detailed description of the first real-world WSC stack, optimized for web search."

03

The Google File System

The Google File System, Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, Proceedings of the 19th ACM Symposium on Operating Systems Principles, ACM, Bolton Landing, NY (2003), pp. 20-43

The Google File System (GFS) was the first real storage system designed at Google, and the first horizontally scalable file system tailored for large-scale, data-intensive applications. Before GFS, the only high-capacity storage systems available required putting expensive high-performance disks with RAID controllers into a rack. In contrast, GFS assembled a sea of disks, distributed across thousands of machines, into a file system. Files were split into individual chunks, and each chunk was placed on multiple disks to avoid data loss and increase read throughput. A single master node managed metadata. GFS presented a POSIX-like API but relaxed consistency semantics to simplify the system and enhance performance for its target applications. GFS was instrumental in allowing Google to significantly increase its search index size above that of other search engines, and prepared it for storage-heavy future applications like Gmail and YouTube.

Why we picked this paper:

"How do you write a file system capable of managing tens of thousands of unreliable disks while providing ultrahigh throughput, and how do you implement it in just 18 months? By making judicious choices to simplify key aspects without giving up too much."

04

MapReduce: Simplified Data Processing on Large Clusters

MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean, Sanjay Ghemawat, OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA (2004), pp. 137-150

MapReduce complements GFS: whereas GFS made it easy to store large amounts of data across many devices, MapReduce made it easy to process large datasets across many commodity servers. Users specify “map” functions that process key/value pairs to generate intermediate pairs, and “reduce” functions that merge intermediate values associated with the same key. The runtime system transparently handles parallelization, fault tolerance, data distribution, and load balancing. This abstraction allowed programmers without distributed systems expertise to leverage large clusters effectively. The paper demonstrated MapReduce's utility across diverse tasks like distributed sorting, web access log analysis, and inverted index construction, significantly simplifying large-scale data processing.
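
The programming model is easy to sketch in miniature. The following in-process word-count example is our own illustration of the API's shape; the real system distributes the map, shuffle, and reduce phases across thousands of machines with fault tolerance:

```python
# Minimal in-process sketch of the map/reduce programming model.
from collections import defaultdict

def map_fn(doc):                  # emit (word, 1) for each word in a document
    for word in doc.split():
        yield word, 1

def reduce_fn(word, counts):      # merge all counts for one word
    return word, sum(counts)

def run_mapreduce(docs):
    shuffle = defaultdict(list)   # the "shuffle" phase groups values by key
    for doc in docs:
        for k, v in map_fn(doc):
            shuffle[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in shuffle.items())

counts = run_mapreduce(["the quick fox", "the lazy dog"])
```

The user writes only `map_fn` and `reduce_fn`; everything between them (grouping, distribution, retries) is the runtime's job, which is exactly why the abstraction scaled so well.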

Why we picked this paper:

"Lisp invented the map-reduce pattern, and MapReduce (later entering open source as Hadoop) quickly became the default choice for large-scale batch processing."

05

Bigtable: A Distributed Storage System for Structured Data

Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), USENIX (2006), pp. 205-218

Complementing GFS and MapReduce, Bigtable was the initial solution to hyperscale databases and the start of the NoSQL movement that sacrificed strong transactional consistency in exchange for horizontal scalability. Bigtable stores its data across thousands of commodity servers and exposes a sparse, distributed, persistent multi-dimensional sorted map data model. Data is indexed by row key, column key, and timestamp. Bigtable uses GFS for underlying storage and provides high scalability, availability, and performance. Its strength in providing low-latency reads/writes and efficient scans over massive datasets strongly influenced subsequent NoSQL designs.

Why we picked this paper:

"It started the NoSQL movement, which is still going strong today (e.g., MongoDB, Redis, Firestore, DynamoDB). Over the past decade, however, Spanner has replaced BigTable for many applications at Google because it offers fully transactional storage with comparable performance."

06

The Chubby Lock Service for Loosely-coupled Distributed Systems

The Chubby lock service for loosely-coupled distributed systems, Mike Burrows, 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), USENIX (2006)

WSC services need a way to bootstrap their configuration data, and multiple replicas of a single service need to agree who's the leader. Chubby provides this functionality in a single, simple API. It uses Paxos for fault-tolerant consensus among its own replica servers. Its client-side caching mechanism, coupled with keep-alive sessions and event notification, minimizes server load while maintaining consistency.
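
A minimal sketch of leader election on top of a lock service (a hypothetical in-process API of our own; real Chubby clients also deal with sessions, keep-alives, and cache invalidation):

```python
# Leader election via an advisory lock: whoever acquires the lock file
# is the leader. Toy in-process stand-in for a lock service.
import threading

class TinyLockService:
    def __init__(self):
        self._lock = threading.Lock()
        self._holder = None

    def try_acquire(self, path, client):
        # Non-blocking acquire: losers back off and watch for changes.
        if self._lock.acquire(blocking=False):
            self._holder = client
            return True
        return False

    def holder(self, path):
        return self._holder

svc = TinyLockService()
# Two replicas race to become leader; exactly one wins.
r1_is_leader = svc.try_acquire("/ls/cell/service/leader", "replica-1")
r2_is_leader = svc.try_acquire("/ls/cell/service/leader", "replica-2")
```

The losing replica can read the lock holder to discover the current leader, which is how a lock service doubles as a small, highly available name service.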

Why we picked this paper:

"Chubby is a classic example of a WSC service with a simple but indispensable API that solves common problems. It inspired today's popular OSS equivalents, Apache ZooKeeper and Kubernetes' etcd."

07

The Case for Energy-Proportional Computing

The Case for Energy-Proportional Computing, Luiz André Barroso, Urs Hölzle, IEEE Computer, 40 (2007)

Computer systems, particularly servers in data centers, should consume power in proportion to the amount of work performed. Most servers aren't energy proportional: they consume a significant fraction of their peak power even when idle or operating at low utilization. This short paper defined the problem and argued that better energy proportionality would significantly reduce data center operational costs and environmental impact.
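
A back-of-the-envelope calculation shows why this matters. Assume, purely for illustration (the numbers are ours, not the paper's), a server that draws half its peak power at idle, with power rising linearly to peak at full load:

```python
# Illustrative non-proportional server: 50% of peak power at idle.
def power_watts(utilization, peak=500.0, idle_fraction=0.5):
    return peak * (idle_fraction + (1 - idle_fraction) * utilization)

def relative_efficiency(utilization):
    # Useful work per watt, normalized to running flat out at peak load.
    return utilization * power_watts(1.0) / power_watts(utilization)

# Servers typically run at low utilization, where efficiency is worst.
eff_at_30 = relative_efficiency(0.3)   # roughly 0.46 of peak efficiency
```

At 30% load this server still burns 325 W of its 500 W peak, so it delivers less than half of its best-case work per joule, which is precisely the waste that energy proportionality would eliminate.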

Why we picked this paper:

"Sometimes a simple observation, phrased in a way everyone can understand, can influence the direction of an industry."

08

Dremel: Interactive Analysis of Web-Scale Datasets

Dremel: Interactive Analysis of Web-Scale Datasets, Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Proc. of the 36th Int'l Conf on Very Large Data Bases (2010), pp. 330-339

Dremel implements interactive, ad-hoc querying of extremely large, read-only, nested datasets. Its high-throughput parallel query execution engine may use thousands of servers for a single query. Columnar storage allows better compression and significantly reduces I/O load because queries read only partial rows, skipping unused record fields. Its query engine uses a massively parallel tree architecture to execute SQL-like queries over petabytes of data in seconds. For many analytics tasks, Dremel reduced the programming effort and the execution time by several orders of magnitude compared to MapReduce.
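
The core benefit of columnar storage is easy to see in a toy sketch (our own illustration, not Dremel's record-shredding algorithm, which also handles nested and repeated fields):

```python
# Row-oriented records, as a storage engine might receive them.
rows = [
    {"url": "a.com", "clicks": 10, "country": "US"},
    {"url": "b.com", "clicks": 3,  "country": "DE"},
    {"url": "c.com", "clicks": 7,  "country": "US"},
]

# Columnar "shredding": one contiguous array per field.
columns = {field: [r[field] for r in rows] for field in rows[0]}

# A query like SELECT SUM(clicks) now scans a single column and never
# touches urls or countries, cutting I/O and compressing far better.
total_clicks = sum(columns["clicks"])
```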

Why we picked this paper:

"Dremel was the precursor to today's leading data lakehouse product, BigQuery, and rapidly displaced MapReduce for many use cases. If you'd like to understand horizontal scalability, read this paper."

09

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, Chandan Shanbhag, Google Technical Report dapper-2010-1, Apr. 2010 (2010)

The Dapper tracing infrastructure observes cluster-scale applications with low overhead, application-level transparency, and scalability. It provides developers visibility into requests as they propagate through large microservice-based architectures. By assigning globally unique IDs to requests and propagating context, Dapper reconstructs causal paths and timings across different services. The system employs adaptive sampling to manage data volume while capturing representative traces. Without Dapper (whose ideas live on in open source through OpenTelemetry), it would be difficult to understand why a service is slow today, or why some requests are much slower than others.
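
The core mechanism, propagating a trace ID and span IDs with every request so that spans recorded by independent services can be reassembled into one causal tree, can be sketched in a few lines (a simplified, in-process illustration; the names are ours, not Dapper's API):

```python
# Toy sketch of trace-context propagation.
import time
import uuid

spans = []   # stand-in for the trace collection pipeline

def traced_call(name, trace_id=None, parent_span_id=None):
    trace_id = trace_id or uuid.uuid4().hex   # new trace at the root
    span_id = uuid.uuid4().hex
    start = time.time()
    # ... do work; pass (trace_id, span_id) along on any downstream RPC ...
    spans.append({"trace": trace_id, "span": span_id,
                  "parent": parent_span_id, "name": name,
                  "start": start, "end": time.time()})
    return trace_id, span_id

root_trace, root_span = traced_call("frontend")
traced_call("backend", trace_id=root_trace, parent_span_id=root_span)
```

Because every span carries the same trace ID plus its parent's span ID, a collector can later rebuild the full request tree and its timings without any service knowing about any other.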

Why we picked this paper:

"Dapper is an elegant solution to a problem that is much harder to solve than it looks at first sight."

10

Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers

Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers, Gang Ren, Eric Tune, Tipp Moseley, Yixin Shi, Silvius Rus, Robert Hundt, IEEE Micro (2010), pp. 65-79

Google-Wide Profiling (GWP) provides continuous, low-overhead profiling for WSC applications. It collects performance data (CPU usage, memory allocation, synchronization contention) from running jobs with minimal performance impact, using sampling. Its ubiquitous data collection identifies performance bottlenecks, attributes resource consumption accurately across and inside services, and tracks performance regressions over time. GWP provides the fleet-wide visibility essential for improving the efficiency of warehouse-scale applications without requiring application-specific instrumentation. GWP's in-depth, accurate profiling data supports many use cases, including understanding hardware differences, enabling feedback-directed compiler optimization, and discovering frequently used library code across all binaries.

Why we picked this paper:

"You'll be surprised at how essential this information is, and that it's possible to collect it with just 0.01% overhead."

11

Spanner: Google's Globally-Distributed Database

Spanner: Google's Globally-Distributed Database, James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes et al, Symposium on Operating Systems Design and Implementation (OSDI) (2012)

Spanner is the first database to combine transactional consistency (external consistency, the highest consistency standard) with high availability and horizontal scalability across geographically distributed data centers. It achieves this through features like automatic sharding and synchronous replication using Paxos. A key innovation is the TrueTime API, which uses GPS and atomic clocks to generate tightly synchronized timestamps with bounded uncertainty, enabling consistent distributed transactions, consistent snapshot reads across the database, and lock-free reads. Spanner supports a relational data model and a SQL-like query language, providing familiar database semantics at a global scale.
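
The commit-wait idea behind TrueTime can be sketched as follows (illustrative only; in the real system the uncertainty bound is derived from GPS and atomic-clock synchronization, not a constant):

```python
# Sketch of TrueTime-style commit wait. TT.now() returns an interval
# [earliest, latest] guaranteed to contain the true time. A transaction
# takes its commit timestamp s = latest, then waits until earliest > s,
# so s is guaranteed to be in the past on every machine.
import time

EPSILON = 0.002   # assumed clock-uncertainty bound, in seconds

def tt_now():
    t = time.time()
    return t - EPSILON, t + EPSILON   # (earliest, latest)

def commit_wait():
    s = tt_now()[1]                   # pessimistic commit timestamp
    while tt_now()[0] <= s:           # wait out the uncertainty window
        time.sleep(EPSILON / 4)
    return s

ts = commit_wait()
```

The wait costs roughly twice the uncertainty bound per transaction, which is why Spanner invests so heavily in keeping that bound to a few milliseconds.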

Why we picked this paper:

"At first glance, Spanner seems to violate the CAP theorem which originally motivated NoSQL systems to forgo full consistency to get availability and scalability. Many see this paper as the #1 database paper of the past 25 years."

12

B4: Experience with a Globally Deployed Software Defined WAN

B4: Experience with a Globally Deployed Software Defined WAN, Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong et al, Proceedings of the ACM SIGCOMM Conference, Hong Kong, China (2013), pp. 3–14

The B4 paper details Google's experience designing, deploying, and operating a private Software-Defined Wide Area Network (SDN WAN) to connect its global data centers. The system separates the control plane from the data plane, using centralized traffic engineering controllers to manage bandwidth allocation and routing across the physical network. B4 employs custom switch hardware and OpenFlow protocols, demonstrating SDN's effectiveness in managing large, complex WANs for improved efficiency, flexibility, and cost-effectiveness compared to traditional WAN architectures.

Why we picked this paper:

"In many ways, B4 is to large networks what Borg is to large server pools: the only way to manage tens of thousands of devices without losing your mind."

13

The Tail at Scale

The Tail at Scale, Jeffrey Dean, Luiz André Barroso, Communications of the ACM, 56 (2013), pp. 74-80

Tail latency—high-percentile latency—in large-scale distributed systems has a disproportionate impact on user experience. In systems involving thousands of servers for a single request, even rare, small delays on individual components often dominate overall response time due to fan-out. Many factors cause latency variability, including resource sharing, background activities, and queueing delays. The paper outlines several techniques to mitigate tail latency, such as hedging requests (sending the same request to multiple replicas), latency-induced probation, and micro-partitioned data.
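
The fan-out effect is simple arithmetic: if each leaf server is independently "slow" (say, above its own 99th percentile) with probability p, a request that must wait for all n leaves is slow with probability 1 - (1 - p)^n. A short sketch, including an idealized version of hedged requests (in practice the hedge is sent only after a short delay, to cap the extra load):

```python
# Tail amplification under fan-out.
def prob_request_slow(p, n):
    return 1 - (1 - p) ** n

one_leaf = prob_request_slow(0.01, 1)     # 1% of requests slow
fan_100  = prob_request_slow(0.01, 100)   # ~63% of requests slow!

# Idealized hedged requests: duplicate each leaf request to a second
# replica and take the first answer, so a leaf is slow only if both
# replicas are slow (probability p * p).
def prob_hedged_slow(p, n):
    return 1 - (1 - p * p) ** n
```

With 100-way fan-out, a 1-in-100 per-server hiccup dominates almost two-thirds of user requests, while hedging pushes the request-level tail back below the single-server tail.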

Why we picked this paper:

"Horizontal scaling is awesome until it isn't, and tail latency can spoil the party."

14

Profiling a Warehouse-scale Computer

Profiling a warehouse-scale computer, Svilen Kanev, Juan Darago, Kim Hazelwood, Parthasarathy Ranganathan, Tipp Moseley, Gu-Yeon Wei, David Brooks, ISCA '15 Proceedings of the 42nd Annual International Symposium on Computer Architecture, ACM (2015), pp. 158-169

This study complemented Google-Wide Profiling by collecting significantly more data, mostly at the instruction level. The authors analyze instruction mix, CPI (Cycles Per Instruction), cache misses, branch mispredictions, and front-end stalls across diverse workloads. They demonstrate how microarchitectural features interact with software at scale, revealing bottlenecks like front-end stalls that are often masked in smaller studies.

Why we picked this paper:

"Real-world WSC loads don't resemble common benchmarks at all. While hardware has changed dramatically over the past decade, most of the observations first made in this paper still hold today."

15

Large-scale Cluster Management at Google with Borg

Large-scale cluster management at Google with Borg, Abhishek Verma, Luis Pedrosa, Madhukar R. Korupolu, David Oppenheimer, Eric Tune, John Wilkes, Proceedings of the European Conference on Computer Systems (EuroSys), ACM, Bordeaux, France (2015) pp. 18:1–18:17

Borg is a large-scale cluster manager responsible for scheduling, deploying, monitoring, and managing applications across tens of thousands of machines. It handles both long-running services and batch jobs, focusing on high reliability, scalability, and efficient resource utilization through techniques like resource reclamation and fine-grained sharing. Borg uses a logically centralized primary scheduler that is replicated for fault tolerance. Distributed agents (Borglets) provide fine-grained resource management on each machine. Borg allocates resources based on application priorities and quotas, supports declarative job specifications, and provides features like rolling updates and health checking.

Why we picked this paper:

"Borg laid the groundwork for Kubernetes. Its message to manually managed servers was simple: you will be assimilated!"

16

Site Reliability Engineering: How Google Runs Production Systems

Site Reliability Engineering: How Google Runs Production Systems, Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy, O'Reilly (2016)

Managing thousands of services across myriad servers shouldn't be an art; it should be a science. Site Reliability Engineering (SRE) embeds software engineering principles into infrastructure and operations tasks, focusing on automation, reliability, and scalability. It provides a blueprint for organizations aiming to build and maintain highly reliable, scalable software systems. Crucially, absolute reliability is a non-goal; instead, SRE is about balancing reliability with velocity, and learning from failures via blameless postmortems.

Why we picked this paper:

"While we don't expect everyone to read the whole book, you should absolutely read the introduction by Ben Treynor Sloss: “Hope is not a strategy.”"

17

In-Datacenter Performance Analysis of a Tensor Processing Unit

In-Datacenter Performance Analysis of a Tensor Processing Unit, Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, et al, ISCA (2017), pp. 1–12

This paper disclosed Google's first-generation Tensor Processing Unit (TPU), the first custom ASIC deployed at scale to accelerate AI inference workloads. The authors describe the TPU's architecture, especially its large matrix multiplier unit and on-chip memory optimized for deep learning computations. TPUs demonstrate significantly higher performance and energy efficiency on production AI workloads relative to contemporary CPUs and GPUs. The analysis highlights the benefits of domain-specific hardware acceleration for critical large-scale workloads, and summarizes key learnings underpinning the substantial improvements in throughput and energy efficiency for machine learning inference tasks.

Why we picked this paper:

"The original TPU paper concluded: “Order-of-magnitude differences between products are rare in computer architecture, which may lead to the TPU becoming an archetype for domain-specific architectures.”"

18

Attack of the Killer Microseconds

Attack of the killer microseconds, Luiz André Barroso, Mike Marty, David Patterson, Parthasarathy Ranganathan, Communications of the ACM, 60(4) (2017), pp. 48-54

WSCs need to optimize for microsecond-level latencies, driven by faster data center networks, new memory hierarchies, and accelerators. Using two case studies (how to waste a fast data center network, and how to waste a fast data center processor), the paper points out how system optimizations targeting nanosecond- and millisecond-scale events are inadequate for events in the microsecond range. New techniques can achieve high performance at microsecond latencies but need co-design across hardware, kernel, and application layers.

Why we picked this paper:

"Much like the energy-proportionality paper discussed earlier, this paper highlights a simple yet important challenge for WSCs, and it led to a lot of follow-on optimizations in the WSC systems stack."

19

Andromeda: Performance, Isolation, and Velocity at Scale in Cloud Network Virtualization

Andromeda: Performance, Isolation, and Velocity at Scale in Cloud Network Virtualization, Mike Dalton, David Schultz, Ahsan Arefin, Alex Docauer et al, 15th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2018, pp. 373–387.

Andromeda describes Google Cloud Platform's network virtualization stack. Its SDN (software-defined network) design provides high-performance, secure networking for virtual machines. Andromeda decouples the control plane from a distributed data plane implemented partially on-host and partially offloaded to hardware. This architecture enables rapid provisioning of network services (VPCs, firewalls, load balancers), offers performance approaching bare-metal networks, and ensures strong security isolation between tenants. The design prioritizes product velocity and operational agility, simplifying new network features and scaling infrastructure efficiently.

Why we picked this paper:

"Network virtualization wants to be implemented in hardware, since it touches every single packet in high-speed cluster networks. By carefully splitting functionality across hardware and software, Andromeda combines the benefits of both. That's easier said than done."

20

Snap: a Microkernel Approach to Host Networking

Snap: a Microkernel Approach to Host Networking, Michael Marty, Marc de Kruijf, Jacob Adriaens, Christopher Alfeld et al, In ACM SIGOPS 27th Symposium on Operating Systems Principles, ACM, New York, NY, USA (2019), pp. 416–431

Network hardware provides hundreds of Gbps, but traditional kernel APIs impose too high an overhead to actually use this capacity. Snap implements a high-performance host networking stack outside the kernel using a microkernel-inspired architecture. This modularity enhances security by isolating components and allows deploying specialized network functions (e.g., custom protocols, hardware offloads) without modifying the core kernel.

Why we picked this paper:

"An excellent overview of modern host networking, and how user-space packet handling can be used at scale."

21

Thunderbolt: Throughput-Optimized, Quality-of-Service-Aware Power Capping at Scale

Thunderbolt: Throughput-Optimized, Quality-of-Service-Aware Power Capping at Scale, Shaohong Li, Xi Wang, Xiao Zhang, Vasileios Kontorinis, Sreekumar Kodakara, David Lo, Parthasarathy Ranganathan, 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), USENIX Association (2020), pp. 1241-1255

Thunderbolt enforces power caps in heavily oversubscribed large data centers. It uses workload classification and QoS feedback to apply power caps intelligently, prioritizing latency-sensitive tasks and selectively throttling background or batch jobs. This approach maximizes useful work and hides the existence of power capping from the most important users, improving data center efficiency by safely running power infrastructure near its limits when necessary. (If you have time, you should pair your reading of this paper with the original “Power provisioning for a warehouse-sized computer” paper that introduced power oversubscription at Google.)
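
The prioritization logic can be sketched abstractly (our own toy illustration, not Thunderbolt's actual control loop): when measured power exceeds the cap, throttle the lowest-QoS jobs first until the excess is covered:

```python
# Toy QoS-aware victim selection for power capping.
def choose_victims(jobs, power_reading, cap):
    excess = power_reading - cap
    victims = []
    # Lowest-QoS (batch) jobs are throttled before latency-sensitive ones.
    for job in sorted(jobs, key=lambda j: j["qos"]):
        if excess <= 0:
            break
        victims.append(job["name"])
        excess -= job["watts"]
    return victims

jobs = [{"name": "websearch", "qos": 2, "watts": 120},
        {"name": "batch-1",   "qos": 0, "watts": 80},
        {"name": "batch-2",   "qos": 0, "watts": 60}]
victims = choose_victims(jobs, power_reading=1030, cap=900)
```

Here the 130 W excess is covered entirely by throttling the two batch jobs, so the latency-sensitive service never notices that a cap was enforced, which is the "hide capping from the most important users" property described above.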

Why we picked this paper:

"Oversubscription lets you have your cake and eat it too, but that requires a very carefully designed control plane. The effort is worthwhile since it leads to dramatic power utilization improvements."

22

Swift: Delay is Simple and Effective for Congestion Control in the Datacenter

Swift: Delay is Simple and Effective for Congestion Control in the Datacenter, Gautam Kumar, Nandita Dukkipati, Keon Jang, Hassan Wassel, et al, SIGCOMM 2020 (2020), pp. 583–598

TCP has dealt with internet congestion effectively by using packet loss to detect congestion. However, loss-based detection doesn't work well in data center networks, which combine very high speeds with very low latencies and small-buffer switches. Swift shows that a very simple protocol based on accurate measurements of packet delays outperforms all others. Its simple, end-host-based approach achieves extremely low network queues, high link utilization, and fast convergence.
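
The essence of delay-based control can be sketched as an AIMD rule on measured RTT (a simplified illustration in the spirit of Swift; the real protocol also scales its target delay with topology and the number of competing flows, among other refinements):

```python
# Delay-based AIMD sketch: grow the congestion window while measured
# RTT is at or below a target delay, back off multiplicatively above it.
def update_cwnd(cwnd, rtt, target_rtt,
                additive_inc=1.0, mult_decr=0.8, min_cwnd=1.0):
    if rtt <= target_rtt:
        return cwnd + additive_inc / cwnd    # additive increase per RTT
    return max(min_cwnd, cwnd * mult_decr)   # multiplicative decrease

cwnd = 10.0
cwnd = update_cwnd(cwnd, rtt=40e-6, target_rtt=50e-6)   # under target: grow
cwnd = update_cwnd(cwnd, rtt=80e-6, target_rtt=50e-6)   # over target: back off
```

Because queueing shows up as extra delay long before packets are dropped, reacting to delay keeps switch buffers nearly empty while still filling the links.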

Why we picked this paper:

"Swift is so simple, and its performance so good, that it completely changed people's expectations of what a WSC network can do. In fact, initially it caused some confusion: twice, bugs were incorrectly filed for monitoring failures because highly-utilized links reported zero loss."

23

Warehouse-Scale Video Acceleration: Co-design and Deployment in the Wild

Warehouse-Scale Video Acceleration: Co-design and Deployment in the Wild, Parthasarathy Ranganathan, Danner Stodolsky, Jeff Calow, Jeremy Dorfman et al, Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Association for Computing Machinery, New York, NY, USA (2021), pp. 600-615

Video Coding Units (VCUs) dramatically accelerate video processing, and represent the most popular WSC accelerator after AI accelerators. The authors describe the VCU hardware, software drivers, server integration, and job management systems, as well as the challenges of integrating specialized hardware into a massive, general-purpose computing environment, including fault tolerance, monitoring, and programming models.

Why we picked this paper:

"VCUs are a poster child for hardware-software codesign for warehouse-scale accelerators. And they won a technical Emmy award!"

24

Jupiter Evolving: Transforming Google's Datacenter Network via Optical Circuit Switches and Software-Defined Networking

Jupiter Evolving: Transforming Google's Datacenter Network via Optical Circuit Switches and Software-Defined Networking, Leon Poutievski, Omid Mashayekhi, Joon Ong, Arjun Singh, et al, Proceedings of ACM SIGCOMM 2022, pp. 1–17

Starting from a traditional Clos topology, the Jupiter network evolved to a more flexible direct-connect architecture, with MEMS-based Optical Circuit Switches (OCS) that provide dedicated optical paths and allow the network topology to be reconfigured dynamically. WSC networks are continually growing and changing, and thus many design aspects of Jupiter support flexibility: incremental updates, sophisticated traffic engineering, and dynamically reconfigured traffic flows. Automated network operations support incremental capacity delivery and topology engineering without disrupting live services.

Why we picked this paper:

"If we had to pick one paper about Google's data center network, this is that paper. Following up from the original “Jupiter Rising” paper, this paper is excellent both in explaining technical details as well as retrospectives explaining how the design evolved to where it is today."

25

Pathways: Asynchronous Distributed Dataflow for ML

Pathways: Asynchronous Distributed Dataflow for ML, Paul Barham, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, et al, MLSys 2022 (2022)

Traditional ML training runs a large cluster's accelerators in a highly synchronous, lock-step way that requires all accelerators to be the same. However, models have become too large to fit into a single cluster. Pathways is a new orchestration layer for accelerator systems to train and serve even larger-scale AI models. Pathways' architecture utilizes asynchronous distributed dataflow and routes computation across heterogeneous hardware accelerators (CPUs, GPUs, TPUs), in combination with a single-controller model and centralized scheduling. This approach allows moving beyond current dense, single-task models towards sparse, multi-task models that can learn many tasks simultaneously and leverage sparsity for efficiency.

Why we picked this paper:

"This paper is an excellent example of model-systems codesign and distributed systems innovation needed to scale deep learning."