This article describes some of the ongoing research projects
related to structured data management at Google today.
The organization of Google encourages research scientists
to work closely with engineering teams. As a result, the
research projects tend to be motivated by real needs faced
by Google’s products and services, and solutions are put
into production and tested rapidly. In addition, because of
the sheer scale at which Google operates, the engineering
challenges faced by Google’s services often require research.
In Google’s early days, structured data management was
mostly needed for storing and serving data related to ads.
However, as the company expands into hosted applications and
the analyses performed on its query streams and indexes become
more sophisticated, structured data management is becoming
key infrastructure in all parts of the company.
What we describe below is a subset of ongoing projects,
not a comprehensive list. Likewise, many others are involved
in structured data management projects or have contributed
to the ones described here, including
Roberto Bayardo, Omar Benjelloun, Vignesh Ganapathy,
Yossi Matias, Rob Pike, and Ramakrishnan Srikant.
Sections 2 and 3 describe projects whose goal is to enable
search on collections of structured data that exist today on
the web. Section 2 describes our efforts to crawl content that
resides behind forms on the web, and Section 3 describes
our initial work on enabling search on collections of HTML
tables. Section 4 describes work on mining large collections
of data and social graphs. Sections 5 and 6 describe recent
progress on BigTable, our main infrastructure for storing