Inside Capacitor, BigQuery’s next-generation columnar storage format | Google Cloud Big Data and Machine Learning Blog  |  Google Cloud Platform

The “BigQuery under the hood” blog post gave an overview of different BigQuery subsystems. Today, we’ll take a deeper dive into one of them — the storage system. We’ll follow data as it’s loaded into a BigQuery table.BigQuery supports several input formats — CSV, JSON, Datastore backups, AVRO — and when they’re imported, data is converted into our internal representation. BigQuery uses a columnar storage that supports semistructured data — nested and repeated fields. When Google published the Dremel paper in 2010, it explained how this structure is preserved within column store. Every column, in addition to its value, also stores two numbers — definition and repetition levels. This encoding ensures that the full or partial structure of the record can be reconstructed by reading only requested columns, and never requires reading parent columns (which is the case with alternative encodings).



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s