The “BigQuery under the hood” blog post gave an overview of the different BigQuery subsystems. Today, we’ll take a deeper dive into one of them, the storage system, and follow data as it’s loaded into a BigQuery table.

BigQuery supports several input formats (CSV, JSON, Datastore backups, Avro), and when data is imported, it’s converted into our internal representation. BigQuery uses columnar storage that supports semistructured data: nested and repeated fields. The Dremel paper, which Google published in 2010, explained how this structure is preserved within a column store. Every column stores, alongside each value, two numbers: a definition level and a repetition level. This encoding ensures that the full or partial structure of a record can be reconstructed by reading only the requested columns, and never requires reading parent columns (which is the case with alternative encodings).
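
To make the two numbers concrete, here is a minimal Python sketch of how a single column could be striped out of a nested record. It follows the definitions in the Dremel paper: the repetition level says at which repeated field in the column's path the value repeats (0 means a new record), and the definition level counts how many optional or repeated fields in the path are actually present. The function name `stripe`, the `(name, kind)` path representation, and the dict-based record are assumptions made for illustration, not BigQuery's actual code. The sample record mirrors the `Name.Language` part of the paper's example document.

```python
# Illustrative sketch only: `stripe` and the path format are hypothetical,
# not part of BigQuery or any library.

def stripe(record, path):
    """Return (value, repetition_level, definition_level) triples for one column.

    `path` is a list of (field_name, kind) pairs from the record root to the
    leaf, where kind is "required", "optional", or "repeated".
    """
    out = []

    def walk(node, depth, r, d, rep_depth):
        # r: repetition level for the next value emitted under this node
        # d: number of optional/repeated fields seen so far that are present
        # rep_depth: number of repeated fields on the path so far
        if depth == len(path):
            out.append((node, r, d))
            return
        name, kind = path[depth]
        if kind == "required":
            # Required fields are always present and do not add to d.
            walk(node[name], depth + 1, r, d, rep_depth)
        elif kind == "optional":
            child = node.get(name)
            if child is None:
                out.append((None, r, d))  # NULL: path is defined only up to d
            else:
                walk(child, depth + 1, r, d + 1, rep_depth)
        else:  # repeated
            items = node.get(name)
            level = rep_depth + 1
            if not items:
                out.append((None, r, d))
                return
            for i, item in enumerate(items):
                # The first item inherits r; later items repeat at this level.
                walk(item, depth + 1, r if i == 0 else level, d + 1, level)

    walk(record, 0, 0, 0, 0)
    return out


# Sample record, shaped like the Name.Language part of the Dremel paper's example.
R1 = {
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}]},
        {},  # a Name with no Language entries
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ]
}

CODE_PATH = [("Name", "repeated"), ("Language", "repeated"), ("Code", "required")]
COUNTRY_PATH = [("Name", "repeated"), ("Language", "repeated"), ("Country", "optional")]

print(stripe(R1, CODE_PATH))
# [('en-us', 0, 2), ('en', 2, 2), (None, 1, 1), ('en-gb', 1, 2)]
print(stripe(R1, COUNTRY_PATH))
# [('us', 0, 3), (None, 2, 2), (None, 1, 1), ('gb', 1, 3)]
```

Note that each column is produced without looking at any sibling or parent column's data. The NULL entries with their levels are what let a reader rebuild the nesting from the requested columns alone.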