Iceberg Tables on Snowflake: Design considerations and Life of an INSERT query.

A must-read on the evolution of the Hive and Iceberg table formats, and the design changes made in Snowflake’s architecture to support Iceberg tables.

Vino Duraisamy
Aug 16, 2023
Reference: https://www.snowflake.com/blog/iceberg-tables-powering-open-standards-with-snowflake-innovations/

This article covers a brief overview of traditional data lakes, the Hive table format, and their limitations.

The second part of the blog highlights the capabilities of the Iceberg table format, and the design considerations and architectural changes made in Snowflake to support Iceberg.

Brief history of traditional data lakes and Hive tables:

Traditionally, data lakes stored data as files (CSV, JSON, Parquet, Avro, ORC). Tables are created on top of these files, and table metadata is also stored in the file system.

Data for each table is stored in a directory structure, and nested directories are created to achieve table partitioning. These directory names follow conventions that query engines understand. For example, a Hive table partition might look like this: user/hive/warehouse/logs/dt=2023-08-07/country=GB

The relationship between the data files and the SQL table is defined implicitly by the directory structure. That is, if you add new files to a specific directory, those files automatically become part of the corresponding Hive table.

Hive table structure
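
To make the layout concrete, here is a minimal Hive DDL sketch that would produce a directory tree like the one above; the table name, columns, and location are hypothetical.

```sql
-- Hypothetical Hive table whose partition columns map to nested directories
CREATE TABLE logs (
  msg   STRING,
  level STRING
)
PARTITIONED BY (dt STRING, country STRING)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/logs';

-- Writing into a partition creates (or reuses) the matching directory,
-- e.g. /user/hive/warehouse/logs/dt=2023-08-07/country=GB
INSERT INTO logs PARTITION (dt = '2023-08-07', country = 'GB')
VALUES ('request served', 'INFO');
```

Any file dropped into that directory, by any tool, becomes part of the table, which is exactly the implicit file membership described above.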

Challenges with using directories as database tables:

  • It is not easy to provide ACID guarantees. Every INSERT/UPDATE/DELETE operation has to add or delete files in the directory and also update the metastore, and those two steps are not atomic. This resulted in basic transactionality problems, and multi-partition updates are not atomic either.
  • Additionally, some query engines do not always access the data through the metastore, which means they do not always get a consistent view of the data.
  • Partition evolution required a complete data rewrite. Because partitions are directories and subdirectories, dropping an old partitioning scheme and creating a new one means moving data from one directory to another, i.e., rewriting the data.
  • To add to that, a developer writing a query must know the physical layout of the table in advance in order to filter by partition.
  • Storing data in a directory tree means the engine has to list all directories and subdirectories to evaluate the filters in your query. For larger tables, these directory listings can be expensive and slow your query down significantly.
  • Overall, it is difficult to manage and maintain such a table.

Iceberg table format to the rescue:

Iceberg table format solves some of the problems mentioned above.

  • The Iceberg spec specifies how to perform updates to a table so that all readers get a consistent view of the data.
  • It also describes how to achieve snapshot isolation.
  • It explicitly defines which files belong to each snapshot.
  • It allows for easier schema and partition evolution by using Iceberg metadata (a Spark SQL sketch follows this list).
  • There is an explicit metadata layer that you can update, modify, or version, which enables time travel across your data and metadata.
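
For example, with the Iceberg extensions enabled in Spark SQL, schema and partition evolution are metadata-only operations. This is only a rough sketch; the catalog, table, and column names are assumptions.

```sql
-- Add a column: only Iceberg metadata changes, no data rewrite
ALTER TABLE prod.db.logs ADD COLUMN severity STRING;

-- Evolve partitioning: new data uses the new spec, existing files stay put
ALTER TABLE prod.db.logs ADD PARTITION FIELD days(event_ts);
ALTER TABLE prod.db.logs DROP PARTITION FIELD country;
```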

Iceberg format overview:

Reference: https://iceberg.apache.org/spec/#overview

As shown in the picture above, Iceberg uses a tiered system: a data layer, a metadata layer, and a catalog.

  1. The data layer is similar to a traditional data lake: it stores the data files (Parquet, ORC, Avro).
  2. The metadata layer consists of manifest files, manifest lists, and metadata files. Together, these files define a snapshot of the table. As DML operations run against the table, new snapshots are created, so you get snapshot isolation when reading different versions of the Iceberg table through the catalog (a Spark SQL sketch follows this list).
  3. A catalog is simply a key-value store that keeps track of the root metadata file for each table. The root metadata file helps locate all the artifacts associated with the table.
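
To make the snapshot idea concrete, engines such as Spark expose Iceberg’s metadata layer as queryable metadata tables and support time travel over snapshots. A rough sketch, again with assumed catalog and table names:

```sql
-- Inspect the snapshots recorded in the metadata layer
SELECT snapshot_id, committed_at, operation
FROM prod.db.logs.snapshots;

-- Time travel: read the table as of an earlier snapshot id
SELECT * FROM prod.db.logs VERSION AS OF 4348247563651567499;
```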

Iceberg Tables on Snowflake:

Have you ever wanted a Snowflake table, with its features and performance, but also wanted the table to use open formats or your own cloud storage? Iceberg Tables, currently in private preview, are designed to meet your use case.

Let’s dive in to understand the Snowflake 💙 Iceberg story better.

Snowflake architecture and design:

Snowflake is a multi-tenant, fully transactional, secure, and highly elastic system. Its architecture has three main layers, as published in the SIGMOD 2016 paper.

As highlighted in the architecture reference below, design changes are needed in the storage layer, and the metadata storage in the cloud services layer to support the Iceberg table format.

Reference: https://dl.acm.org/doi/pdf/10.1145/2882903.2903741
  • In a Snowflake table, data is stored in Snowflake’s proprietary FDN file format, a highly compressed columnar format (similar to Parquet). Snowflake’s metadata is stored in EP files and FDB. An EP (expression properties) file holds per-file statistics such as min and max values; FDB is a key-value store that stores metadata.
  • Iceberg stores metadata differently: all of the metadata lives in the file system, with no database needed. So how do you map the Iceberg metadata to Snowflake metadata?
  • File storage is straightforward: instead of writing files in Snowflake’s proprietary FDN format, they are written in Parquet format.

Snowflake External Volume:

When using Iceberg Tables with Snowflake, the Iceberg metadata and data need to be written into the customer’s storage bucket. For that purpose, Snowflake created the concept of a logical container called an external volume.

  • Creating an external volume is supported on all major cloud providers (AWS, Azure, GCP).
  • Customers can describe their storage location as a volume and specify the access credentials for the storage bucket when creating an external volume (a SQL sketch follows this list).
  • Volumes may be associated with an account, database or a schema.
  • Data files (Parquet) and metadata files (Iceberg manifest files) are written to the external volume.
  • When you create an Iceberg table, the data files are populated in the storage volume, and are associated with the table.
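
Here is a minimal Snowflake SQL sketch of the flow described above, based on the publicly documented syntax, which may differ slightly from the private-preview syntax at the time of writing; the volume name, bucket, and role ARN are placeholders.

```sql
-- Describe the customer-owned bucket as an external volume
CREATE EXTERNAL VOLUME my_iceberg_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-snowflake-role'
    )
  );

-- Create an Iceberg table whose data and metadata live on that volume
CREATE ICEBERG TABLE logs (
  msg   STRING,
  level STRING
)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_iceberg_vol'
  BASE_LOCATION = 'logs/';
```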

Customers now have complete control over the data files and the metadata files. You can use different query engines to process this data, and do whatever you want with it.

Life of an INSERT query:

What happens when you run an INSERT operation on an Iceberg table in Snowflake?

INSERT query on a standard table:

  1. During an INSERT query run, Snowflake writes data files in its proprietary FDN format in the storage layer.
  2. Once the data files are written, Snowflake captures statistics for each file in an EP file, which is a Snowflake metadata artifact.
  3. These EP files are then referenced by the FDB store. Once all EP file references are written to FDB, a new version of the table is created.
  4. This table version is committed.
  5. Now the upcoming queries can see the updated version of the table.

INSERT query on an Iceberg table:

Reference: https://www.snowflake.com/blog/iceberg-tables-powering-open-standards-with-snowflake-innovations/
  1. For an Iceberg table, data files are written in Parquet format (instead of FDN format) in the storage layer.
  2. Snowflake writes the metadata in EP files, and Iceberg manifest files.
  3. It then adds the manifest files to a manifest list and creates a new snapshot. On the Snowflake side, it creates a new table version.
  4. The new table version (FDB) and the snapshot file are committed.
  5. All the external tools that query the data can now see the newer version of the table. There is no lag; everything happens in the same transaction, so the external tools see the same updates that Snowflake sees (a short SQL sketch follows this list).
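
A small sketch of what this looks like from SQL, assuming the Iceberg table from the earlier external-volume example and the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function for retrieving the committed metadata location:

```sql
-- Write: Snowflake commits the Parquet data files, the Iceberg metadata,
-- and the new table version in a single transaction
INSERT INTO logs (msg, level)
VALUES ('service started', 'INFO');

-- Read: the newly committed snapshot is immediately visible
SELECT * FROM logs;

-- Retrieve the latest Iceberg metadata location so external engines
-- can read the very same snapshot
SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('logs');
```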

Design challenges and considerations:

Suppose a query engine like Spark wants to query the Iceberg table using the Iceberg SDK; here is how the execution flow looks.

  • For every transaction on the table, Snowflake needs to generate an Iceberg metadata file and a manifest list, which would introduce significant latency for large tables.
  • There are two options to generate Iceberg metadata:
    1. Generate inline in the query, alongside Snowflake metadata.
    2. Generate in the background.
  • Snowflake decided to go with a combination of on-demand and background metadata generation.
  • Mapping data types between Snowflake and Iceberg tables (for example, VARIANT and GEO types) is a design challenge as well.

Wanna learn more?

Read this article on how Catalog support is added to Iceberg tables.

Watch this demo to learn more about the latest updates to Iceberg Tables.

It’s exciting to note that the performance of Iceberg table operations is getting closer to that of Snowflake tables, and it’s only going to get better.

Thanks for Reading!

If you like my work and want to support me…

  1. The BEST way to support me is by following me on Medium.
  2. For data engineering best practices, and Python tips for beginners, follow me on LinkedIn.
  3. Feel free to give claps so I know how helpful this post was for you.
