When To Use Which Table In Snowflake?

Rajiv Gupta
10 min read · Dec 26, 2022


Photo by Kenny Eliason on Unsplash

In this blog we are going to discuss the various table types available in Snowflake and their specific use-cases. I will try to make each use-case as clear as possible, and I hope that after reading this blog you can easily pick the right table type for your requirement. I am also going to include a brief overview of each table type, so you know what we are talking about.

I would like to give credit to Patrick Cuba for some of the nice icons you see below. I used them to make my blog more attractive. Thanks.

Temporary Table —

  1. Temporary tables are session-specific and are also known as session tables. This kind of table is known as a #temp table in older RDBMSs such as SQL Server and Sybase.
  2. These tables are not visible outside the session and are not visible to any user other than the one who created them.
  3. Temporary tables don't enforce name uniqueness the way other tables do: you can create a temporary table with the same name as an existing table in the same schema, and within the session the temporary table will take precedence over the other table of the same name (see the sketch after the CREATE statement below).
  4. Temporary tables contribute to storage charges for the duration they hold data in the session.
  5. You can create a temporary table that has the same name as an existing table in the same schema, effectively hiding the existing table.
  6. You can create a table that has the same name as an existing temporary table in the same schema; however, the newly created table is hidden by the temporary table.
  7. Time travel for a temporary table is 0 or 1 day, or the remainder of the session, whichever is shorter.
  8. No fail-safe period for temporary tables.
  9. Creating a temporary table does not require the CREATE TABLE privilege on the schema in which it is created.
  10. To create a temporary table, use the syntax sample below.
CREATE TEMPORARY TABLE RAJIVTABLE (ID NUMBER, CREATION_DATE DATE);
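
To illustrate the name-precedence behavior from points 3, 5 and 6, here is a minimal sketch (the DEMO_TAB name is just an example):

-- A permanent table already exists in the schema.
CREATE TABLE DEMO_TAB (ID NUMBER);
INSERT INTO DEMO_TAB VALUES (1);

-- Creating a temporary table with the same name hides the permanent
-- table for the rest of this session.
CREATE TEMPORARY TABLE DEMO_TAB (ID NUMBER);

SELECT COUNT(*) FROM DEMO_TAB;  -- returns 0: the temporary table wins

-- Dropping the temporary table makes the permanent table visible again.
DROP TABLE DEMO_TAB;
SELECT COUNT(*) FROM DEMO_TAB;  -- returns 1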

Use-Case:

  1. For any temporary data requirement.
  2. For ETL processes where you want to store intermediate session data that you don't need outside the current session.
  3. No data protection is required beyond the session.

Transient Tables —

  1. Transient tables are just like permanent tables, with the exception that they don't have a fail-safe period.
  2. Transient tables persist across sessions, and their data is visible to other users (subject to privileges).
  3. These tables need to be explicitly dropped, just like permanent tables.
  4. Transient tables contribute to storage charges for the duration they hold data, just like permanent tables. Fail-safe period charges do not apply to these tables.
  5. A transient table cannot be converted to any other table type.
  6. Time travel for a transient table is 0 or 1 day (see the retention sketch after the CREATE statement below).
  7. No fail-safe period for transient tables.
  8. To create a transient table, use the syntax sample below.
CREATE TRANSIENT TABLE RAJIVTABLE (ID NUMBER, CREATION_DATE DATE);
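
If you want to control the 0-or-1-day time travel window from point 6 explicitly, you can set DATA_RETENTION_TIME_IN_DAYS, as in this minimal sketch (the table name is illustrative):

-- Disable time travel entirely for a short-lived staging table.
CREATE TRANSIENT TABLE RAJIV_STAGING (ID NUMBER, CREATION_DATE DATE)
DATA_RETENTION_TIME_IN_DAYS = 0;

-- Or keep the maximum 1 day of time travel a transient table allows.
ALTER TABLE RAJIV_STAGING SET DATA_RETENTION_TIME_IN_DAYS = 1;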

Use-Case:

  1. You need features like those of a permanent table but want a cost-effective solution.
  2. You don't want to protect data for a fail-safe period.
  3. You want an option to recover deleted historical data for at most 1 day.
  4. Specifically designed for transitory data that needs to be maintained beyond each session.

Permanent Tables —

  1. This is the default table type in Snowflake.
  2. These tables hold data until you explicitly drop them. They are the preferred table type for applications that need to store data for a longer period.
  3. Permanent tables can be guarded by time travel as per your edition of choice: 0 or 1 day on Standard edition, and 1 to 90 days on Enterprise edition or higher (see the sketch after the CREATE statement below).
  4. 7 days of fail-safe for permanent tables.
  5. Query result cache is supported for 24 hours.
  6. To create a permanent table, use the syntax sample below.
CREATE TABLE RAJIVTABLE (ID NUMBER, CREATION_DATE DATE);
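
As a minimal sketch of the time travel options from point 3 (the 90-day retention assumes Enterprise edition or higher; names are illustrative):

-- Keep 90 days of time travel history on this table.
CREATE TABLE RAJIV_ORDERS (ID NUMBER, CREATION_DATE DATE)
DATA_RETENTION_TIME_IN_DAYS = 90;

-- Query the table as it looked 5 minutes ago.
SELECT * FROM RAJIV_ORDERS AT (OFFSET => -60 * 5);

-- Recover the table itself after an accidental drop.
DROP TABLE RAJIV_ORDERS;
UNDROP TABLE RAJIV_ORDERS;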

Use-Case:

  1. For all persistent data storage business cases.
  2. You want an option to recover deleted historical data within 1 to 90 days.
  3. You want data protection through the fail-safe period.

External Tables —

  1. In an external table, as the name suggests, the data resides outside the Snowflake database, in a supported cloud provider's storage account; the table stores only file-level metadata about the data files, such as the filename, a version identifier, and related properties.
  2. External tables can access data stored in any format supported by COPY INTO <table> statements.
  3. External tables are read-only.
  4. You can partition an external table for better performance.
  5. Performance of an external table can be improved by creating a materialized view on top of it.
  6. All table operations (queries, joins, etc.) except DML operations are possible on an external table.
  7. An external stage is a prerequisite for creating an external table.
  8. An external table can be refreshed automatically via cloud event notifications, charged under Snowpipe billing, or refreshed manually, charged under the standard cloud services billing model (see the refresh sketch after the CREATE statement below).
  9. Although the data resides outside Snowflake and therefore incurs no Snowflake storage cost, it does incur a metadata refresh cost, which is required to keep the file-level metadata up to date.
  10. All data is stored in a single VARIANT column called VALUE.
  11. To create an external table, you only need some knowledge of the file format and record format of the source data files. Knowing the schema of the data files is not required.
  12. Use the manual refresh option when you choose to synchronize external tables with other metastores (e.g., AWS Glue or Apache Hive); set the parameter PARTITION_TYPE = USER_SPECIFIED in the table definition for manual refresh.
  13. External tables can support the Delta Lake format by including the parameter TABLE_FORMAT = DELTA. Currently in preview.
  14. Query result cache is supported for 24 hours, like permanent tables.
  15. To create an external table, use the syntax sample below.
CREATE EXTERNAL TABLE RAJIV_EXT_FEED (
  -- Derive the partition column from the date embedded in the file path,
  -- e.g. .../2022/12/26/part-0.parquet
  DATE_PART DATE AS TO_DATE(SPLIT_PART(METADATA$FILENAME, '/', 3)
    || '/' || SPLIT_PART(METADATA$FILENAME, '/', 4)
    || '/' || SPLIT_PART(METADATA$FILENAME, '/', 5), 'YYYY/MM/DD'),
  -- Virtual columns extracted from the single VARIANT column VALUE
  TIMESTAMP BIGINT AS (VALUE:TIMESTAMP::BIGINT),
  COL2 VARCHAR AS (VALUE:COL2::VARCHAR))
PARTITION BY (DATE_PART)
LOCATION = @EXTSTAGE/DAILY/
REFRESH_ON_CREATE = FALSE
AUTO_REFRESH = FALSE
FILE_FORMAT = (TYPE = PARQUET)
TABLE_FORMAT = DELTA;  -- Delta Lake support, currently in preview
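
Because AUTO_REFRESH is disabled above, the metadata must be refreshed by hand (point 8), and a materialized view can speed up repeated access (point 5). A minimal sketch, assuming the table defined above:

-- Manually synchronize the external table metadata with the stage contents.
ALTER EXTERNAL TABLE RAJIV_EXT_FEED REFRESH;

-- Optionally materialize frequently used columns for faster repeated access
-- (materialized views require Enterprise edition or higher).
CREATE MATERIALIZED VIEW RAJIV_EXT_FEED_MV AS
SELECT DATE_PART, COL2 FROM RAJIV_EXT_FEED;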

Use-Case:

  1. When regulatory restrictions prevent you from moving data out of a particular region and your Snowflake account is in a different region.
  2. When you have static reference data that changes only slightly, and only occasionally, and you want to use that data as reference data.
  3. You don't want to perform any DML operations on the table.
  4. You have a very large volume of data files, but you only need a small portion of the data for your usage.
  5. Performance is not the primary goal.
  6. Schema-on-read is valid for your business requirement.
  7. You want to integrate Apache Hive metastores with Snowflake using external tables.

Directory Tables —

  1. Directory tables are a building block for supporting unstructured data in Snowflake.
  2. As the name suggests, a directory table is not actually a table that holds data; rather, it is used to retrieve the Snowflake-hosted file URL for each file in the stage.
  3. Directory tables are created on top of an internal or external stage and are supported by all three cloud providers in the case of external stages (see the stage sketch after the query below).
  4. Directory tables are very similar to external tables but differ in usage: an external table is used to read data that is external to Snowflake, whereas a directory table stores a catalog of staged files in cloud storage.
  5. To query a named stage via its directory table, use the syntax sample below.
SELECT * FROM DIRECTORY( @<stage_name> );
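
The directory table has to be enabled on the stage before it can be queried. A minimal sketch (the stage name RAJIV_DOC_STAGE is illustrative):

-- Create an internal stage with a directory table enabled.
CREATE STAGE RAJIV_DOC_STAGE
DIRECTORY = (ENABLE = TRUE);

-- After adding or removing files, refresh the directory table metadata.
ALTER STAGE RAJIV_DOC_STAGE REFRESH;

-- List the staged files with their sizes and Snowflake-hosted file URLs.
SELECT RELATIVE_PATH, SIZE, FILE_URL
FROM DIRECTORY(@RAJIV_DOC_STAGE);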

6. The following types of URLs are available to access files in cloud storage:

Scoped URL — Encoded URL that permits temporary access to a staged file without granting privileges to the stage.

File URL — URL that identifies the database, schema, stage, and file path to a set of files.

Pre-signed URL — Simple HTTPS URL used to access a file via a web browser.
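
Each of these URL types maps to a built-in function. A minimal sketch, reusing the illustrative RAJIV_DOC_STAGE stage and a hypothetical file path:

-- Scoped URL: temporary access without granting privileges on the stage.
SELECT BUILD_SCOPED_FILE_URL(@RAJIV_DOC_STAGE, 'reports/q4.pdf');

-- File URL: identifies the database, schema, stage, and file path.
SELECT BUILD_STAGE_FILE_URL(@RAJIV_DOC_STAGE, 'reports/q4.pdf');

-- Pre-signed URL: plain HTTPS URL, valid here for 3600 seconds.
SELECT GET_PRESIGNED_URL(@RAJIV_DOC_STAGE, 'reports/q4.pdf', 3600);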

Use-Case:

  1. Support for unstructured data use-cases.
  2. You want to share unstructured data securely using Snowflake capabilities.
  3. You want to use unstructured data for AI/ML in Snowflake.

Hybrid Tables —

  1. This is currently in private preview, and a few things might change in due course.
  2. Unistore is a hybrid workload capability in Snowflake that brings OLTP workloads into the platform.
  3. A hybrid table is an implementation of the Unistore architecture that allows row-based storage in Snowflake.
  4. You can join hybrid tables and normal tables, and it all works fine together without any change to your query structure; the complexity is transparent to the consumer (see the join sketch after the CREATE statement below).
  5. A hybrid table works like any other table and supports Snowflake capabilities such as time travel, cloning, etc.
  6. Primary and foreign keys are supported on hybrid tables.
  7. A primary key index is supported on the primary key column.
  8. You pay double storage charges for every hybrid table in Snowflake, because the same data is kept in both row storage and columnar storage behind the scenes, to preserve performance when both kinds of tables are used together.
  9. All behind-the-scenes data replication between row and columnar storage for hybrid tables is transparent to the end consumer, but the compute is charged as standard cloud services.
  10. To create a hybrid table, use the syntax sample below.
CREATE HYBRID TABLE RAJIVTABLE (ID NUMBER PRIMARY KEY, CREATION_DATE DATE);
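
As a minimal sketch of the mixed-workload join from point 4 (all names are illustrative):

-- Row-oriented hybrid table for fast point lookups and updates.
CREATE HYBRID TABLE RAJIV_ORDERS_OLTP (
ORDER_ID NUMBER PRIMARY KEY,
CUSTOMER_ID NUMBER,
AMOUNT NUMBER,
ORDER_DATE DATE);

-- Standard columnar table holding customer reference data.
CREATE TABLE RAJIV_CUSTOMERS (CUSTOMER_ID NUMBER, NAME VARCHAR, REGION VARCHAR);

-- The join works exactly as it would between two standard tables.
SELECT c.NAME, o.ORDER_DATE, o.AMOUNT
FROM RAJIV_ORDERS_OLTP o
JOIN RAJIV_CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID;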

Use-Case:

  1. When you want to migrate a transactional application with a small-to-medium data volume to Snowflake.
  2. Good for small-to-medium scale web applications.
  3. Not suitable for OLTP workloads that need to honor sub-second responses on huge volumes of data.
  4. Reduce time to market by bringing the best of both worlds together and reducing ETL or other processing time.
  5. Simplify architectures and standardize security and governance controls on a single platform, while eliminating the need to move or copy data.

Dynamic Tables —

  1. Dynamic tables were first introduced during Summit 2022 as materialized tables; they were later renamed to dynamic tables.
  2. This is currently in private preview, and a few things might change in due course.
  3. Dynamic tables are a new table type in Snowflake that lets teams use simple SQL statements to declaratively define the result of their data pipelines.
  4. They combine the features of streams and tasks, looking at incremental delta changes only at a certain lag (duration).
  5. They don't support UDFs/UDTFs, stored procedures, external functions, or Snowpark transformations written in Python, Java, or Scala at the moment.
  6. You can control a dynamic table's refresh as per your business requirement, unlike a materialized view, whose refresh is based on a Snowflake internal algorithm. You have full control over the freshness of your data.
  7. A virtual warehouse is used for all compute requirements of a dynamic table. You can size your warehouse as per your workload.
  8. Joins on multiple tables are supported and work seamlessly with your complex business logic (see the pipeline sketch after the CREATE statement below).
  9. Both normal and dynamic tables are supported as sources, together forming a DAG.
  10. Refresh only what has changed, even for complex queries, automatically. Dynamic tables will also intelligently skip refreshes when there is no new data to process, or when dependent tables are still refreshing, without any user intervention.
  11. Data is not always as fresh as with a materialized view, but you will see up-to-date data as of the last refresh, per your defined lag.
  12. No more overhead of defining multiple streams and tasks when you're building SQL-based transformation pipelines.
  13. Not for real-time or near-real-time visualization dashboards built on a BI tool, as the data is not always the latest; it depends on your refresh definition.
  14. To create a dynamic table, use the syntax sample below.
CREATE OR REPLACE DYNAMIC TABLE RAJIVTABLE
LAG = '1 minute'
WAREHOUSE = COMPUTE_WH
AS SELECT * FROM RAJIVTAB;
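
As a minimal sketch of points 8 and 9, a dynamic table that joins and aggregates two hypothetical source tables:

-- Declaratively define the pipeline result; Snowflake refreshes it
-- incrementally within the 5-minute lag target.
CREATE OR REPLACE DYNAMIC TABLE RAJIV_DAILY_REVENUE
LAG = '5 minutes'
WAREHOUSE = COMPUTE_WH
AS
SELECT o.ORDER_DATE, c.REGION, SUM(o.AMOUNT) AS REVENUE
FROM RAJIV_ORDERS o
JOIN RAJIV_CUSTOMERS c ON c.CUSTOMER_ID = o.CUSTOMER_ID
GROUP BY o.ORDER_DATE, c.REGION;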

Use-Case:

  1. You are looking to build SQL-based transformation pipelines that require complex SQL, including joins, aggregates, and window functions, along with control over refresh frequency and compute sizing.
  2. You are not looking for the "always up-to-date data" behavior that you get with a materialized view; otherwise, use a materialized view.
  3. You are building a pipeline that doesn't need to incorporate UDFs/UDTFs, stored procedures, external functions, or Snowpark transformations written in Python, Java, or Scala; otherwise, use the traditional approach of streams and tasks.

Hope this blog helps you gain insight into the different table types in Snowflake and their use-cases. Feel free to ask a question in the comment section if you have any doubts. Give a clap if you like the blog. Stay connected to see many more such cool things. Thanks for your support.

You Can Find Me:

Subscribe to my YouTube Channel: https://www.youtube.com/c/RajivGuptaEverydayLearning

Follow me on Medium: https://rajivgupta780184.medium.com/

Follow me on Twitter: https://twitter.com/RAJIVGUPTA780

Connect with me on LinkedIn: https://www.linkedin.com/in/rajiv-gupta-618b0228/

#Keep learning #Keep Sharing #RajivGuptaEverydayLearning #SnowflakeDataSuperhero #RajivGupta
