Snowflake External Table Vs Query on Stage…Pros & Cons

Photo by Gabriella Clare Marino on Unsplash

This blog is going to be a very informative blog. One of my friends asked me what would be my thought on using Snowflake External table VS directly querying on External stage ? At the end both are doing the same thing query external file and givings us data than what's the difference and what are the pros & cons for using one on top of others. Before we start with actual discussion let's get some warm up.

What is external table?

External table is not a new concept, and we have seen or using this from on-premises RDBMS days. Only difference is earlier data was sitting on same server in form of files now data/files are sitting on cloud storage, and we are reading from there. Concept remain same only way of implementation and technologies has now changed. By name, you can guess that data is external to Snowflake which point you in the direction that data resides in either of the clouds provided storage may it be AWS or AZURE or GCP, it works same for snowflake without any noticeable difference to Snowflake customer. External stage can only be created using external stage. It does not support internal stage.

External tables store file-level metadata about the data files, such as the filename, a version identifier and related properties. This enables querying data stored in files in an external stage as if it were inside a database. External table support all file format which COPY INTO statement supports.

By default, the external table always stores all data into a single “value” column whose data type is variant regardless of what is file format. If you do Select * from <External_Table> then you will get a value variant column as a result unless you have created some explicit casting on top of the value column to make some sense out of data.

External table has “Schema on Read”

What is querying from stage in Snowflake?

Snowflake has a unique SQL support to query from internal or external named stage(Internal storage or external storage) just like any other SQL. This can be useful for inspecting/viewing the contents of the staged files, particularly before loading or after unloading data. You can get some info by explicitly calling metadata(METADATA$FILENAME & METADATA$FILE_ROW_NUMBER) columns of staged files, but it mostly always full file scan on every execution.

This functionality is primarily for performing simple queries only, particularly when loading and/or transforming data, and is not intended to replace loading data into tables and performing queries on the tables.

Use -Cases ? When to use what?

External Table:

1> Optimum use-case for external table are when you don't want to move the data into Snowflake due to some regulation, or you don't want to bring all the data because you don't need all of them but may be part of them.

2> When your data is static and doesn't change much like static reference data, mapping data etc.

3> When your requirement is to only read the data and not to update the data. External table are read-only table.

4> Dataset is small.

5> Performance is not the primary goal. Not meant to support micro-second read performance SLA.

Querying Data in Staged Files:

1> This can be useful for inspecting/viewing the contents of the staged files, particularly before loading or after unloading data.

2> This functionality is primarily for performing simple queries only, particularly when loading and/or transforming data, and is not intended to replace loading data into tables and performing queries on the tables.

Pros & Cons :

1> When you create External Table you do create some sort of metadata in Snowflake service layer like filename, file path, file size, no of rows etc. which help some sort of optimization as compared to direct querying from stage where Snowflake totally unaware of your data loose any opportunity of optimization and end up using more compute with full file scan.

2> Both external table & query from stage is read-only operation, but external table is more optimized in terms of automated background refresh to keep updated the metadata if and when any new file added, deleted from external storage. No such automation of stages in snowflake.

3> External stage only supports external storage from AWS or AZURE or GCP whereas you can query directly from stages regardless of whether it's external or internal.

4> Query from stage functionality is primarily for performing simple queries only, particularly when loading and/or transforming data, and is not intended to replace loading data into tables and performing queries on the tables whereas external table is meant for static data reading functionality.

5> You can partition external table, but you can't partition stage.

6>You can create materialized view on top of external table to further enhance the performance to improve the performance, but you can't create materialized view on stage. You will get below error

Materialized view not supported over a stage.

7> External table is Snowflake database object whereas direct query from stage is just a functionality to on stage.

8> You can actually name the column to identify the data and store that info in Snowflake. Generally all columns are queries in stage as $1,$2 and so on. You can't name them, but you can use alias on query.

9> External table has automated maintenance cost involved. Query from stage is generally an on-demand compute process.

10> Snowflake supports integrating Apache Hive metastores with Snowflake using external tables. Stage doesn’t support any such feature.

11> There is full set of standard DDL syntax support for external table whereas stage query is just a Select operation.

12> There is full set of metadata table in Snowflake to track external table history, refresh history, file registration history. None of them are required in case of stage query.

Hope this blog helps you to get insight into the Snowflake external table and Stage query functionality. Feel free to ask a question in the comment section if you have any doubts regarding this. Give a clap if you like the blog. Stay connected to see many more such cool stuff. Thanks for your support.

You Can Find Me:

Subscribe to my YouTube Channel: https://www.youtube.com/c/RajivGuptaEverydayLearning

Follow me on Medium: https://rajivgupta780184.medium.com/

Follow me on Twitter: https://twitter.com/RAJIVGUPTA780

Connect with me in LinkedIn: https://www.linkedin.com/in/rajiv-gupta-618b0228/

#Keep learning #Keep Sharing #RajivGuptaEverydayLearning #SnowflakeDataSuperhero #RajivGupta

--

--