Snowflake Performance : Search Optimization Service — Part 1

Photo by Evgeni Tcherkasski on Unsplash

In this blog we are going to discuss another interesting Enterprise edition enabled feature from Snowflake viz. Search Optimization Service. If you have worked with Index in any of on-premises database this feature is very similar to that, on top of that you get some automatic maintenance for which you pay premium(i.e. Snowflake credits).

How much relevant is this to certification exam?

This can be good candidate for SnowPro Advance Architect, Data Engineering certification. This falls under the performance bucket and may also be somewhat aligned to Advance Admin certification.

What is Search Optimization Service?

Search Optimization Service is an optimized data structure, called the search access path which is created by Snowflake as background maintenance service which scans all the table micro partition and records the metadata. This metadata helps snowflake to build the best search access path for table data. When a user fires a query, snowflake optimizer looks into the shortest and best access path for that data to give the quickest response if SOS is enabled for that table.

Pictorial Presentation of search access path

The search optimization service can significantly improve the performance of certain types of lookup and analytical queries that use an extensive set of predicates for filtering.

Search Optimization Service incur both storage and compute cost.

How does Search Optimization Service work at high level?

Search Optimization Service build the search access path which helps eliminate the micro-partition which doesn't have requested data. This helps prune micro-partition which doesn’t contain matching data which helps to scan very less amount of micro-partition.

In the below diagram you can see that the query is only looking for value(i.e. 58623502,58623505) from customer table. Since search optimization service is enabled for this table which has billions of records this make optimizer use for SOS service as best option to query the data. The pictorial image below is to make you understand that as per SOS search access path only 2 micro-partition out of many has matching data hence it will only scan 2 micro-partition.

Concept Courtesy Snowflake.

I know the thought came to your mind than what is the difference between SOS & table clustering, both are doing the same thing i.e. partition pruning. We will cover this part later in this blog.

How to identify the best candidate table for Search Optimization Service?

Below could be taken as approach for SOS candidate selection:

  1. For Clustered Table -Table is frequently queries on non-cluster key column
  2. For Non-Clustered Table -Table is frequently queries on any column.

How to identify the best candidate queries for Search Optimization Service?

Below could be taken as approach for SOS query candidate selection:

  1. A query that typically runs for a few seconds or longer.
  2. The query returns a few rows with highly selective filters.
  3. A query in which at least one of the columns accessed through the query filter operation has at least 1- 2 lakh distinct values.

— Use APPROX_COUNT_DISTINCT to get the approximate number of distinct values:

SELECT APPROX_COUNT_DISTINCT(C_CUSTOMER_SK) FROM CUSTOMER;

How can we enable Search Optimization Service for a table ?

Equality or IN Predicates(General Available):

By default, if you don't provide any column name while enabling Search Optimization Service for a table than it will enable equality SOS for all column of table which supports any lookup queries using =, IN clause.

Consider below example where I just add Search Optimization Service for Customer table.

— How to enable SOS for table?

ALTER TABLE CUSTOMER ADD SEARCH OPTIMIZATION;

— How to check the list of column for which SOS is enabled ?

DESCRIBE SEARCH OPTIMIZATION ON CUSTOMER;

Substrings and Regular Expressions(Preview feature):

If we want to improve the performance of queries with predicates that search for substrings or use regular expressions than we can enable Substring SOS for a column in a table.

The search optimization service can improve performance when searching for substrings that are 5 or more characters long. Search with lesser character than 5 will not trigger SOS even if you have SOS enable for that column.

This includes predicates that use:

Source: Snowflake

— How to enable Substring SOS for a all column in a table?

ALTER TABLE CUSTOMER_SUBSTRING ADD SEARCH OPTIMIZATION ON SUBSTRING(*) ;

DESCRIBE SEARCH OPTIMIZATION ON CUSTOMER_SUBSTRING;

— How to enable Substring SOS for a particular column?

ALTER TABLE CUSTOMER ADD SEARCH OPTIMIZATION ON SUBSTRING(C_CUSTOMER_ID) ;

Geospatial Functions(Preview feature):

If we want to improve the performance of queries with predicates that use geospatial functions with GEOGRAPHY objects than we can enable GEO SOS for a column in a table.Look to Snowflake documentation for supported predicates with geospatial functions.

— -Create Geography data type table

CREATE OR REPLACE TABLE MAP (ID NUMBER, G1 GEOGRAPHY);

— Insert some Geo data

INSERT INTO MAP VALUES
(1, ‘POINT(-122.35 37.55)’),
(2, ‘LINESTRING(-124.20 42.00, -120.01 41.99)’),
(3, ‘POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))’);

— How to enable GEO SOS for a particular column?

ALTER TABLE MAP ADD SEARCH OPTIMIZATION ON GEO(G1) ;

— How to check the list of column for which SOS is enabled ?
DESCRIBE SEARCH OPTIMIZATION ON MAP;

— How to enable GEO SOS for a all column in a table?

ALTER TABLE MAP ADD SEARCH OPTIMIZATION ON GEO(*) ;

VARIANT Columns(Preview feature):

If we want to improve the performance of queries with predicates of point lookup queries on semi-structured data in Snowflake tables(data in VARIANT, OBJECT, and ARRAY columns) than we can enable SOS for a matching datatype column in a table. For current limitation look into snowflake documentation.

See below example where we had enabled the SOS on src variant data type column of CAR_SALES table.

— How to enable EQUALITY SOS for a variant, array, object data type column in a table?

ALTER TABLE CAR_SALES ADD SEARCH OPTIMIZATION ON EQUALITY(SRC);

— How to check the list of column for which SOS is enabled ?

DESCRIBE SEARCH OPTIMIZATION ON CAR_SALES;

— How to enable EQUALITY SOS for all variant, array, object data type column in a table?

ALTER TABLE CAR_SALES ADD SEARCH OPTIMIZATION ON EQUALITY(*);

How to verify if Search Optimization Service is enabled and active for a table?

You can use SHOW TABLES command and look at ”search_optimization” column. If Search Optimization Service is enabled it will have value “ON”.

After enabling SOS it takes some time to create a search access path for that table via background maintenance service. It also depends on the size of the table. This progress can be monitored via ”search_optimization_progress” column which says up to what percentage of work has been completed. Once it is 100% than your search access path is ready to use. Since this is background process, so it is transparent to user and don't interrupt any processing on table. You might not get the power of SOS or query might be a bit slow, but you will always see up-to-date data.

The search optimization service creates a search access path data structure that requires space for each table on which search optimization is enabled.Typically, the size is approximately 1/4 of the original table’s size but sometime can be the same as the table size where all columns have data types that use the search access path, and all data values in each column are unique, the required storage can be as much as the original table’s size. You can see the SOS size under ”search_optimization_bytes” column

SHOW TABLES LIKE ‘CUSTOMER’;
SELECT “name”,”rows”,”bytes”,”automatic_clustering”,”search_optimization”,”search_optimization_progress”,”search_optimization_bytes” FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) ;

What are the data type supported by Search Optimization Service?

Please refer snowflake documentation for this part as this will keep on updating in near future so no point putting anything here.

What are queries not supported by the Search Optimization Service?

How to drop the Search Optimization Service?

To drop search optimization for substrings on the column C_CUSTOMER_ID, execute the following statement:

ALTER TABLE CUSTOMER DROP SEARCH OPTIMIZATION ON SUBSTRING(C_CUSTOMER_ID) ;

To drop search optimization for all methods on the column C_CURRENT_CDEMO_SK, execute the following statement

ALTER TABLE CUSTOMER DROP SEARCH OPTIMIZATION ON C_CURRENT_CDEMO_SK;

To removing Search Optimization service from the table.

ALTER TABLE IF EXISTS CUSTOMER DROP SEARCH OPTIMIZATION;

I will cover the remaining part of SOS service in my next blog.Stay Tuned…!

Hope this blog helps you to get insight into the Search Optimization Service feature. Feel free to ask a question in the comment section if you have any doubts regarding this. Give a clap if you like the blog. Stay connected to see many more such cool stuff. Thanks for your support.

You Can Find Me:

Subscribe to my YouTube Channel: https://www.youtube.com/c/RajivGuptaEverydayLearning

Follow me on Medium: https://rajivgupta780184.medium.com/

Follow me on Twitter: https://twitter.com/RAJIVGUPTA780

Connect with me in LinkedIn: https://www.linkedin.com/in/rajiv-gupta-618b0228/

#Keep learning #Keep Sharing #RajivGuptaEverydayLearning #SnowflakeDataSuperhero #RajivGupta

References: https://docs.snowflake.com/en/

--

--