Mastering the Snowflake ‘COPY INTO’ Command: Efficient Data Loading
Snowflake, the cloud-based data warehouse, provides a powerful and versatile command called ‘COPY INTO’ for streamlined data loading. Whether working with structured data in flat files or semi-structured data like JSON, the ‘COPY INTO’ command is your go-to tool for populating your Snowflake tables. Let’s explore it in detail.
Fundamentals of ‘COPY INTO’
The basic syntax of the ‘COPY INTO’ command is as follows:
SQL
COPY INTO <table_name>
FROM <location>
[FILE_FORMAT = ( <file_format_options> )]
[<copy_options>]
Let’s break down the key components:
- <table_name>: The name of the existing Snowflake table where you want to load the data.
- <location>: This specifies the source location of your data files.
- Internal Stage: A stage that stores files in Snowflake-managed storage inside your account (user, table, or named internal stages).
- External Stage: A stage pointing to an external storage location (e.g., Amazon S3, Google Cloud Storage, or Microsoft Azure).
- <file_format_options> (Optional): This section lets you specify the file format (CSV, JSON, Parquet, etc.), compression type, delimiter, and other format-specific details.
- <copy_options> (Optional): Use this to configure error handling, data transformations, and loading optimizations.
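Before running ‘COPY INTO’, the stage (and, optionally, a reusable named file format) must already exist. Here is a minimal sketch of creating both, assuming an S3 bucket and a pre-configured storage integration; the object names and URL are placeholders:
SQL
-- Hypothetical names and bucket URL; substitute your own.
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  COMPRESSION = 'GZIP';  -- expects gzip-compressed source files

CREATE OR REPLACE STAGE customers_stage
  URL = 's3://my-bucket/customer-data/'
  STORAGE_INTEGRATION = my_s3_integration
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');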
Loading Data: A Step-by-Step Example
Suppose you have a CSV file named ‘customers.csv’ located in an Amazon S3 bucket that you’ve linked to a Snowflake stage named ‘customers_stage.’ Here’s how you load the data:
SQL
COPY INTO my_customer_table
FROM @customers_stage/customers.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);
In this example, we specify the CSV format with a comma as the delimiter, and SKIP_HEADER = 1 skips the file’s header row.
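Before committing a load, you can also dry-run it. A brief sketch using the VALIDATION_MODE copy option, reusing the table and stage from the example above:
SQL
COPY INTO my_customer_table
FROM @customers_stage/customers.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)
VALIDATION_MODE = 'RETURN_ERRORS';  -- reports problems without loading any rows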
Best Practices and Optimizations
- File Compression: Compress your data files (e.g., GZIP) to reduce storage costs and improve load times.
- File Sizing: Aim for files roughly 100-250 MB in compressed form for optimal performance.
- Parallel Loads: Leverage Snowflake’s multi-cluster warehouses to load large datasets concurrently.
- Error Handling: To manage potential data quality issues, configure options such as ON_ERROR = 'CONTINUE' or ON_ERROR = 'ABORT_STATEMENT'.
- Data Transformations: Apply on-the-fly transformations during loading using SQL expressions within the ‘COPY INTO’ command, as shown in the sketch below.
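As an illustration of the transformation point above, ‘COPY INTO’ can select from the staged files, letting you reorder, cast, or derive columns during the load. A minimal sketch, reusing the earlier table and stage (the column positions and target columns are assumptions):
SQL
COPY INTO my_customer_table (customer_id, full_name, signup_date)
FROM (
  SELECT $1::NUMBER,                -- cast the first CSV column to a number
         UPPER($2),                 -- normalize the name column to upper case
         TO_DATE($3, 'YYYY-MM-DD')  -- parse the third column as a date
  FROM @customers_stage/customers.csv
)
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);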
Semi-Structured Data (JSON)
To load a JSON file, you might use a statement like this:
SQL
COPY INTO product_reviews
FROM @product_data_stage/reviews.json
FILE_FORMAT = (TYPE = 'JSON');
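Because each JSON document lands in a single VARIANT column by default, the target table needs one. A brief sketch of the table definition and a follow-up query (the table, column, and JSON field names are assumptions):
SQL
-- Hypothetical table with one VARIANT column to receive the JSON documents.
CREATE OR REPLACE TABLE product_reviews (raw VARIANT);

-- After loading, drill into the JSON with path notation and casts:
SELECT raw:product_id::STRING AS product_id,
       raw:rating::NUMBER     AS rating,
       raw:comment::STRING    AS review_comment
FROM product_reviews;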
Beyond the Basics
The ‘COPY INTO’ command offers a wealth of additional options. The Snowflake documentation delves into pattern matching for filenames, advanced error handling, and numerous other features for customizing your data-loading workflow.
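For instance, the PATTERN option restricts a load to files whose names match a regular expression. A small sketch, assuming gzip-compressed CSV files with a date-stamped naming convention (the pattern itself is an assumption):
SQL
COPY INTO my_customer_table
FROM @customers_stage
PATTERN = '.*customers_2024.*[.]csv[.]gz'  -- load only the matching files in the stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);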