Azure Databricks Incremental Load



Incremental data loading in Azure Databricks is the process of updating a target dataset with only the new or modified records, instead of reprocessing the entire dataset on every run. Common approaches include:

1. Auto Loader:

  • Databricks recommends Auto Loader for incremental data ingestion from cloud object storage.
  • It automatically processes new data files as they arrive in cloud storage without additional setup.
  • It can be used with Delta Live Tables (DLT) for a more streamlined approach.
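
As a rough illustration of the Auto Loader approach, here is a minimal sketch. It assumes you are in a Databricks notebook or job where a `spark` session is available. The function name `start_autoloader_ingest` and its parameters are made up for this example, and the same location is reused for the schema and the checkpoint purely to keep the sketch short.

```python
def start_autoloader_ingest(spark, source_path, target_table, checkpoint_path, file_format="json"):
    """Incrementally load new files from source_path into a table (hypothetical helper)."""
    stream = (
        spark.readStream.format("cloudFiles")                  # Auto Loader source
        .option("cloudFiles.format", file_format)              # format of the incoming files
        .option("cloudFiles.schemaLocation", checkpoint_path)  # where the inferred schema is tracked
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)  # records which files were already processed
        .trigger(availableNow=True)                     # process everything pending, then stop
        .toTable(target_table)
    )

# Usage in a notebook (paths and table name are placeholders, not real locations):
# query = start_autoloader_ingest(spark, "<source path>", "<target table>", "<checkpoint path>")
```

Because the checkpoint remembers which files have been ingested, running the same job again should pick up only the files that arrived since the previous run.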

2. Timestamp/Watermark Column:

  • This involves using a timestamp column in your source data to identify records that have been modified or added since the last update.
  • A control table can store the watermark, that is, the timestamp of the last successful run, so that each run knows where to resume.

3. Change Data Capture (CDC):

  • This involves capturing changes (inserts, updates, deletes) from the source database and applying them to the target dataset.
  • Azure Databricks supports CDC from various sources like Azure SQL Database, Azure Cosmos DB, etc.
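
One common way to apply captured changes to a Delta table is a MERGE statement. The sketch below only builds the SQL text. The helper name `build_cdc_merge_sql`, the table and column names, and the `'DELETE'` marker in an `op` column are all assumptions for illustration; real CDC feeds label operations in different ways. It also assumes the batch of changes has already been reduced to one row per key.

```python
def build_cdc_merge_sql(target, changes, key, columns, op_column="op"):
    """Build a MERGE statement that applies inserts, updates and deletes (hypothetical helper)."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    insert_cols = ", ".join([key, *columns])
    insert_vals = ", ".join(f"s.{c}" for c in [key, *columns])
    return "\n".join([
        f"MERGE INTO {target} AS t",
        f"USING {changes} AS s",
        f"ON t.{key} = s.{key}",
        # A delete marker on an existing row removes it from the target
        f"WHEN MATCHED AND s.{op_column} = 'DELETE' THEN DELETE",
        # Any other change to an existing row overwrites the listed columns
        f"WHEN MATCHED THEN UPDATE SET {set_clause}",
        # Rows not yet in the target are inserted, unless they are deletes
        f"WHEN NOT MATCHED AND s.{op_column} <> 'DELETE' "
        f"THEN INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])

# Example with made-up table and column names
sql = build_cdc_merge_sql("customers", "customer_changes", "id", ["name", "email"])
print(sql)
```

In a notebook, the resulting string could be passed to `spark.sql(sql)` once the change rows are available as a table or temporary view.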

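Example using a timestamp column (PySpark):

# `spark` is the SparkSession that Databricks notebooks provide
from pyspark.sql import functions as F

# Read new data from the source
new_data = spark.read.format("...").load("...")

# Get the last update timestamp from the control table (None on the first run)
last_update_timestamp = spark.sql("SELECT MAX(timestamp) FROM control_table").collect()[0][0]

# Filter new data based on the timestamp; on the first run, load everything
incremental_data = new_data
if last_update_timestamp is not None:
    incremental_data = new_data.filter(new_data["timestamp"] > last_update_timestamp)

# Write the incremental data to the target table (an append is assumed here)
incremental_data.write.format("...").mode("append").save("...")

# Update the control table with the latest timestamp that was loaded
latest_timestamp = incremental_data.agg(F.max("timestamp")).collect()[0][0]
if latest_timestamp is not None:
    spark.sql(f"INSERT INTO control_table VALUES (TIMESTAMP '{latest_timestamp}')")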

