Azure Databricks Z-order


Z-Ordering in Azure Databricks is a technique for optimizing Delta Lake tables by physically co-locating related data in the same set of files. Delta Lake’s data skipping uses this co-locality to reduce the amount of data that a query needs to read.

How Z-Ordering Works

Z-Ordering interleaves the values of multiple columns, similar to a multi-dimensional index, so that rows with similar values in those columns land in the same files. When a query filters on Z-Ordered columns, Databricks can use the minimum and maximum values recorded for each file to identify and read only the relevant files, skipping large portions of data that don’t match the filter criteria.
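
The interleaving idea can be illustrated with a small Python sketch of a two-column Z-value (also called a Morton code). This is a simplified illustration, not the implementation Databricks uses internally; the function name `interleave_bits` and the 4x4 grid of sample points are assumptions made for the example.

```python
def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Interleave the low `bits` bits of x and y into a single Z-value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # bits of y go to odd positions
    return z


# Hypothetical data: every (x, y) pair on a 4x4 grid.
points = [(x, y) for x in range(4) for y in range(4)]

# Sorting by Z-value keeps points that are close in BOTH columns near each other.
z_sorted = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))

for point in z_sorted:
    print(point, interleave_bits(*point))
```

Cutting the sorted list into consecutive chunks gives "files" that each cover a small range of both columns, which is what makes per-file minimum and maximum values useful for skipping.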

Benefits of Z-Ordering

  • Improved query performance: Z-Ordering can significantly accelerate queries, especially those with filters on high-cardinality columns (columns with many distinct values).
  • Reduced data reads: By skipping irrelevant data, Z-Ordering minimizes the amount of data read from storage, which can lower costs and improve resource utilization.
  • Enhanced data skipping: Because Z-Ordering narrows the range of values each file holds for the Z-Ordered columns, Delta Lake’s data skipping can rule out more files per query.
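
The effect of clustering on data skipping can be sketched with a toy model in Python. It assumes each "file" records only the minimum and maximum of one column, and it uses a plain sort on that column as a stand-in for Z-Ordering; the helper names, the file size of 10 rows, and the sample ids are all hypothetical.

```python
def split_into_files(rows, rows_per_file):
    """Group rows into fixed-size 'files' and record each file's min and max."""
    files = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        files.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return files


def files_to_read(files, value):
    """Count the files whose [min, max] range could contain `value`."""
    return sum(1 for f in files if f["min"] <= value <= f["max"])


# 100 distinct ids in a scattered, but deterministic, arrival order.
ids = [(i * 37) % 100 for i in range(100)]

unclustered = split_into_files(ids, rows_per_file=10)
clustered = split_into_files(sorted(ids), rows_per_file=10)

print("files read without clustering:", files_to_read(unclustered, 42))
print("files read with clustering:", files_to_read(clustered, 42))
```

In the scattered layout every file spans almost the whole id range, so a filter on a single id cannot rule any file out; in the clustered layout only one file's range contains the id.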

When to Use Z-Ordering

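Z-Ordering is most effective for columns that are frequently used in query predicates (filters) and have high cardinality. It’s generally less beneficial for low-cardinality columns or columns not commonly used in filters. You can Z-Order by several columns, but the locality benefit weakens with each additional column, so limit the list to the columns that matter most.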

How to Apply Z-Ordering

You can apply Z-Ordering to a Delta Lake table in Databricks using the OPTIMIZE command with the ZORDER BY clause:

OPTIMIZE table_name
ZORDER BY column1, column2, ...
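
From a Python notebook, the same command could be issued through `spark.sql`. The sketch below only builds the statement text so that it can be inspected first; the helper `build_zorder_statement`, the table name `events`, and the column names are hypothetical, and the statement uses the parenthesized column-list form of ZORDER BY.

```python
def build_zorder_statement(table_name, columns):
    """Return an OPTIMIZE ... ZORDER BY statement for the given columns."""
    if not columns:
        raise ValueError("ZORDER BY needs at least one column")
    return f"OPTIMIZE {table_name} ZORDER BY ({', '.join(columns)})"


# Hypothetical table and column names, for illustration only.
statement = build_zorder_statement("events", ["event_type", "device_id"])
print(statement)

# In a Databricks notebook, where a SparkSession named `spark` is predefined,
# the statement could then be run with:
# spark.sql(statement)
```

Keeping the statement in a small helper makes it easy to call from a scheduled job when the table needs to be re-optimized.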

Important Considerations

  • Z-Ordering is not a one-time operation. Files written after the last OPTIMIZE run are not Z-Ordered, so the layout becomes less effective as new data is added. Re-run OPTIMIZE with ZORDER BY periodically to maintain performance.
  • Z-Ordering rewrites data files, so the optimization itself consumes compute and I/O. Weigh the expected query speedup against that cost for your specific workload.

