Databricks Hierarchy

Let’s break down the data object hierarchy within the Databricks Lakehouse platform.

Understanding Databricks and the Lakehouse

  • Databricks is a unified data analytics platform that combines the adaptability of data lakes with the reliability and structure of data warehouses.
  • Lakehouse: A data architecture paradigm that merges data lakes and data warehouses. The Databricks Lakehouse delivers structured data management atop the open storage formats data lakes utilize.

The Hierarchical Structure

Databricks arranges data objects in a hierarchy to simplify organization and governance. Let’s explore this hierarchy, focusing primarily on Unity Catalog:

  1. Metastore (Unity Catalog): The top-level container and the foundation of the hierarchy. It stores metadata about all your data objects (such as location and structure). Unity Catalog provides a centralized way to manage data access and permissions.
  2. Catalog: A logical grouping of databases/schemas within a metastore. Think of it as a broad organizational container.
  3. Database (Schema): A collection of tables, views, and other data objects within a catalog. Databases serve as namespaces that group related objects.
  4. Table:  The core data structure holding your information. There are two primary types:
    • Managed Tables: Databricks manages both the metadata and the underlying data files. When you drop a managed table, the data is also removed.
    • Unmanaged Tables (also called external tables): The data lives in a location you manage yourself (e.g., a path in your own cloud storage). Dropping an unmanaged table removes only the metadata reference; the data files stay in place.
  5. View: A virtual table defined by a SQL query over existing tables. Views don’t store data themselves but provide a different lens on existing datasets.
  6. Function: A block of reusable code for data transformations or custom logic.
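
The levels below the metastore are what you see in an object's full name, which takes the three-part form catalog.schema.object. Here is a minimal Python sketch of building such a name. The helpers `quote_ident` and `qualified_name` and the sample names (`main`, `sales`, `orders`) are hypothetical and only illustrate the naming pattern; they are not part of any Databricks API.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, obj: str) -> str:
    """Build a three-level name of the form catalog.schema.object."""
    parts = (catalog, schema, obj)
    for part in parts:
        if not part:
            raise ValueError("every level of the name must be non-empty")
    return ".".join(quote_ident(p) for p in parts)


# Hypothetical names, for illustration only.
print(qualified_name("main", "sales", "orders"))
```

Quoting every level keeps the sketch safe for names that contain spaces or punctuation; for simple names the backticks can be left out.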

Hierarchy and Permissions

Permissions in Unity Catalog cascade down the hierarchy:

  • Permissions set at the metastore level (such as the right to create catalogs) govern the metastore itself; inheritance to contained objects starts at the catalog.
  • Permissions set on a catalog trickle down to its databases and their contained objects.
  • Permissions on a database propagate to its tables, views, and other contained objects.
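
The downward inheritance from catalog to database to table can be pictured with a toy model. This is a sketch under simplifying assumptions, not how Unity Catalog is implemented. The `Securable` class, the principals (`analysts`, `etl_job`), and the object names are all hypothetical, and the model ignores the separate USE CATALOG and USE SCHEMA privileges that real access checks also require.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Securable:
    """One node in the hierarchy: a catalog, database, or table."""
    name: str
    parent: Optional["Securable"] = None
    # Maps a principal (user or group) to privileges granted directly on this node.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, principal: str, privilege: str) -> None:
        self.grants.setdefault(principal, set()).add(privilege)

    def has_privilege(self, principal: str, privilege: str) -> bool:
        """True if the privilege was granted here or on any ancestor."""
        node: Optional["Securable"] = self
        while node is not None:
            if privilege in node.grants.get(principal, set()):
                return True
            node = node.parent
        return False


# Hypothetical hierarchy: catalog -> database -> tables.
catalog = Securable("sales_data")
database = Securable("customer_info", parent=catalog)
customers = Securable("customers", parent=database)
orders = Securable("orders", parent=database)

catalog.grant("analysts", "SELECT")  # granted once, at the catalog
orders.grant("etl_job", "MODIFY")    # granted on a single table

print(customers.has_privilege("analysts", "SELECT"))  # True: inherited from the catalog
print(customers.has_privilege("etl_job", "MODIFY"))   # False: granted only on orders
```

The lookup walks from the object up to its ancestors, so a grant made higher in the hierarchy covers everything beneath it, while a grant on one table does not leak to its siblings.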

Example

  • Metastore: MyCompanyData
  • Catalog: SalesData
  • Database: CustomerInfo
    • Tables: customers (managed), orders (unmanaged)
    • View: high_value_customers
    • Function: calculate_lifetime_value
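
As a rough sketch, the example hierarchy could be created with SQL statements like the ones the following Python snippet prints. The column definitions, the storage path, the spending threshold, and the function body are hypothetical placeholders that do not come from the example above. An external table also assumes the storage location has already been configured for the workspace.

```python
from typing import List

CATALOG = "SalesData"
SCHEMA = "CustomerInfo"
# Placeholder path; a real external table needs a location you control.
EXTERNAL_LOCATION = "s3://example-bucket/orders"


def build_statements() -> List[str]:
    """Return SQL statements for the example hierarchy, top level first."""
    fq = f"{CATALOG}.{SCHEMA}"
    return [
        f"CREATE CATALOG IF NOT EXISTS {CATALOG}",
        f"CREATE SCHEMA IF NOT EXISTS {fq}",
        # Managed table: no LOCATION clause, so Databricks decides where files live.
        f"CREATE TABLE IF NOT EXISTS {fq}.customers "
        "(id BIGINT, name STRING, total_spend DOUBLE)",
        # Unmanaged (external) table: data stays at the given path.
        f"CREATE TABLE IF NOT EXISTS {fq}.orders "
        f"(id BIGINT, customer_id BIGINT, amount DOUBLE) LOCATION '{EXTERNAL_LOCATION}'",
        f"CREATE OR REPLACE VIEW {fq}.high_value_customers AS "
        f"SELECT * FROM {fq}.customers WHERE total_spend > 10000",
        # Hypothetical formula, for illustration only.
        f"CREATE OR REPLACE FUNCTION {fq}.calculate_lifetime_value"
        "(monthly_spend DOUBLE, months INT) RETURNS DOUBLE "
        "RETURN monthly_spend * months",
    ]


for statement in build_statements():
    print(statement + ";")
```

The snippet only builds and prints text. In a Databricks notebook, each statement could be passed to `spark.sql(...)`, provided you hold the privileges needed to create objects at each level.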

Key Points

  • Security:  Unity Catalog lets you manage fine-grained access controls at each hierarchy level.
  • Flexibility:  The Lakehouse and Unity Catalog balance structure and the adaptability needed to work with various data types.
  • Legacy Systems:  Databricks also supports the built-in Hive metastore for older workspaces.
