Databricks Hierarchy
Let’s break down the data object hierarchy within the Databricks Lakehouse platform.
Understanding Databricks and the Lakehouse
- Databricks is a unified data analytics platform that combines the adaptability of data lakes with the reliability and structure of data warehouses.
- Lakehouse: A data architecture that merges data lakes and data warehouses. The Databricks Lakehouse adds warehouse-style data management, such as schemas, transactions, and governance, on top of the open storage formats that data lakes use.
The Hierarchical Structure
Databricks organizes data objects hierarchically to simplify organization and governance. Here is that hierarchy, focusing primarily on Unity Catalog:
- Metastore (Unity Catalog): The top-level container. It stores metadata about all your data objects (such as their location and structure) along with the permissions that govern them, giving you one central place to manage data access.
- Catalog: A logical grouping of databases (schemas) within a metastore. Think of it as a broad organizational container.
- Database (Schema): A collection of tables, views, and other data objects within a catalog. Databases serve as namespaces for grouping related objects.
- Table: The core data structure holding your information. There are two primary types:
  - Managed tables: Unity Catalog manages both the metadata and the data files, which are kept in a storage location that Unity Catalog controls. When you drop a managed table, its data is also removed.
  - Unmanaged (external) tables: The data lives at a cloud storage path you specify. Dropping an external table removes only the metadata; the data files remain in place.
- View: A virtual table defined by a SQL query over existing tables or views. A view stores no data of its own; it provides a different lens on existing datasets.
- Function: A user-defined function registered in a schema, holding reusable logic for data transformations or custom calculations.
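To make the containment relationships concrete, here is a small, self-contained Python model of the hierarchy. It is not a Databricks API: the classes (`Metastore`, `Catalog`, `Schema`, `Table`), the in-memory `storage` dictionary standing in for cloud storage, and the file paths are all hypothetical. The sketch only illustrates two points from the list above: objects are addressed with a three-level `catalog.schema.table` name, and dropping a managed table removes its data while dropping an unmanaged (external) table removes only the metadata. Real Unity Catalog behavior around file deletion is more involved than this toy model.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    managed: bool   # True = managed, False = external (unmanaged)
    location: str   # where the data files live (hypothetical path)


@dataclass
class Schema:
    name: str
    tables: dict[str, Table] = field(default_factory=dict)


@dataclass
class Catalog:
    name: str
    schemas: dict[str, Schema] = field(default_factory=dict)


@dataclass
class Metastore:
    """Toy model: catalogs hold schemas, schemas hold tables (metadata only)."""
    name: str
    catalogs: dict[str, Catalog] = field(default_factory=dict)
    # Stand-in for cloud object storage: location -> list of data files.
    storage: dict[str, list[str]] = field(default_factory=dict)

    def resolve(self, full_name: str) -> Table:
        # Objects are addressed as catalog.schema.table; the metastore is implicit.
        catalog, schema, table = full_name.split(".")
        return self.catalogs[catalog].schemas[schema].tables[table]

    def drop_table(self, full_name: str) -> None:
        catalog, schema, table = full_name.split(".")
        dropped = self.catalogs[catalog].schemas[schema].tables.pop(table)
        if dropped.managed:
            # Managed: in this toy model the data files are deleted as well.
            self.storage.pop(dropped.location, None)
        # External: only the metadata entry is gone; the files stay in storage.


# Build the MyCompanyData > SalesData > CustomerInfo example.
ms = Metastore("MyCompanyData")
info = Schema("CustomerInfo")
ms.catalogs["SalesData"] = Catalog("SalesData", {"CustomerInfo": info})
info.tables["customers"] = Table("customers", True, "managed/customers")
info.tables["orders"] = Table("orders", False, "s3://example-bucket/orders")
ms.storage["managed/customers"] = ["part-00000.parquet"]
ms.storage["s3://example-bucket/orders"] = ["part-00000.parquet"]

print(ms.resolve("SalesData.CustomerInfo.orders").location)  # s3://example-bucket/orders
ms.drop_table("SalesData.CustomerInfo.customers")
ms.drop_table("SalesData.CustomerInfo.orders")
print(sorted(ms.storage))  # ['s3://example-bucket/orders']
```

After both drops, only the external table's files survive in the simulated storage, which is the practical difference between the two table types.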
Hierarchy and Permissions
Permissions in Unity Catalog cascade down the hierarchy:
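- Metastore-level privileges, such as CREATE CATALOG, control what can be created in the metastore; inheritance of data-access privileges starts at the catalog.
- Privileges granted on a catalog apply to all of its databases (schemas) and the objects they contain.
- Privileges granted on a database (schema) apply to the tables, views, and functions inside it.
- To read a table, a user also needs USE CATALOG on its catalog and USE SCHEMA on its schema.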
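The downward inheritance described above can be sketched as a lookup that walks from an object up through its parent containers and collects every privilege granted along the way. This is a simplified model, not how Unity Catalog is implemented: the `GRANTS` table, the principals `analysts` and `auditors`, and the helper functions are hypothetical.

```python
# Hypothetical grants: securable full name -> {principal: set of privileges}.
GRANTS: dict[str, dict[str, set[str]]] = {
    "SalesData": {"analysts": {"USE CATALOG", "SELECT"}},
    "SalesData.CustomerInfo": {"analysts": {"USE SCHEMA"}},
    "SalesData.CustomerInfo.orders": {"auditors": {"SELECT"}},
}


def ancestors(full_name: str) -> list[str]:
    """Return each container and the object: catalog, catalog.schema, catalog.schema.obj."""
    parts = full_name.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def effective_privileges(principal: str, full_name: str) -> set[str]:
    """Union of privileges granted on the object and on every container above it."""
    result: set[str] = set()
    for name in ancestors(full_name):
        result |= GRANTS.get(name, {}).get(principal, set())
    return result


# SELECT granted on the catalog is inherited by every table in it.
print(sorted(effective_privileges("analysts", "SalesData.CustomerInfo.customers")))
# ['SELECT', 'USE CATALOG', 'USE SCHEMA']

# A grant on a single table does not flow back up to the schema.
print(effective_privileges("auditors", "SalesData.CustomerInfo"))
# set()
```

Walking upward from the object keeps the rule simple: a grant reaches everything below the securable it was made on, never anything above it.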
Example
- Metastore: MyCompanyData
  - Catalog: SalesData
    - Database (schema): CustomerInfo
      - Tables: customers (managed), orders (unmanaged)
      - View: high_value_customers
      - Function: calculate_lifetime_value
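The example hierarchy above could be created with Databricks SQL statements along the lines of the ones below, collected here in a Python list. Note that the metastore never appears in a name: it is attached to the workspace, and objects are addressed with the three-level catalog.schema.object namespace. The column lists, the S3 path, the view's spending threshold, the function's formula, and the `analysts` group are illustrative assumptions, and the external table assumes its path is covered by an external location you are allowed to use.

```python
# Names mirror the example hierarchy; columns, path, threshold, formula,
# and the `analysts` group are illustrative assumptions.
CATALOG = "SalesData"
SCHEMA = f"{CATALOG}.CustomerInfo"

DDL = [
    f"CREATE CATALOG IF NOT EXISTS {CATALOG}",
    f"CREATE SCHEMA IF NOT EXISTS {SCHEMA}",
    # Managed table: no LOCATION, so Unity Catalog decides where the files live.
    f"CREATE TABLE IF NOT EXISTS {SCHEMA}.customers "
    "(customer_id BIGINT, name STRING)",
    # External (unmanaged) table: the data stays at the path you provide.
    f"CREATE TABLE IF NOT EXISTS {SCHEMA}.orders "
    "(order_id BIGINT, customer_id BIGINT, amount DECIMAL(10, 2)) "
    "LOCATION 's3://example-bucket/orders'",
    # View: stores only the query, not the data.
    f"CREATE OR REPLACE VIEW {SCHEMA}.high_value_customers AS "
    "SELECT c.customer_id, c.name, SUM(o.amount) AS total_spend "
    f"FROM {SCHEMA}.customers c JOIN {SCHEMA}.orders o "
    "ON c.customer_id = o.customer_id "
    "GROUP BY c.customer_id, c.name HAVING SUM(o.amount) > 10000",
    # SQL user-defined function registered in the schema.
    f"CREATE OR REPLACE FUNCTION {SCHEMA}.calculate_lifetime_value("
    "avg_order_value DOUBLE, orders_per_year DOUBLE, years DOUBLE) "
    "RETURNS DOUBLE RETURN avg_order_value * orders_per_year * years",
    # USE CATALOG / USE SCHEMA let the group reach the objects;
    # SELECT on the schema applies to every table and view in it.
    f"GRANT USE CATALOG ON CATALOG {CATALOG} TO `analysts`",
    f"GRANT USE SCHEMA, SELECT ON SCHEMA {SCHEMA} TO `analysts`",
]

for statement in DDL:
    print(statement + ";")
    # In a Databricks notebook you could instead run: spark.sql(statement)
```

The only syntactic difference between the managed and the external table is the LOCATION clause, which is what moves ownership of the files from Unity Catalog to you.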
Key Points
- Security: Unity Catalog lets you manage fine-grained access controls at each hierarchy level.
- Flexibility: Together, the Lakehouse and Unity Catalog combine the structure of a warehouse with the flexibility needed to work with many kinds of data.
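- Legacy systems: Databricks also supports the built-in, per-workspace Hive metastore used by older workspaces; in Unity Catalog-enabled workspaces it appears as a catalog named hive_metastore.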
Conclusion:
Unogeeks is the No.1 IT Training Institute for Databricks Training.
You can check out our other latest blogs on Databricks Training here – Databricks Blogs
Please check out our Best In Class Databricks Training Details here – Databricks Training
Follow & Connect with us:
———————————-
For Training inquiries:
Call/Whatsapp: +91 73960 33555
Mail us at: info@unogeeks.com
Our Website ➜ https://unogeeks.com
Follow us:
Instagram: https://www.instagram.com/unogeeks
Facebook: https://www.facebook.com/UnogeeksSoftwareTrainingInstitute
Twitter: https://twitter.com/unogeeks