How You Can Access Data from ADLS in Databricks



Here’s a comprehensive guide on how to access data stored in Azure Data Lake Storage (ADLS) from Azure Databricks, covering best practices and common pitfalls:


There are four primary methods to integrate ADLS into your Databricks workflow:

  1. Mounting ADLS to DBFS Using a Service Principal and OAuth 2.0:
    • The most secure and recommended approach.
    • Create a service principal in Azure Active Directory (AAD) and grant it the necessary permissions on your ADLS storage account.
    • Use this service principal’s credentials and OAuth 2.0 to mount the ADLS container as a directory within the Databricks File System (DBFS).
  2. Using a Service Principal Directly:
    • This approach is similar to mounting, but you supply the service principal’s client ID and secret in your session configuration rather than creating a mount point (see the sketch after this list).
    • Store the credentials in Databricks Secrets rather than hard-coding them, for better security.
  3. Using the ADLS Storage Account Access Key:
    • Retrieve the storage account access key from the Azure portal.
    • Set this key in your Spark configuration to access data (also covered in the sketch after this list).
    • Less secure: exercise caution before using this method in production environments.
  4. Credential Passthrough:
    • Leverages your own Azure Active Directory credentials to access ADLS.
    • Requires additional cluster configuration (credential passthrough must be enabled on the cluster) and suits scenarios where per-user ADLS permissions need to be enforced.
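
For methods 2 and 3, the credentials are supplied through session-level Spark configuration instead of a mount. Here is a minimal sketch; the storage account, container, tenant ID, and the Databricks secret scope/key names are placeholders you would replace with your own values:

# Method 2: service principal (OAuth 2.0) set directly on the Spark session
spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net",
               "<service-principal-client-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net",
               dbutils.secrets.get(scope="<secret-scope>", key="<secret-key>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Method 3: storage account access key (less secure; keep the key in a secret scope)
spark.conf.set("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
               dbutils.secrets.get(scope="<secret-scope>", key="<storage-account-key>"))

# With either configuration, read directly via the abfss:// URI (no mount point needed)
df = spark.read.csv("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/data.csv", header=True)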

Steps for Mounting ADLS (Recommended)

    • Create a Service Principal in AAD: Follow Microsoft’s documentation on registering an application (service principal) in Azure Active Directory.
    • Assign Permissions to the Service Principal: Grant the service principal at least the “Storage Blob Data Reader” role on your ADLS storage account; use “Storage Blob Data Contributor” if the workload also needs to write data.
    • Mount ADLS to DBFS: Run the following in a Python notebook:

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<service-principal-client-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<secret-scope>", key="<secret-key>"),
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/<your-mount-point>",
  extra_configs = configs
)
    • Replace placeholders with your service principal details, ADLS container, storage account, Databricks secret scope, and secret key where you’ve stored the service principal’s secret.
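
After mounting, it is worth confirming that the mount works; and if you ever need to rotate the service principal’s secret, unmount and re-mount. A quick check using the same placeholder mount point as above:

# List existing mounts and the contents of the new mount point
display(dbutils.fs.mounts())
display(dbutils.fs.ls("/mnt/<your-mount-point>"))

# Unmount when you need to remove or re-create the mount with new credentials
dbutils.fs.unmount("/mnt/<your-mount-point>")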

Accessing Data After Mounting:


# Read a CSV file from the mounted ADLS location

df = spark.read.csv("/mnt/<your-mount-point>/data.csv", header=True, inferSchema=True)
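
Once the DataFrame is loaded, the mount behaves like any other DBFS path, so other formats and write-backs work the same way. A small illustrative sketch; the folder and file names below are placeholders, not paths from this post:

# Inspect what was read
df.printSchema()
display(df)

# Read a Parquet dataset from the same mount
parquet_df = spark.read.parquet("/mnt/<your-mount-point>/<parquet-folder>")

# Write results back to ADLS through the mount
df.write.mode("overwrite").parquet("/mnt/<your-mount-point>/output/")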

Important Considerations:

  • Secret Management: Store sensitive credentials using Databricks Secrets or Azure Key Vault instead of hard-coding them in notebooks (a short example follows this list).
  • Best Practices: The mounting approach with service principals and OAuth 2.0 offers the best security.
  • Data Formats: Databricks supports various file formats (CSV, Parquet, JSON, etc.)
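
As a concrete illustration of the Secret Management point above, secrets stored in a Databricks secret scope (optionally backed by Azure Key Vault) are read in a notebook with dbutils.secrets; the scope and key names below are placeholders:

# See which secret scopes the workspace can use
for scope in dbutils.secrets.listScopes():
    print(scope.name)

# See which keys exist in a given scope (values are never shown)
for secret in dbutils.secrets.list("<secret-scope>"):
    print(secret.key)

# Fetch a secret value; Databricks redacts it if you try to print it
client_secret = dbutils.secrets.get(scope="<secret-scope>", key="<secret-key>")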

Databricks Training Demo Day 1 Video:

You can find more information about Databricks Training in this Databricks Docs Link



Unogeeks is the No.1 IT Training Institute for Databricks Training. Anyone Disagree? Please drop in a comment

You can check out our other latest blogs on Databricks Training here – Databricks Blogs

Please check out our Best In Class Databricks Training Details here – Databricks Training

 Follow & Connect with us:


For Training inquiries:

Call/Whatsapp: +91 73960 33555

