Replace in Databricks
In Databricks, you can replace values in strings and DataFrames using a couple of methods:
1. SQL Functions:
- replace(str, search, replace): This function replaces all occurrences of a specific substring (search) within a string (str) with another substring (replace).
- SQL
- SELECT replace('Hello world', 'world', 'Databricks'); -- Output: 'Hello Databricks'
- regexp_replace(str, regexp, rep): This function replaces parts of a string (str) that match a regular expression (regexp) with another string (rep).
- SQL
- SELECT regexp_replace('Hello 123 world', '[0-9]+', '456'); -- Output: 'Hello 456 world'
2. DataFrame API:
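- withColumn() and replace(): withColumn() adds a new column or replaces an existing column of the same name. Combined with the replace function (available in pyspark.sql.functions from PySpark 3.5), it replaces a substring within a specific column. The search and replacement arguments are column expressions, so literal strings are wrapped in lit().
- Python
- from pyspark.sql.functions import col, lit, replace
- df = df.withColumn("new_column", replace(col("old_column"), lit("old_value"), lit("new_value")))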
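Whether you call them from SQL or from the DataFrame API, replace and regexp_replace differ in how they read the search argument: replace treats it as literal text, while regexp_replace treats it as a pattern. The sketch below illustrates that difference with plain Python's str.replace and re.sub. These are only analogues chosen for illustration, not what Spark runs internally: Spark evaluates patterns with Java regular expressions, and some syntax (for example, group references in the replacement string) differs from Python's.

```python
import re

# Literal replacement: every exact occurrence of the search text is replaced,
# comparable to SQL replace(str, search, replace).
literal = "Hello world".replace("world", "Databricks")

# Pattern replacement: every match of the regular expression is replaced,
# comparable to SQL regexp_replace(str, regexp, rep).
pattern = re.sub(r"[0-9]+", "456", "Hello 123 world")

# A dot is ordinary text for a literal replace but a wildcard in a pattern,
# so it must be escaped when only a real dot should match.
version = "v1.2"
literal_dot = version.replace(".", "_")        # only the dot changes
regex_unescaped = re.sub(r".", "_", version)   # every character changes
regex_escaped = re.sub(r"\.", "_", version)    # only the dot changes

print(literal)          # Hello Databricks
print(pattern)          # Hello 456 world
print(literal_dot)      # v1_2
print(regex_unescaped)  # ____
print(regex_escaped)    # v1_2
```

If the text you want to replace contains characters such as ".", "+", or "(", the literal replace function is usually the simpler choice, because nothing needs escaping.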
Example: Replace Values in a DataFrame Column
Python
from pyspark.sql.functions import col, lit, regexp_replace, replace
# `spark` is the SparkSession that Databricks notebooks provide
df = spark.createDataFrame([("This is a test string.",), ("Another string 123.",)], ["text"])
# Replace specific substrings (replace requires PySpark 3.5+; literals are wrapped in lit())
df = df.withColumn("replaced_text", replace(col("text"), lit("string"), lit("replaced_string")))
# Replace values using regular expressions (applied to the original text column)
df = df.withColumn("replaced_text_regex", regexp_replace(col("text"), "[0-9]+", "456"))
df.show(truncate=False)
Output:
+----------------------+-------------------------------+----------------------+
|text                  |replaced_text                  |replaced_text_regex   |
+----------------------+-------------------------------+----------------------+
|This is a test string.|This is a test replaced_string.|This is a test string.|
|Another string 123.   |Another replaced_string 123.   |Another string 456.   |
+----------------------+-------------------------------+----------------------+
Replacing Data in Delta Tables
For a full replacement of data in a Delta table that might be used in concurrent operations, use the following pattern:
SQL
CREATE OR REPLACE TABLE table_name AS SELECT * FROM parquet.`/path/to/files`;
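If you prefer to run this pattern from a Python notebook cell, one option is to build the statement as a string and pass it to spark.sql(). The sketch below assumes that approach. The helper names build_replace_statement and replace_table are hypothetical and exist only for this illustration, and the table name and path are the same placeholders used above.

```python
def build_replace_statement(table_name: str, parquet_path: str) -> str:
    """Return the CREATE OR REPLACE TABLE ... AS SELECT statement (hypothetical helper)."""
    # A backtick inside the path would end the quoted path early, so reject it.
    if "`" in parquet_path:
        raise ValueError("parquet_path must not contain a backtick")
    return (
        f"CREATE OR REPLACE TABLE {table_name} "
        f"AS SELECT * FROM parquet.`{parquet_path}`"
    )


def replace_table(spark, table_name: str, parquet_path: str) -> str:
    """Run the statement on an existing SparkSession and return the SQL that was sent."""
    statement = build_replace_statement(table_name, parquet_path)
    spark.sql(statement)  # SparkSession.sql executes the SQL text
    return statement


# Placeholder values taken from the SQL example above.
print(build_replace_statement("table_name", "/path/to/files"))
```

In a Databricks notebook you would call replace_table(spark, "table_name", "/path/to/files"), where spark is the session the notebook already provides.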