Replace in Databricks


In Databricks, you can replace values in strings and DataFrames using a couple of methods:

1. SQL Functions:

  • replace(str, search, replace): This function replaces all occurrences of a specific substring (search) within a string (str) with another substring (replace).
  • SQL
  • SELECT replace('Hello world', 'world', 'Databricks'); -- Output: 'Hello Databricks'
  • regexp_replace(str, regexp, rep): This function replaces parts of a string (str) that match a regular expression (regexp) with another string (rep).
  • SQL
  • SELECT regexp_replace('Hello 123 world', '[0-9]+', '456'); -- Output: 'Hello 456 world'
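The difference between the two functions can be checked locally before running a query on a cluster. The following is a minimal plain-Python sketch, not Spark code: the helper names sql_replace and sql_regexp_replace are hypothetical, and Python's re module only approximates the Java regular expressions that Spark uses (group references in the replacement string, for example, are written differently). Edge cases such as an empty search string are not modeled.

```python
import re


def sql_replace(s, search, rep=""):
    """Approximate SQL replace(): literal match, every occurrence replaced."""
    return s.replace(search, rep)


def sql_regexp_replace(s, pattern, rep):
    """Approximate SQL regexp_replace() for simple patterns: every match replaced."""
    return re.sub(pattern, rep, s)


# The two examples from the list above
print(sql_replace("Hello world", "world", "Databricks"))
print(sql_regexp_replace("Hello 123 world", "[0-9]+", "456"))

# A dot is a literal character for replace() but a wildcard for regexp_replace()
print(sql_replace("a.b.c", ".", "-"))
print(sql_regexp_replace("a.b.c", ".", "-"))
```

The last two lines show why the choice matters: the same search argument touches only the dots in one case and every character in the other.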

2. DataFrame API:

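  • withColumn() and replace(): withColumn() creates a new column or overwrites an existing one. Combined with the replace() function from pyspark.sql.functions (available in PySpark 3.5 and later), it replaces a substring within a specific column. The function expects column arguments, so wrap literal strings in lit(); a bare string would be read as a column name.
  • Python
  • from pyspark.sql.functions import col, lit, replace
  • df = df.withColumn("new_column", replace(col("old_column"), lit("old_value"), lit("new_value")))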

Example: Replace Values in a DataFrame Column

Python

from pyspark.sql.functions import col, lit, regexp_replace, replace

df = spark.createDataFrame([("This is a test string.",), ("Another string 123.",)], ["text"])

# Replace specific substrings (replace() needs PySpark 3.5+; literals must be wrapped in lit())
df = df.withColumn("replaced_text", replace(col("text"), lit("string"), lit("replaced_string")))

# Replace values using regular expressions
df = df.withColumn("replaced_text_regex", regexp_replace(col("text"), "[0-9]+", "456"))

df.show(truncate=False)

Output:

+----------------------+-------------------------------+----------------------+
|text                  |replaced_text                  |replaced_text_regex   |
+----------------------+-------------------------------+----------------------+
|This is a test string.|This is a test replaced_string.|This is a test string.|
|Another string 123.   |Another replaced_string 123.   |Another string 456.   |
+----------------------+-------------------------------+----------------------+

Replacing Data in Delta Tables

To fully replace the data in a Delta table that other operations may be using concurrently, use the following pattern:

SQL

CREATE OR REPLACE TABLE table_name AS SELECT * FROM parquet.`/path/to/files`;
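When the table name or source path changes between runs, the same statement can be assembled in Python and passed to spark.sql(). This is a minimal sketch under stated assumptions: build_replace_table_sql is a hypothetical helper, not part of any Databricks API, and the table name is inserted verbatim, so it should come from trusted code rather than user input.

```python
def build_replace_table_sql(table_name: str, parquet_path: str) -> str:
    """Return a CREATE OR REPLACE TABLE ... AS SELECT statement for a Parquet directory."""
    # The path is wrapped in backticks, so a backtick inside it would break the statement
    if "`" in parquet_path:
        raise ValueError("path must not contain backticks")
    return (
        f"CREATE OR REPLACE TABLE {table_name} "
        f"AS SELECT * FROM parquet.`{parquet_path}`"
    )


stmt = build_replace_table_sql("table_name", "/path/to/files")
print(stmt)

# In a Databricks notebook, where `spark` is predefined, the statement would be run with:
# spark.sql(stmt)
```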


 

