The CASE WHEN expression in Databricks SQL returns a value based on the first condition that matches, which makes it a convenient way to derive a new column from existing data. It’s essentially if/else logic inside your SQL queries.

Here’s the basic syntax:

SELECT
  CASE WHEN condition1 THEN result1
       WHEN condition2 THEN result2
       ELSE default_result
  END AS new_column_name
FROM your_table;


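  • conditionN: A boolean expression. Conditions are checked in order, and only the first one that evaluates to true is used; a condition that evaluates to false or NULL is skipped.
  • resultN: The value assigned to the new column if the corresponding conditionN is true.
  • default_result (optional): The value assigned if none of the conditions are true. If ELSE is omitted, the result is NULL.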
  • new_column_name: The name you give to the new column containing the results of the logic.
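
As a small illustration of the pieces above, here is a minimal sketch that runs against an inline VALUES table instead of a real one. The alias t, the column score, and the labels are hypothetical and chosen only for the example. It shows that branches are checked in order and that leaving out ELSE yields NULL when nothing matches.

```sql
-- Hypothetical inline data: no real table is needed to try this.
SELECT score,
  CASE WHEN score >= 90 THEN 'high'
       WHEN score >= 50 THEN 'medium'   -- only reached when score < 90
  END AS band                           -- no ELSE: unmatched rows get NULL
FROM VALUES (95), (60), (10) AS t(score);
```

With these three rows, 95 maps to 'high', 60 maps to 'medium', and 10 matches no branch, so band is NULL.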


Let’s say you have a table with a column “age” and want to create a new column “age_group” that categorizes people based on age. You can use CASE WHEN like this:

SELECT name, age,
  CASE WHEN age < 18 THEN 'Under 18'
       WHEN age >= 18 AND age < 65 THEN 'Adult'
       ELSE 'Senior'
  END AS age_group
FROM people_data;

This query returns an extra column “age_group” in the result set (the underlying table is not modified). It assigns “Under 18” to people younger than 18, “Adult” to people aged 18 through 64, and “Senior” to everyone else.
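
One detail to watch in the query above: when age is NULL, the comparisons evaluate to NULL rather than true, so such rows fall through to ELSE and are labelled 'Senior'. The sketch below is one possible way to handle that, assuming you want a separate label for missing ages. The 'Unknown' label and the inline sample rows are assumptions made for illustration, not part of the original table.

```sql
-- Hypothetical sample rows stand in for people_data.
SELECT name, age,
  CASE WHEN age IS NULL THEN 'Unknown'   -- catch missing ages first
       WHEN age < 18 THEN 'Under 18'
       WHEN age < 65 THEN 'Adult'        -- age >= 18 is implied by the branch order
       ELSE 'Senior'
  END AS age_group
FROM VALUES
  ('Ana', 12),
  ('Ben', 40),
  ('Cy', 70),
  ('Dee', CAST(NULL AS INT)) AS people_data(name, age);
```

Because branches are evaluated top to bottom, the NULL check has to come before the comparisons, and the second range check no longer needs to repeat the lower bound.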

Benefits of using CASE WHEN:

  • Improves code readability by replacing complex conditional logic with clear statements.
  • Creates new columns with categorized or transformed data.
  • Makes your queries more concise and easier to maintain.

Further Learning:

For more details and advanced usage of CASE WHEN in Databricks, refer to the case expression page in the official Databricks SQL language reference.




