Databricks SQL (and Apache Spark SQL 3.3.0 and later) provides the to_number function for converting formatted strings into numeric values. Here's how it works and some key points to keep in mind:


  • String to Number Conversion: to_number takes a string that represents a number (e.g., '$1,234.56' or '12345.67') and converts it into a DECIMAL value. You need this before you can use the value in arithmetic or numeric comparisons rather than string comparisons.

  • Format Specification: to_number requires a format string that describes the layout of the input: the digit positions, the decimal point, any thousands separators, an optional $ currency sign, and where a sign may appear. Because the input is matched against this layout, values with a dollar sign or grouping commas still convert accurately.
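As a quick illustration, assuming a Databricks SQL warehouse or a Spark 3.3+ session, the format string also fixes the result type: each 0 or 9 counts toward the precision, and the digits after the decimal point set the scale. The typeof call is used here only to make that type visible.

```sql
-- '$9,999.99' has six digit positions, two of them after the decimal point,
-- so the result type should be DECIMAL(6,2).
SELECT to_number('$1,234.56', '$9,999.99')         AS amount,       -- 1234.56
       typeof(to_number('$1,234.56', '$9,999.99')) AS amount_type;  -- decimal(6,2)
```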


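Syntax

to_number(string_expr, format_string)
  • string_expr: The string you want to convert.
  • format_string: (Required) A constant string describing the expected format of the input. There is no one-argument form and no format inference: for a plain number such as '12345.67', either supply a format like '99999.99' or use CAST(... AS DECIMAL(p, s)) instead.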
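In real queries the input is usually a column rather than a literal. The following sketch uses an inline VALUES table with a hypothetical column name, raw_amount, standing in for your own table, and assumes every value follows the same '$99,999.99' layout.

```sql
-- Hypothetical sample data; in a real query raw_amount would come from your table.
SELECT raw_amount,
       to_number(raw_amount, '$99,999.99') AS amount  -- DECIMAL(7,2)
FROM VALUES ('$12,345.67'), ('$1,000.00') AS t(raw_amount);
```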


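Examples

-- Basic conversion (a format string is required)
SELECT to_number('12345.67', '99999.99');  -- Output: 12345.67 (DECIMAL(7,2))

-- Currency symbol and thousands separator
SELECT to_number('$12,345.67', '$99,999.99');  -- Output: 12345.67 (DECIMAL(7,2))

-- Negative numbers: a leading sign with S, or angle brackets with PR
SELECT to_number('-12345.67', 'S99999.99');  -- Output: -12345.67 (DECIMAL(7,2))
SELECT to_number('<12345.67>', '99999.99PR');  -- Output: -12345.67 (DECIMAL(7,2))

-- Euro symbol and comma decimal mark are not valid format characters, so normalize the string first
SELECT to_number(replace(replace(replace('€12.345,67', '€', ''), '.', ''), ',', '.'), '99999.99');  -- Output: 12345.67 (DECIMAL(7,2))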

Important Considerations

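  • Error Handling: If the input string doesn't match the format, to_number raises an error. If you would rather get NULL for malformed values, use try_to_number, which takes the same arguments.
  • Format Strings: The format uses a small set of characters: 0 or 9 for a digit, . or D for the decimal point, , or G for a thousands separator, $ for the currency sign, S or MI for a sign at the start or end, and PR for negative values wrapped in angle brackets. See the to_number entry in the Databricks or Apache Spark SQL function reference for the exact matching rules.
  • Data Type: The result is always DECIMAL(p, s), where p is the number of digit positions (0 or 9) in the format and s is the number of those after the decimal point. CAST it to another numeric type (e.g., INT, DOUBLE) if you need one.
  • Locale: to_number does not use locale settings. The decimal point is always '.', the grouping separator is always ',', and '$' is the only supported currency symbol, so strings written in other conventions (such as '€12.345,67') must be normalized before conversion.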
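The error-handling and data-type points can be seen together in a short sketch, assuming Spark 3.3+ or Databricks SQL. try_to_number returns NULL where to_number would fail the query, and the final CAST converts the DECIMAL result to DOUBLE for code that expects a floating-point value.

```sql
-- to_number would raise an error on this input, because ' USD' is not in the format:
-- SELECT to_number('12,345.67 USD', '99,999.99');
SELECT try_to_number('12,345.67 USD', '99,999.99')       AS bad_input,   -- NULL
       try_to_number('12,345.67', '99,999.99')           AS good_input,  -- 12345.67
       CAST(to_number('12,345.67', '99,999.99') AS DOUBLE) AS as_double; -- 12345.67
```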

