February 26th, 2025

Exciting new T-SQL features: Regex support, Fuzzy string-matching, and bigint support in DATEADD – preview

Abhiman Tiwari
Senior Product Manager

We are thrilled to announce the public preview of three powerful T-SQL features that will elevate your SQL queries: Regular Expressions (Regex) support, Fuzzy string-matching, and bigint support in DATEADD. These enhancements are now available in Azure SQL Database and SQL Database in Microsoft Fabric, offering developers more flexibility, efficiency, and precision in their data operations. They will also help simplify complex operations that previously required workarounds or external processing.

In this post, we’ll explore the capabilities of each feature, provide practical examples, and explain how they will transform the way you write T-SQL queries.

Regular Expressions (Regex) support in T-SQL

Regular expressions are a powerful tool for pattern matching and data extraction. They allow you to search, validate, and manipulate text based on specific patterns. With native Regex support in T-SQL, you can now use POSIX-compliant regular expressions to perform sophisticated string operations directly within your SQL queries.

Advantages of Regex based pattern matching

  • Flexibility: Easily create complex search patterns to match specific criteria.
  • Efficiency: Perform advanced string operations in a single query, reducing the need for multiple steps.
  • Data Validation: Ensure data meets specific formatting requirements.
  • Data Transformation: Standardize and clean your data by applying Regex patterns.

 

Here is the list of Regex functions being included in this release:

  • REGEXP_LIKE: This function returns TRUE if a string matches a regular expression pattern, or FALSE otherwise.
  • REGEXP_COUNT: This function returns the number of times a regular expression pattern matches in a string.
  • REGEXP_INSTR: This function returns the starting or ending position, based on the specified option, of the given occurrence of a regular expression pattern in a string.
  • REGEXP_REPLACE: This function returns a modified string replaced by a ‘replacement string’, where occurrence of the regular expression pattern found.
  • REGEXP_SUBSTR: This function returns a substring that matches a regular expression pattern from a string.

Note

REGEXP_LIKE is available only under compatibility level 170 and above. If your database compatibility level is lower than 170, SQL Server can’t find and run REGEXP_LIKE. Other regular expression scalar functions are available at all compatibility levels. You can check compatibility level in the sys.databases view or in database properties. You can change the compatibility level of a database with the following command:

ALTER DATABASE [DatabaseName] SET COMPATIBILITY_LEVEL = 170;

If your current database compatibility level is below 170 and you have a use-case where you want to leverage the REGEXP_LIKE function but need additional time to perform the complete analysis and testing before switching the compatibility level, you can use REGEXP_COUNT or REGEXP_INSTR functions as an interim solution.

Example –

SELECT	[Name], Email
FROM	(VALUES('John Doe', 'john@contoso.com'),
	('Alice Smith', 'alice@fabrikam.com'),
	('Bob Johnson', 'bob@fabrikam.net'),
	('Charlie Brown', 'charlie@contoso.co.in')) as e (Name, Email)
WHERE	REGEXP_COUNT(e.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$') > 0
--OR	REGEXP_INSTR(e.email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$') > 0;

To learn more about the feature, please visit –

Fuzzy String-Matching

Real-world data often contains inconsistencies, such as typos, spelling variations or minor errors. Fuzzy string-matching functions help you identify and match similar strings, making it easier to match, clean and analyze your data. These functions include EDIT_DISTANCE, EDIT_DISTANCE_SIMILARITY, JARO_WINKLER_DISTANCE, and JARO_WINKLER_SIMILARITY.

Advantages of fuzzy match

  • Handle Data Inconsistencies: Enhance string search experience by retrieving similar and approximate matches, addressing spelling variations, typos, missing letters, swapped characters etc.
  • Improved Data Quality: Enhance data deduplication and entity resolution tasks by accurately matching similar records. Reduce the need for manual data cleansing.
  • Flexible Matching: Use different algorithms to suit various use cases and data characteristics.

 

Here is the list of fuzzy match functions that are being introduced:

  • The EDIT_DISTANCE function calculates the number of characters that need to be substituted, inserted, or deleted, to transform one string into another.
  • The EDIT_DISTANCE_SIMILARITY function returns a similarity score based on the edit distance.
  • The JARO_WINKLER_DISTANCE function calculates the similarity between two strings using the Jaro-Winkler algorithm.
  • The JARO_WINKLER_SIMILARITY function returns a similarity score using the Jaro-Winkler algorithm.

Note

Currently, the functions do not adhere to the comparison semantics defined by collation settings, such as case-insensitivity and other collation-specific rules. Once support for collation rules is implemented, the functions’ output will reflect these semantics and may change accordingly.

Example –

Here is an example query with all the functions –

SELECT	t.source_string,
	t.target_string,
	EDIT_DISTANCE(t.source_string, t.target_string) as ED_Distance,
	JARO_WINKLER_DISTANCE(t.source_string, t.target_string) as JW_Distance,

	EDIT_DISTANCE_SIMILARITY(t.source_string, t.target_string) as ED_Similarity,
	CAST(JARO_WINKLER_SIMILARITY(t.source_string, t.target_string)*100 as int) as JW_Similarity
FROM (VALUES
        -- Customer Names (Typos & Variants)
        ('Johnathan Doe', 'Jonathon Doe'),
        ('Elizabeth Smith', 'Elizabth Smithe'),
        ('Christopher Nolan', 'Kristopher Nolen'),
        ('Catherine Johnson', 'Katherine Jhonson'),
        ('Michael Brown', 'Micheal Browne'),

        -- Product Names (Typos & Variants)
        ('Surface Pro', 'SurfacePro'),
        ('XBox 10', 'XBox X'),
        ('Azure SQL Database', 'Azure SQL DB'),
        ('SQL Server', 'SQLServer'),

        -- Addresses (Typos & Variants)
        ('123 Main St, New York, NY', '123 Main Street, NYC'),
        ('456 Elm Ave, Los Angeles, CA', '456 Elm Av., LA, CA'),
        ('789 Oak Dr, Houston, TX', '789 Oak Drive, Houston TX'),
        ('321 Pine Blvd, Chicago, IL', '321 Pyn Bld, Chicago IL'),
        ('654 Maple Rd, Miami, FL', '654 Mpl Rd., Miami'),

        -- Common Words (Spelling Mistakes & Variations)
        ('Accommodate', 'Acommodate'),
        ('Catalogue', 'Catalog'),
        ('Definitely', 'Definately'),
        ('February', 'Febuary'),
        ('Final', 'Finale'),
        ('Organise', 'Organize'),
        ('Programme', 'Program'),
        ('Receive', 'Recieve'),
        ('Separate', 'Seperate')       
    ) as t(source_string, target_string);

To learn more about the feature, please visit – learn page.

DATEADD (bigint support)

The DATEADD function now supports BIGINT for the number parameter, enabling precise calculations with large interval values. This update removes the previous integer-only limitation, simplifying high-precision datetime computations.

Advantages of bigint support in DATEADD

  • It will help perform accurate date and time calculations, especially with precise units like MICROSECOND or NANOSECOND.
  • The earlier limit of INT could result in an overflow if the interval value is large. Bigint ensures safe calculations for historical or long-term data.

Examples –

Here are couple of examples showcasing the bigint support in DATEADD.

DECLARE @datetime2 AS DATETIME2;
SET @datetime2 = '2024-02-26 01:01:01.1111111';

-- Add a very large number of nanoseconds to the datetime

SELECT DATEADD (NANOSECOND, 2223372036854775807, @datetime2) as NewDateTime; 

-- To calculate the difference in milliseconds from the current date until December 31, 9999 using the DATEDIFF_BIG function, and subsequently add those milliseconds to the current date and time

SELECT DATEADD (MILLISECOND, DATEDIFF_BIG(MILLISECOND,GETDATE(),'9999-12-31'), GETDATE());

To learn more, please visit – learn page.

Conclusion

These new T-SQL features—Regex support, Fuzzy string-matching, and bigint support in DATEADD—empower developers to write more efficient, flexible, and precise SQL queries. Whether you are cleaning and validating data, performing complex string manipulations, matching similar strings, or handling high-precision date calculations, these enhancements will significantly improve your data management capabilities.

We encourage you to explore these features in Azure SQL Database and SQL DB in Fabric and provide your valuable feedback. Stay tuned for more updates and enhancements in the future.

Thank you for being a part of our community and helping us make Azure SQL Database and SQL Server the best platform for your data needs!

Author

Abhiman Tiwari
Senior Product Manager

Abhiman Tiwari is a Senior Product Manager on the Azure SQL team at Microsoft, primarily focusing on developer experience with Azure SQL DB. In his previous roles, he has worked as a web developer/ tech-lead, developed several software applications/ products majorly in .NET and related technologies, and done a lot of .NET debugging, reverse engineering, and web application performance tuning.

1 comment

  • Ron Michael Zettlemoyer

    This is great! Thanks Abhiman! A question about compatibility level 170: What is it?! My Azure SQL databases let me switch to it (from 160 right now) but all the Microsoft documentation about compatibility levels only goes up to 160. I’m just curious if there are any other big changes in 170 that we should be aware of before jumping into it.