March 22nd, 2024

Introducing Regular Expression (Regex) Support in Azure SQL DB

Abhiman Tiwari
Senior Product Manager

We are pleased to announce the private preview of regular expressions (regex) support in Azure SQL Database. Regex is a powerful tool that allows you to search, manipulate, and validate text data in flexible ways. With regex support, you can enhance your SQL queries with pattern matching, extraction, replacement, and more. You can also combine them with other SQL functions and operators to create complex expressions and logic.

This feature can help you to:

  • Enhance your data quality and accuracy by validating and correcting data formats, such as phone numbers, email addresses, zip codes, etc.
  • Extract valuable insights and information from the data by extracting and grouping specific text patterns, such as keywords, hashtags, mentions, etc.
  • Transform and standardize your data by replacing, splitting, or joining text patterns, such as abbreviations, acronyms, synonyms, etc.
  • Clean and optimize your data by removing or filtering out unwanted text patterns, such as whitespace, punctuation, duplicates, etc.

The Regex feature in Azure SQL DB follows the POSIX standard and is compatible with the standard regex syntax and supports a variety of regex functions, such as REGEXP_LIKE, REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, and REGEXP_SUBSTR. The feature also supports case sensitivity, character classes, quantifiers, anchors, and capturing groups. The feature is available for all Azure SQL DB service tiers and offerings.

Feature Exploration

Here is the list of Regex functions that are being introduced:

  • REGEXP_LIKE: This function returns TRUE if a string matches a regular expression pattern, or FALSE otherwise.
  • REGEXP_COUNT: This function returns the number of times a regular expression pattern matches in a string.
  • REGEXP_INSTR: This function returns the starting or ending position, based on the specified option, of the given occurrence of a regular expression pattern in a string.
  • REGEXP_REPLACE: This function returns a modified string replaced by a ‘replacement string’, where occurrence of the regular expression pattern found.
  • REGEXP_SUBSTR: This function returns a substring that matches a regular expression pattern from a string.

Examples

We’ll use the following table as an example to demonstrate the usage of these functions. The table contains some information about employees, such as their names, email addresses, and phone numbers.

ID  Name  Email  Phone_Number 
101 John Doe john@contoso.com 123-456-7890
102 Alice Smith alice@fabrikam.com 234-567-8901
103 Bob Johnson bob@fabrikam.net 345-678-9012
104 Eve Jones eve@contoso.com 456-789-0123
105 Charlie Brown charlie@contoso.co.in 567-890-1234

 

The following are some examples of using Regex functions in SQL queries.

REGEXP_LIKE

This function returns True if the input string matches the regex pattern, and False otherwise. You can use it to filter rows based on a regex condition and to apply check constraints to ensure that the data fulfils the specified criteria.

Let’s create Employee table with some records and Check constraints for Email and Phone_Number columns:

-- Create Employees table with some records and check constraints for Email and Phone_Number columns
DROP TABLE IF EXISTS Employees
CREATE TABLE Employees (
    ID INT IDENTITY(101,1),
    [Name] VARCHAR(150),
    Email VARCHAR(320)
    CHECK (REGEXP_LIKE(Email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')),
    Phone_Number VARCHAR(20)
    CHECK (REGEXP_LIKE (Phone_Number, '^(\d{3})-(\d{3})-(\d{4})$'))
);
-- Insert some sample data
INSERT INTO Employees ([Name], Email, Phone_Number) VALUES
    ('John Doe', 'john@contoso.com', '123-456-7890'),
    ('Alice Smith', 'alice@fabrikam.com', '234-567-8901'),
    ('Bob Johnson', 'bob@fabrikam.net','345-678-9012'),
    ('Eve Jones', 'eve@contoso.com', '456-789-0123'),
    ('Charlie Brown', 'charlie@contoso.co.in', '567-890-1234');

Insert a new row into the Employees table with the following values to validate the Check constraints:

--Failure:  
INSERT INTO Employees ([Name], Email, Phone_Number) VALUES 
    ('Demo Data', 'demo@contoso@com', '123-456_7890') 

--Success: 
INSERT INTO Employees ([Name], Email, Phone_Number) VALUES 
    ('Demo Data', 'demo@contoso.com', '123-456-7890') 

Filter rows based on regex condition:

For example, the following query returns all the rows from the Employees table where the email address domain ends with ‘.com’.

-- find all the employees whose email addresses end with .com
SELECT [Name], Email
FROM Employees
WHERE REGEXP_LIKE(Email, '\.com$');

Results:

Name Email
John Doe john@contoso.com
Alice Smith alice@fabrikam.com
Eve Jones eve@contoso.com

 

REGEXP_COUNT

This function returns the number of times a regex pattern matches in the input string. You can use it to count the occurrences of a pattern within a string. For example, the following query returns the number of vowels in each Employee’s name.

-- for each employee, count the number of vowels in their name
SELECT [Name], REGEXP_COUNT([Name], '[AEIOU]',1,'i') AS Vowel_Count
FROM Employees;

Results:

Name  Vowel_Count 
John Doe 3
Alice Smith 4
Bob Johnson 3
Eve Jones 4
Charlie Brown 4

 

REGEXP_INSTR

This function returns the starting or ending position, based on the specified option, of the given occurrence of a regular expression pattern in a string. You can use it to locate the index of a pattern within a string. For example, the following query returns the position of ‘@’ in each Employee’s email address.

-- for each employee, show their name, email, and the position of the @ sign in their email
SELECT [Name], Email, REGEXP_INSTR(email, '@') AS Position_of_@
FROM Employees;

Results:

Name  Email  Position_of_@ 
John Doe john@contoso.com 5
Alice Smith alice@fabrikam.com 6
Bob Johnson bob@fabrikam.net 4
Eve Jones eve@contoso.com 4
Charlie Brown charlie@contoso.co.in 8

 

REGEXP_REPLACE

This function returns a modified string replaced by a ‘replacement string’, where occurrence of the regular expression pattern found. You can use it to modify or transform text data based on a regex pattern. For example, the following query returns the phone number of each Employee in a standardized format.

-- format the phone numbers in the Employees table to the format (XXX) XXX-XXXX.
SELECT Phone_Number, REGEXP_REPLACE(Phone_Number, '(\d{3})-(\d{3})-(\d{4})', '(\1) \2-\3',1) AS Phone_Format
FROM Employees;

Results:

EMP_PHONE  PHONE_FORMAT 
123-456-7890 (123) 456-7890
234-567-8901 (234) 567-8901
345-678-9012 (345) 678-9012
456-789-0123 (456) 789-0123
567-890-1234 (567) 890-1234

 

REGEXP_SUBSTR

This function returns the substring that matches the regular expression pattern from a string. You can use it to extract parts of a string based on a regex pattern. For example, the following query returns the domain name of each employee’s email address.

-- for each employee, show the domain of their email address
SELECT [Name], Email, REGEXP_SUBSTR(email, '@(.+)$', 1, 1,'c',1) AS Domain
FROM Employees;

Results:

Name  Email  Domain 
John Doe john@contoso.com contoso.com
Alice Smith alice@fabrikam.com fabrikam.com
Bob Johnson bob@fabrikam.net fabrikam.net
Eve Jones eve@contoso.com contoso.com
Charlie Brown charlie@contoso.co.in contoso.co.in

 

Private Preview Sign-up

We are currently accepting requests from customers who would like to participate in the private preview and try out the regex feature. If you are interested, please fill out this form: https://aka.ms/regex-preview-signup

Image NewQRCode for Regex in Azure SQL Database Private Preview Sign up

Wrapping up

We value your feedback and suggestions as we continue to improve and enhance SQL DB. Please let us know what you think of the regex feature and how it helps you with your data analysis and manipulation. You can submit your feedback using this Form or leave a comment on this blog.

Thank you for choosing Azure SQL Database as your data platform. We hope you enjoy the regex feature and find it beneficial for your needs.

Author

Abhiman Tiwari
Senior Product Manager

Abhiman Tiwari is a Senior Product Manager on the Azure SQL team at Microsoft, primarily focusing on developer experience with Azure SQL DB. In his previous roles, he has worked as a web developer/ tech-lead, developed several software applications/ products majorly in .NET and related technologies, and done a lot of .NET debugging, reverse engineering, and web application performance tuning.

8 comments

Discussion is closed. Login to edit/delete existing comments.

  • MD Bo

    Congrats! That’s awesome, and I’ll be eagerly awaiting it once it goes GA! (my employer prohibits preview features)

  • ronen ariely · Edited

    📌Wow!!!🤩 𝑨𝒛𝒖𝒓𝒆 𝑺𝑸𝑳 𝑹𝒆𝒈𝒖𝒍𝒂𝒓 𝒆𝒙𝒑𝒓𝒆𝒔𝒔𝒊𝒐𝒏𝒔!💥This is awesome!

    SQL Server 2005 introduced the integration of the Common Language Runtime (CLR) and we started implementing regular expressions, while repeatedly asking Microsoft to add regular expression support as a built-in feature in the server. About twenty years later and it's coming! 𝙏𝙝𝙞𝙨 𝙞𝙨 𝙘𝙖𝙪𝙨𝙚 𝙛𝙤𝙧 𝙘𝙚𝙡𝙚𝙗𝙧𝙖𝙩𝙞𝙤𝙣!🎊🎉

    The function I most like from the list mentioned is REGEXP_REPLACE. I am using SQLCLR for such solution today.

    I noticeed only a few regex...

    Read more
    • Abhiman TiwariMicrosoft employee Author

      Thanks Ronen! Looking forward to have you on board to experience firsthand and provide your valuable feedback.

  • Daniel Smith · Edited

    This is fantastic, and will simplify string manipulation immensely. What’s the plans for full SQL Server? Will this be added to 2022 in a future cumulative update, or would it have to wait for the next release? Please try to avoid feature divergence wherever possible otherwise it creates confusion and hinders migrations.

    • Abhiman TiwariMicrosoft employee Author

      Thanks @Daniel Smith! Backporting any new feature poses considerable challenges and presents numerous limitations and potential issues, makes it impractical to undertake. However, this will be included in the Microsoft SQL Server vNext release.

  • MgSam

    Congrats!! Long overdue for Regex support in SQL! I’m looking forward to this making its way into the next SQL Server version.

    I’d have preferred the .NET Regex variant over POSIX, but beggars can’t be choosers!

    • Abhiman TiwariMicrosoft employee Author

      Thanks @MgSam. This will be part of Microsoft SQL Server vNext release.
      AFAIK, the .NET Regex is compatible with Perl, which supports POSIX. Therefore, I believe your use-cases should be addressed. I encourage you to give it a try and share your feedback if there are any particular use-cases that are not supported.