Oracle Regular Expressions
Version 12.2.0.1

General Information
Library Note Morgan's Library Page Header
Coming to OpenWorld 2018? Be sure to visit the TidalScale booth in Moscone South and learn how to solve performance problems and lower costs with Software Defined Servers. Before you visit the booth, or if you can't make it this year, check out TidalScale at www.tidalscale.com. Be sure to click on the Solutions link and look through the Oracle resources.
Purpose A Regular Expressions is a text string that defines a search patterns. One might think of one as an ideal merger between the INSTRing function and WildCard characters but one that allows for matching very complex conditions.
 
Posix Syntax Elements
Anchoring Characters
Character Class Description
^ Anchor the expression to the start of a line
$ Anchor the expression to the end of a line
Equivalence Classes
Character Class Description
= = Oracle supports the equivalence classes through the POSIX '[==]' syntax. A base letter and all of its accented versions constitute an equivalence class. For example, the equivalence class '[=a=]' matches ä and â. The equivalence classes are valid only inside the bracketed expression
Match Options
Character Class Description
c Case sensitive matching
i Case insensitive matching
m Treat source string as multi-line activating Anchor chars
n Allow the period (.) to match any newline character
POSIX Characters
Character Class Description
[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Blank Space Characters
[:cntrl:] Control characters (nonprinting)
[:digit:] Numeric digits
[:graph:] Any [:punct:], [:upper:], [:lower:], and [:digit:] chars
[:lower:] Lowercase alphabetic characters
[:print:] Printable characters
[:punct:] Punctuation characters
[:space:] Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed
[:upper:] Uppercase alphabetic characters
[:xdigit:] Hexidecimal characters
Quantifier Characters
Character Class Description
* Match 0 or more times
? Match 0 or 1 time
+ Match 1 or more times
{m} Match exactly m times
{m,} Match at least m times
{m, n} Match at least m times but no more than n times
\n Cause the previous expression to be repeated n times
Alternative Matching And Grouping Characters
Character Class Description
| Separates alternates, often used with grouping operator ()
( ) Groups subexpression into a unit for alternations, for quantifiers, or for backreferencing (see "Backreferences" section)
[char] Indicates a character list; most metacharacters inside a character list are understood as literals, with the exception of character classes, and the ^ and - metacharacters
 
Demonstration Preparation
Demonstration Table and Data CREATE TABLE test(
testcol VARCHAR2(50));

INSERT INTO test VALUES ('abcde');
INSERT INTO test VALUES ('12345');
INSERT INTO test VALUES ('1a4A5');
INSERT INTO test VALUES ('12a45');
INSERT INTO test VALUES ('12aBC');
INSERT INTO test VALUES ('12abc');
INSERT INTO test VALUES ('12ab5');
INSERT INTO test VALUES ('12aa5');
INSERT INTO test VALUES ('12AB5');
INSERT INTO test VALUES ('ABCDE');
INSERT INTO test VALUES ('123-5');
INSERT INTO test VALUES ('12.45');
INSERT INTO test VALUES ('1a4b5');
INSERT INTO test VALUES ('1 3 5');
INSERT INTO test VALUES ('1  45');
INSERT INTO test VALUES ('1   5');
INSERT INTO test VALUES ('a  b  c  d');
INSERT INTO test VALUES ('a b  c   d    e');
INSERT INTO test VALUES ('a              e');
INSERT INTO test VALUES ('Steven');
INSERT INTO test VALUES ('Stephen');
INSERT INTO test VALUES ('111.222.3333');
INSERT INTO test VALUES ('222.333.4444');
INSERT INTO test VALUES ('333.444.5555');
INSERT INTO test VALUES ('abcdefabcdefabcxyz');
COMMIT;
 
REGEXP_COUNT
Count's occurrences based on a regular expression

Overload 1
REGEXP_COUNT(
srcstr   IN VARCHAR2 CHARACTER SET ANY_CS,
pattern  IN VARCHAR2 CHARACTER SET srcstr%CHARSET, position PLS_INTEGER := 1,
modifier IN VARCHAR2 DEFAULT NULL)
RETURN PLS_INTEGER;

-- match parameter:
'c' = case sensitive
'i' = case insensitive search
'm' = treats the source string as multiple lines
'n' = allows the period (.) wild character to match newline
'x' = ignore whitespace characters
SELECT REGEXP_COUNT(testcol, '2a', 1, 'i') RESULT
FROM test;

SELECT REGEXP_COUNT(testcol, 'e', 1, 'i') RESULT
FROM test;
Overload 2 REGEXP_COUNT(
srcstr   IN CLOB CHARACTER SET ANY_CS,
pattern  IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
position IN INTEGER := 1,
modifier IN VARCHAR2 DEFAULT NULL)
RETURN INTEGER;
TBD
 
REGEXP_INSTR
Syntax REGEXP_INSTR(<source_string>, <pattern>[[, <start_position>][, <occurrence>][, <return_option>][, <match_parameter>][, <sub_expression>]])

REGEXP_INSTR(
srcstr        IN VARCHAR2 CHARACTER SET ANY_CS,
pattern       IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
position      IN PLS_INTEGER := 1,
occurrence    IN PLS_INTEGER := 1,
returnparam   IN PLS_INTEGER := 0,
modifier      IN VARCHAR2 DEFAULT NULL,
subexpression IN PLS_INTEGER := 0)
RETURN PLS_INTEGER;

REGEXP_INSTR(
srcstr        INCLOB CHARACTER SET ANY_CS,
pattern       IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
position      IN INTEGER := 1,
occurrence    IN INTEGER := 1,
returnparam   IN PLS_INTEGER := 0,
modifier      IN VARCHAR2 DEFAULT NULL,
subexpression IN PLS_INTEGER := 0)
RETURN INTEGER;
Find words beginning with 's' or 'r' or 'p' followed by any 4 alphabetic characters: case insensitive SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 0, 'i') RESULT
FROM dual;

SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 1, 1, 'i') RESULT
FROM dual;

SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 0, 'i') RESULT
FROM dual;

SELECT REGEXP_INSTR('500 Oracle Pkwy, Redwood Shores, CA', '[o][[:alpha:]]{3}', 1, 2, 1, 'i') RESULT
FROM dual;
Find the position of try, trying, tried or tries SELECT REGEXP_INSTR('We are trying to make the subject easier.', 'tr(y(ing)?|(ied)|(ies))') RESULTNUM
FROM dual;
Using Sub-Expression option SELECT testcol, REGEXP_INSTR(testcol, 'ab', 1, 1, 0, 'i', 0)
FROM test;

SELECT testcol, REGEXP_INSTR(testcol, 'ab', 1, 1, 0, 'i', 1)
FROM test;

SELECT testcol, REGEXP_INSTR(testcol, 'a(b)', 1, 1, 0, 'i', 1)
FROM test;
 
REGEXP_LIKE
Syntax REGEXP_LIKE(
srcstr   IN VARCHAR2 CHARACTER SET ANY_CS,
pattern  IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
modifier IN VARCHAR2 DEFAULT NULL)
RETURN BOOLEAN;

REGEXP_LIKE (
srcstr   IN CLOB CHARACTER SET ANY_CS,
pattern  IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
modifier IN VARCHAR2 DEFAULT NULL)
RETURN BOOLEAN;
AlphaNumeric Characters SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alnum:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alnum:]]{5}');
Alphabetic Characters SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alpha:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:alpha:]]{5}');
Control Characters INSERT INTO test VALUES ('zyx' || CHR(13) || 'wvu');
COMMIT;

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:cntrl:]]{1}');
Digits SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:digit:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:digit:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:digit:]]{5}');
Lower Case SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:lower:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:lower:]]{2}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:lower:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:lower:]]{5}');
Printable Characters SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:print:]]{5}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:print:]]{6}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:print:]]{7}');
Punctuation TRUNCATE TABLE test;

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:punct:]]');
Spaces SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:space:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:space:]]{2}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:space:]]{3}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:space:]]{5}');
Upper Case SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:upper:]]');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:upper:]]{2}');

SELECT *
FROM test
WHERE REGEXP_LIKE(testcol, '[[:upper:]]{3}');
Values Starting with 'a%b' SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '^ab*');
'a' is the third value SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '^..a.');
Contains two consecutive occurances of the letter 'a' or 'z' SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '([az])\1', 'i');
Begins with 'Ste' ends with 'en' and contains either 'v' or 'ph' in the center SELECT testcol
FROM test
WHERE REGEXP_LIKE(testcol, '^Ste(v|ph)en$');
Use a regular expression in a check constraint CREATE TABLE mytest (c1 VARCHAR2(20),
CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')));
Identify SSN

Thanks: Byron Bush HIOUG
CREATE TABLE ssn_test (
ssn_col  VARCHAR2(20));

INSERT INTO ssn_test VALUES ('111-22-3333');
INSERT INTO ssn_test VALUES ('111=22-3333');
INSERT INTO ssn_test VALUES ('111-A2-3333');
INSERT INTO ssn_test VALUES ('111-22-33339');
INSERT INTO ssn_test VALUES ('111-2-23333');
INSERT INTO ssn_test VALUES ('987-65-4321');
COMMIT;

SELECT ssn_col
FROM ssn_test
WHERE REGEXP_LIKE(ssn_col,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$');
 
REGEXP_REPLACE
Syntax
  REGEXP_REPLACE(<source_string>, <pattern>,
<replace_string>, <position>, <occurrence>, <match_parameter>)

REGEXP_REPLACE(
srcstr     IN VARCHAR2 CHARACTER SET ANY_CS,
pattern    IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
replacestr IN VARCHAR2 CHARACTER SET srcstr%CHARSET DEFAULT NULL,
position   IN PLS_INTEGER := 1,
occurrence IN PLS_INTEGER := 0,
modifier   IN VARCHAR2 DEFAULT NULL)
RETURN VARCHAR2 CHARACTER SET srcstr%CHARSET;

REGEXP_REPLACE(
srcstr     IN CLOB CHARACTER SET ANY_CS,
pattern    IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
replacestr IN CLOB CHARACTER SET srcstr%CHARSET DEFAULT NULL,
position   IN INTEGER := 1,
occurrence IN INTEGER := 0,
modifier   IN VARCHAR2 DEFAULT NULL)
RETURN CLOB CHARACTER SET srcstr%CHARSET;

REGEXP_REPLACE(
srcstr     IN CLOB CHARACTER SET ANY_CS,
pattern    IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
replacestr IN VARCHAR2 CHARACTER SET srcstr%CHARSET DEFAULT NULL,
position   IN INTEGER := 1,
occurrence IN INTEGER := 0,
modifier   IN VARCHAR2 DEFAULT NULL)
RETURN CLOB CHARACTER SET srcstr%CHARSET;
Looks for the pattern xxx.xxx.xxxx and reformats  pattern to (xxx) xxx-xxxx col testcol format a15
col result format a15

SELECT testcol, REGEXP_REPLACE(testcol, '([[:digit:]]{3})\.([[:digit:]]{3})\.([[:digit:]]{4})', '(\1) \2-\3') RESULT
FROM test
WHERE LENGTH(testcol) = 12;
Obfuscate the initial digits of a credit card number SELECT REGEXP_REPLACE('4444-1111-2222-3333', '.*([[:digit:]]{4})', '************\1')
FROM dual;

SELECT REGEXP_REPLACE('3737-373737-37373', '.*([[:digit:]]{4})', '************\1')
FROM dual;
Put a space after every character SELECT testcol, REGEXP_REPLACE(testcol, '(.)', '\1 ') RESULT
FROM test
WHERE testcol like 'S%';
Replace multiple spaces with a single space SELECT REGEXP_REPLACE('500 Oracle Parkway, Redwood Shores, CA', '( ){2,}', ' ') RESULT
FROM dual;
Insert a space between any lower case character followed by any upper case character SELECT REGEXP_REPLACE('George McGovern', '([[:lower:]])([[:upper:]])', '\1 \2') CITY
FROM dual;
Replace the period with a string (note use of forward slash: '\') SELECT REGEXP_REPLACE('We are trying to make the subject easier.','\.',' for you.') REGEXT_SAMPLE
FROM dual;
Demo CREATE TABLE t(
testcol VARCHAR2(10));

INSERT INTO t VALUES ('1');
INSERT INTO t VALUES ('2    ');
INSERT INTO t VALUES ('3 new  ');

col newval format a10

SELECT LENGTH(testcol) len, testcol origval,
REGEXP_REPLACE(testcol, '\W+$', ' ') newval,
LENGTH(REGEXP_REPLACE(testcol, '\W+$', ' ')) newlen
FROM t;
Produce comma delimited list from a string WITH q AS (
  SELECT 'person_id NUMBER(4), last_name VARCHAR2(20), salary FLOAT, int_pct NUMBER(10)' AS qval
  FROM dual)
SELECT REGEXP_REPLACE(qval, ' ([A-Za-z0-9\(\)]*,)| [A-Za-z0-9\(\)]*$',',') retval
FROM q;
Getting rid of HTML tags -- to retain the spaces so the string retains its original length
SELECT REGEXP_REPLACE('<b><u>This is a test of extracting a date, 26-AUG-2010 21:05 GMT,</u></b><br/> for example<br/><br/><b><u>from some HTML</u></b><br/> tags','<[^>]*>',' ')
FROM dual;

-- to get rid of the extra spaces
SELECT REGEXP_REPLACE('<b><u>This is a test of extracting a date, 26-AUG-2010 21:05 GMT,</u></b><br/> for  example<br/><br/><b><u>from some HTML</u></b><br/> tags', '<[^>]*>','')
FROM dual;

SELECT REGEXP_REPLACE('<table><tr><td>THIS IS A SAMPLE HTML TEXT<td><tr></table>', '<[^>]+>', '') WITHOUT_HTML_TAG
FROM dual;
 
REGEXP_SUBSTR
Syntax REGEXP_SUBSTR(source_string, pattern
[, position [, occurrence
[, match_parameter]]])

REGEXP_SUBSTR(
srcstr        IN VARCHAR2 CHARACTER SET ANY_CS,
pattern       IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
position      IN PLS_INTEGER := 1,
occurrence    IN PLS_INTEGER := 1,
modifier      IN VARCHAR2 DEFAULT NULL,
subexpression IN PLS_INTEGER := 0)
RETURN VARCHAR2 CHARACTER SET srcstr%CHARSET;

REGEXP_SUBSTR(
srcstr        IN CLOB CHARACTER SET ANY_CS,
pattern       IN VARCHAR2 CHARACTER SET srcstr%CHARSET,
position      IN INTEGER := 1,
occurrence    IN INTEGER := 1,
modifier      IN VARCHAR2 DEFAULT NULL,
subexpression IN PLS_INTEGER := 0)
RETURN CLOB CHARACTER SET srcstr%CHARSET;
Searches for a comma followed by one or more occurrences of non-comma characters followed by a comma SELECT REGEXP_SUBSTR('500 Oracle Parkway, Redwood Shores, CA',',[^,]+,') RESULT
FROM dual;
Look for http:// followed by a substring of one or more alphanumeric characters and optionally, a period (.) col result format a50

SELECT REGEXP_SUBSTR('Go to http://www.oracle.com/products and click on database',
'http://([[:alnum:]]+\.?){3,4}/?') RESULT
FROM dual;
Extracts try, trying, tried or tries SELECT REGEXP_SUBSTR('We are trying to make the subject easier.','tr(y(ing)?|(ied)|(ies))')
FROM dual;
Extract the 3rd field treating ':' as a delimiter SELECT REGEXP_SUBSTR('system/pwd@orabase:1521:sidval', '[^:]+', 1, 3) RESULT
FROM dual;
Extract from string with vertical bar delimiter CREATE TABLE regexp (
testcol VARCHAR2(50));

INSERT INTO regexp
(testcol)
VALUES
('One|Two|Three|Four|Five');

SELECT * FROM regexp;

SELECT REGEXP_SUBSTR(testcol,'[^|]+', 1, 3)
FROM regexp;
Equivalence classes SELECT REGEXP_SUBSTR('iSelfSchooling NOT ISelfSchooling', '[[=i=]]SelfSchooling') RESULT
FROM dual;
Parsing Demo set serveroutput on

DECLARE
 x VARCHAR2(2);
 y VARCHAR2(2);
 c VARCHAR2(40) := '1:3,4:6,8:10,3:4,7:6,11:12';
BEGIN
  x := REGEXP_SUBSTR(c,'[^:]+', 1, 1);
  y := REGEXP_SUBSTR(c,'[^,]+', 3, 1);

  dbms_output.put_line(x ||' '|| y);
END;
/
Analytic Demo: Returns the maximum value for each group CREATE TABLE test (
name   VARCHAR2(10),
values VARCHAR2(15));

INSERT INTO test values ('A', '1');
INSERT INTO test values ('B', '12.1.0.2');
INSERT INTO test values ('B', '8.2.1.2');
INSERT INTO test values ('B', '12.0.0');
INSERT INTO test values ('C', '11.1.2');
INSERT INTO test values ('C', '11.01.05');
COMMIT;

SELECT name, version
FROM (
  SELECT name, version,
    ROW_NUMBER() OVER (PARTITION BY name ORDER BY TO_NUMBER(
    NVL(REGEXP_SUBSTR(version, '\d+', 1, 1)
    TO_NUMBER(NVL(REGEXP_SUBSTR(version, '\d+', 1, 2),0)) DESC,
    TO_NUMBER(NVL(REGEXP_SUBSTR(version, '\d+', 1, 3),0)) DESC,
    TO_NUMBER(NVL(REGEXP_SUBSTR(version, '\d+', 1, 4),0)) DESC) AS rn
  FROM test)
WHERE rn = 1;

Related Topics
Built-in Functions
Built-in Packages
String Functions
Substring-Instring
Translate-Replace
What's New In 12cR1
What's New In 12cR2

Morgan's Library Page Footer
This site is maintained by Dan Morgan. Last Updated: This site is protected by copyright and trademark laws under U.S. and International law. © 1998-2017 Daniel A. Morgan All Rights Reserved