Regex Quick Reference

Common patterns for emails, URLs, dates, and the lookahead/lookbehind syntax I always forget.

Basics

The dot matches any single character except a newline (unless the s / DOTALL flag is set).

.       # any character except newline
..      # any two characters

# input:  "cat cot cut"
# pattern: c.t
# matches: "cat", "cot", "cut"

Anchors don't consume characters. They assert a position: start of string, end of string, or a word boundary.

^       # start of string (or line with m flag)
$       # end of string (or line with m flag)
\b      # word boundary

# input:  "error: file not found"
# pattern: ^error
# matches: "error" (only at the start)

# input:  "cat concatenate"
# pattern: \bcat\b
# matches: "cat" (not the "cat" inside "concatenate")
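
A quick way to check the anchor examples above is Python's standard re module. This is only a sketch: the two-line log string is a made-up input added to show the m flag (re.MULTILINE in Python).

```python
import re

# \b keeps "cat" from matching inside "concatenate".
whole_words = re.findall(r"\bcat\b", "cat concatenate")

# Hypothetical two-line input, used only to show the m flag.
log = "error: file not found\nerror: disk full"

# Without re.MULTILINE, ^ matches only at the very start of the string.
start_only = re.findall(r"^error", log)

# With re.MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^error", log, flags=re.MULTILINE)

print(whole_words)  # ['cat']
print(start_only)   # ['error']
print(each_line)    # ['error', 'error']
```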

Character classes match one character from a set. Ranges use a hyphen; a caret inside the brackets negates the class.

[abc]     # matches "a", "b", or "c"
[a-z]     # any lowercase letter
[A-Z]     # any uppercase letter
[0-9]     # any digit
[a-zA-Z]  # any letter
[^abc]    # any character EXCEPT "a", "b", or "c"
[^0-9]    # any non-digit character

# input:  "a1b2c3"
# pattern: [a-c]
# matches: "a", "b", "c"

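Shorthand character classes cover the most common sets. The uppercase version is the negation. The bracket equivalents shown are the ASCII definitions; Unicode-aware engines (Python 3 on str patterns, for example) also match non-ASCII letters, digits, and whitespace.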

\d      # digit           [0-9]
\D      # non-digit       [^0-9]
\w      # word character   [a-zA-Z0-9_]
\W      # non-word         [^a-zA-Z0-9_]
\s      # whitespace       [ \t\n\r\f\v]
\S      # non-whitespace   [^ \t\n\r\f\v]

# input:  "Order #42 shipped"
# pattern: \d+
# matches: "42"

# input:  "hello world"
# pattern: \S+
# matches: "hello", "world"
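
The bracket equivalents listed for the shorthand classes are the ASCII ones. In Python 3, patterns on str are Unicode-aware by default, so \d also matches digits from other scripts unless re.ASCII is passed. A small sketch, with a made-up sample string that contains Arabic-Indic digits:

```python
import re

# "\u0664\u0662" is "42" written with Arabic-Indic digits (made-up sample).
text = "Order #42 shipped, ref \u0664\u0662"

# Default (Unicode) mode: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# re.ASCII restricts \d, \w and \s to their ASCII definitions.
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

print(len(unicode_digits), len(ascii_digits))  # 2 1
print(ascii_digits)                            # ['42']
```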

The basic quantifiers: * (zero or more), + (one or more), ? (zero or one).

*       # zero or more of the preceding token
+       # one or more of the preceding token
?       # zero or one (makes the preceding token optional)

# input:  "colour color"
# pattern: colou?r
# matches: "colour", "color"

# input:  "ac abc abbc abbbc"
# pattern: ab+c
# matches: "abc", "abbc", "abbbc" (not "ac")

Groups & Alternation

Parentheses create a capturing group. The matched text is saved and can be referenced later by its group number.

(abc)       # capturing group — matches "abc", stores as group 1

# input:  "2024-12-25"
# pattern: (\d{4})-(\d{2})-(\d{2})
# group 1: "2024"
# group 2: "12"
# group 3: "25"

Non-capturing groups let you group tokens for quantifiers or alternation without saving the match.

(?:abc)     # non-capturing group — groups but doesn't store

# input:  "http://example.com https://example.com"
# pattern: (?:https?://)(\S+)
# group 1: "example.com" for both URLs (the scheme is matched but not captured)
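
In Python, the difference between capturing and non-capturing groups shows up directly in what re.findall returns. A minimal sketch using the same input as above:

```python
import re

text = "http://example.com https://example.com"

# One capturing group: findall returns just that group's text.
hosts = re.findall(r"(?:https?://)(\S+)", text)

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r"(https?://)(\S+)", text)

print(hosts)  # ['example.com', 'example.com']
print(pairs)  # [('http://', 'example.com'), ('https://', 'example.com')]
```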

The pipe character acts as OR. It has the lowest precedence, so wrap alternatives in a group when needed.

a|b         # matches "a" or "b"
cat|dog     # matches "cat" or "dog"

# input:  "I have a cat and a dog"
# pattern: cat|dog
# matches: "cat", "dog"

# use a group to limit the alternation scope
# input:  "gray grey"
# pattern: gr(a|e)y
# matches: "gray", "grey"

Backreferences refer to a previously captured group within the same pattern. Useful for finding repeated words or matched delimiters.

\1      # refers to group 1
\2      # refers to group 2

# find duplicate words
# input:  "the the quick brown fox"
# pattern: \b(\w+)\s+\1\b
# matches: "the the"

# match a quoted string whose closing quote is the same character as its opening quote
# input:  'He said "hello" and \'goodbye\''
# pattern: (["'])(.*?)\1
# matches: "hello", 'goodbye'
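
The two backreference examples above can be tried in Python as follows. The re.sub call is an addition for illustration: in a replacement string, \1 refers to the same captured group.

```python
import re

text = "the the quick brown fox"
dup_pattern = r"\b(\w+)\s+\1\b"

# \1 must repeat exactly what group 1 captured.
dupes = [m.group(0) for m in re.finditer(dup_pattern, text)]

# Collapse each doubled word to a single copy.
fixed = re.sub(dup_pattern, r"\1", text)

# Group 1 is the opening quote, group 2 the text between the quotes.
quote_pattern = r"""(["'])(.*?)\1"""
quoted = [m.group(2) for m in re.finditer(quote_pattern, 'He said "hello" and \'goodbye\'')]

print(dupes)   # ['the the']
print(fixed)   # the quick brown fox
print(quoted)  # ['hello', 'goodbye']
```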

Named groups make complex patterns more readable. The syntax varies by engine.

# Python style
(?P<name>...)        # define named group
(?P=name)            # backreference by name

# PCRE / JavaScript style
(?<name>...)         # define named group
\k<name>             # backreference by name

# input:  "2024-12-25"
# pattern: (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
# group "year":  "2024"
# group "month": "12"
# group "day":   "25"
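
A short Python sketch of the named-group syntax, using the date example above plus the duplicate-word pattern rewritten with a named backreference (the group name "word" is just an illustrative choice):

```python
import re

m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2024-12-25")

# groupdict() maps each group name to its matched text.
parts = m.groupdict()

# (?P=word) must repeat whatever the group named "word" captured.
repeated = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", "the the quick brown fox")

print(parts)                   # {'year': '2024', 'month': '12', 'day': '25'}
print(m.group("year"))         # 2024
print(repeated.group("word"))  # the
```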

Lookahead / Lookbehind

Lookarounds are zero-width assertions. They check what comes before or after the current position without consuming any characters.

Positive lookahead (?=...) asserts that what follows the current position matches the pattern.

(?=...)      # positive lookahead — succeeds if ... matches ahead

# match a number only if followed by "px"
# input:  "width: 100px; margin: 20em"
# pattern: \d+(?=px)
# matches: "100" (not "20" because "20" is followed by "em")

Negative lookahead (?!...) asserts that what follows does NOT match.

(?!...)      # negative lookahead — succeeds if ... does NOT match ahead

# match "cat" only when NOT followed by "fish"
# input:  "cat catfish catalog"
# pattern: cat(?!fish)
# matches: "cat" (standalone), "cat" in "catalog" (not "cat" in "catfish")

# match a word that does not start with "un"
# input:  "happy unhappy done undone"
# pattern: \b(?!un)\w+\b
# matches: "happy", "done"
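
The lookahead examples above, run through Python's re module as a sanity check. Match start offsets are shown for the "cat" case so it is clear which occurrences were kept.

```python
import re

# Positive lookahead: digits only when "px" follows; "px" is not part of the match.
px_values = re.findall(r"\d+(?=px)", "width: 100px; margin: 20em")

# Negative lookahead: "cat" unless "fish" comes right after it.
cat_starts = [m.start() for m in re.finditer(r"cat(?!fish)", "cat catfish catalog")]

# Negative lookahead at a word boundary: words that do not start with "un".
not_un = re.findall(r"\b(?!un)\w+\b", "happy unhappy done undone")

print(px_values)   # ['100']
print(cat_starts)  # [0, 12]
print(not_un)      # ['happy', 'done']
```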

Positive lookbehind (?<=...) asserts that what precedes the current position matches the pattern. Some engines, including Python's re, only accept fixed-width patterns inside a lookbehind.

(?<=...)     # positive lookbehind — succeeds if ... matches behind

# extract the price amount after a dollar sign
# input:  "Price: $49.99 and 30 units"
# pattern: (?<=\$)\d+(\.\d{2})?
# matches: "49.99" (not "30" — no dollar sign before it)

# get the value after "name="
# input:  'name="sidebar" class="wide"'
# pattern: (?<=name=")([^"]+)
# matches: "sidebar"

Negative lookbehind (?<!...) asserts that what precedes does NOT match.

(?<!...)     # negative lookbehind — succeeds if ... does NOT match behind

# match digits NOT preceded by a minus sign
# input:  "score: -5 bonus: 10"
# pattern: (?<!-)\b\d+\b
# matches: "10" (not "5" because it's preceded by "-")

# match ".js" filenames that are NOT ".min.js"
# input:  "app.js app.min.js vendor.js"
# pattern: \w+(?<!\.min)\.js
# matches: "app.js", "vendor.js"
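
A Python sketch of the lookbehind examples. Two things here are adjustments rather than part of the notes: the decimal part of the price pattern is made non-capturing so that findall returns the whole match, and the last few lines show that Python's re rejects a variable-width lookbehind at compile time.

```python
import re

# (?:...) instead of (...) so findall returns the full match, not the group.
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "Price: $49.99 and 30 units")

# Digits not preceded by a minus sign.
unsigned = re.findall(r"(?<!-)\b\d+\b", "score: -5 bonus: 10")

# ".js" files whose name does not end in ".min".
scripts = re.findall(r"\w+(?<!\.min)\.js", "app.js app.min.js vendor.js")

# Python's re requires fixed-width lookbehind; \d+ has no fixed width.
try:
    re.compile(r"(?<=\$\d+)\.\d{2}")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(prices)             # ['49.99']
print(unsigned)           # ['10']
print(scripts)            # ['app.js', 'vendor.js']
print(variable_width_ok)  # False
```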

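Combine multiple lookarounds to enforce several conditions at the same position: each lookahead scans from the same starting point and consumes nothing, so all of them must succeed. This is the classic technique for password validation.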

# require at least one digit AND one uppercase letter
# pattern: (?=.*\d)(?=.*[A-Z]).+

# input:  "hello"     -> no match (no digit, no uppercase)
# input:  "Hello1"    -> match
# input:  "HELLO"     -> no match (no digit)
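
The three inputs above, checked in Python. The swapped pattern is an extra illustration that the order of the lookaheads does not change the result, since each one is tried from the same position.

```python
import re

# Both lookaheads are tried at position 0 and consume nothing,
# so .+ still matches the whole string afterwards.
rule = re.compile(r"(?=.*\d)(?=.*[A-Z]).+")
results = {s: rule.match(s) is not None for s in ("hello", "Hello1", "HELLO")}

# Same conditions, opposite order.
swapped = re.compile(r"(?=.*[A-Z])(?=.*\d).+")
same_answers = all((swapped.match(s) is not None) == ok for s, ok in results.items())

print(results)       # {'hello': False, 'Hello1': True, 'HELLO': False}
print(same_answers)  # True
```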

Quantifiers

Curly braces specify exact repetition counts: exactly n, at least n, or between n and m.

{n}       # exactly n times
{n,}      # n or more times
{n,m}     # between n and m times (inclusive)

# input:  "aaa ab aabb aaabbb"
# pattern: a{2}
# matches: "aa" in "aaa" (first two), "aa" in "aabb", "aa" in "aaabbb"

# input:  "100 1000 10000"
# pattern: \d{4,5}
# matches: "1000", "10000"

Greedy vs lazy. By default, quantifiers are greedy (match as much as possible). Adding ? makes them lazy (match as little as possible).

# greedy (default)
*       # zero or more, greedy
+       # one or more, greedy
?       # zero or one, greedy

# lazy (add ? after the quantifier)
*?      # zero or more, lazy
+?      # one or more, lazy
??      # zero or one, lazy
{n,m}?  # between n and m, lazy

# input:  "<b>bold</b> and <b>more</b>"
# greedy:  <.+>   matches "<b>bold</b> and <b>more</b>" (entire string)
# lazy:    <.+?>  matches "<b>", "</b>", "<b>", "</b>" (each tag)
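
The tag example in Python. The third pattern, a negated character class, is an alternative that gets the same per-tag result without relying on laziness.

```python
import re

html = "<b>bold</b> and <b>more</b>"

# Greedy: .+ runs to the end of the string, then backs up to the last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can reach.
lazy = re.findall(r"<.+?>", html)

# Negated class: "anything that is not >" can never run past the tag.
negated = re.findall(r"<[^>]+>", html)

print(greedy)   # ['<b>bold</b> and <b>more</b>']
print(lazy)     # ['<b>', '</b>', '<b>', '</b>']
print(negated)  # ['<b>', '</b>', '<b>', '</b>']
```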

The greedy vs lazy distinction matters most when matching delimited content like HTML tags, quoted strings, or bracketed text.

# extract individual quoted strings
# input:  'say "hello" and "world"'

# greedy — captures too much
"(.+)"    # matches: "hello" and "world" (one big match)

# lazy — stops at the first closing quote
"(.+?)"   # matches: "hello", "world" (two separate matches)

# alternative: negated character class (often faster)
"([^"]+)" # matches: "hello", "world"

Possessive quantifiers are greedy but never give back what they have matched, so they fail fast when a match isn't possible. Supported in Java, PCRE, Python 3.11+, and some other engines (not standard JavaScript).

# possessive (add + after the quantifier)
*+      # zero or more, possessive
++      # one or more, possessive
?+      # zero or one, possessive
{n,m}+  # between n and m, possessive

# greedy:      \d+\d   on "12345" — \d+ grabs all 5 digits,
#              backtracks to 4, second \d matches "5" -> success
# possessive:  \d++\d  on "12345" — \d++ grabs all 5 digits,
#              refuses to give any back -> fail (no backtracking)

# use possessive quantifiers when you know backtracking cannot help,
# e.g., matching structured data:
# pattern: "[^"]*+"
# if the closing quote is missing, the match fails at once instead of
# retrying shorter runs; inside nested quantifiers this is what avoids
# catastrophic backtracking
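
Python's re module accepts possessive quantifiers from version 3.11 on, so the greedy vs possessive comparison above can be sketched there. The version guard is an assumption about the interpreter this runs on; on older versions the pattern would not compile, so the check is skipped.

```python
import re
import sys

greedy = possessive = None

if sys.version_info >= (3, 11):
    # Greedy: \d+ takes all five digits, then gives one back for the final \d.
    greedy = re.fullmatch(r"\d+\d", "12345")

    # Possessive: \d++ takes all five digits and keeps them, so the final \d fails.
    possessive = re.fullmatch(r"\d++\d", "12345")

    print(greedy is not None)      # True
    print(possessive is not None)  # False
```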

Summary table of all quantifier flavors.

Greedy      Lazy        Possessive    Meaning
-------     -------     ----------    -------------------------
*           *?          *+            zero or more
+           +?          ++            one or more
?           ??          ?+            zero or one
{n}         {n}?        {n}+          exactly n
{n,}        {n,}?       {n,}+         n or more
{n,m}       {n,m}?      {n,m}+        between n and m

Common Patterns

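Email validation (simplified). Covers the common address shapes, but it rejects some valid addresses (quoted local parts, for example) and accepts some invalid ones (consecutive dots). For production, prefer a dedicated validation library or a confirmation email.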

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

# matches:   user@example.com
#            first.last+tag@sub.domain.org
# no match:  @missing-local.com
#            user@.com
#            user@domain
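
A sketch of how the email pattern might be used in Python. The helper name looks_like_email is made up for this example. It uses fullmatch instead of ^...$ because Python's $ also matches just before a trailing newline.

```python
import re

# Same pattern as above, without ^ and $; fullmatch does the anchoring.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def looks_like_email(text):
    """Hypothetical helper: True if the whole string fits the simplified pattern."""
    return EMAIL.fullmatch(text) is not None

# The ^...$ form accepts a trailing newline in Python.
ANCHORED = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
newline_slips_through = ANCHORED.match("user@example.com\n") is not None

for candidate in ("user@example.com", "first.last+tag@sub.domain.org",
                  "@missing-local.com", "user@.com", "user@domain"):
    print(candidate, looks_like_email(candidate))

print(newline_slips_through)                   # True
print(looks_like_email("user@example.com\n"))  # False
```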

URL matching for http and https URLs with optional port and path. The query string and fragment are picked up as part of the path, so they are only matched when a "/" follows the host.

https?://[a-zA-Z0-9.-]+(?::\d{1,5})?(?:/[^\s]*)?

# matches:   https://example.com
#            http://sub.domain.org:8080/path?q=1#section
#            https://api.example.com/v2/users
# no match:  ftp://files.example.com
#            example.com (no protocol)
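
A Python sketch that pulls URLs out of running text with the pattern above. The sample sentence is made up. The last lines show one limitation: a query string directly after the host, with no "/" in between, is left out of the match.

```python
import re

URL = re.compile(r"https?://[a-zA-Z0-9.-]+(?::\d{1,5})?(?:/[^\s]*)?")

# Made-up sample text mixing matching and non-matching URLs.
text = ("see https://example.com and "
        "http://sub.domain.org:8080/path?q=1#section "
        "(not ftp://files.example.com)")

# No capturing groups in the pattern, so findall returns whole matches.
urls = URL.findall(text)

# Without a "/" after the host, the optional path group matches nothing.
no_slash = URL.search("https://example.com?q=1").group(0)

print(urls)      # ['https://example.com', 'http://sub.domain.org:8080/path?q=1#section']
print(no_slash)  # https://example.com
```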

IPv4 address. Each octet is 0-255. The simple pattern only checks the shape; the strict pattern also validates the numeric range.

# simple (doesn't validate 0-255 range)
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

# strict (validates each octet is 0-255)
\b(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b

# strict pattern matches:   192.168.1.1   0.0.0.0   255.255.255.255
# strict pattern rejects:   256.1.1.1   999.0.0.0   192.168.1
# (the simple pattern accepts 256.1.1.1 and 999.0.0.0)
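
The strict pattern repeats the same octet expression four times. One way to keep it readable in Python is to build it from a single fragment. This is a sketch: the names OCTET, IPV4 and is_ipv4 are made up here, and re.ASCII is added so that \d means 0-9 only.

```python
import re

# One octet, 0-255 (the same alternation as in the strict pattern above).
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"

# First octet, then ".octet" three more times. {{3}} is a literal {3} in an f-string.
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}", flags=re.ASCII)

def is_ipv4(text):
    """Hypothetical helper: True if the whole string is a dotted-quad address."""
    return IPV4.fullmatch(text) is not None

for candidate in ("192.168.1.1", "0.0.0.0", "255.255.255.255",
                  "256.1.1.1", "999.0.0.0", "192.168.1"):
    print(candidate, is_ipv4(candidate))
```

When the goal is validating a single value rather than finding addresses inside text, the standard ipaddress module is an alternative worth considering.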

Date format (YYYY-MM-DD). Validates month 01-12 and day 01-31. Does not check calendar correctness (e.g., Feb 30).

\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])

# matches:   2024-01-15   1999-12-31   2000-06-01
# no match:  2024-13-01   2024-00-15   24-1-5

# also accept "/" or "." as separator (mixed separators such as 2024-01/15 pass too)
\d{4}[-/.](?:0[1-9]|1[0-2])[-/.](?:0[1-9]|[12]\d|3[01])

# matches:   2024/01/15   2024.12.31
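
Two gaps in the date patterns can be closed with a little Python. This is a sketch under assumptions, not part of the original patterns: a backreference (\2) forces the second separator to equal the first, and datetime.date does the calendar check the regex cannot. The name parse_date is made up.

```python
import re
from datetime import date

# Group 2 captures the first separator; \2 requires the same one again.
DATE = re.compile(r"(\d{4})([-/.])(0[1-9]|1[0-2])\2(0[1-9]|[12]\d|3[01])")

def parse_date(text):
    """Hypothetical helper: return a date, or None if the text is not a valid date."""
    m = DATE.fullmatch(text)
    if m is None:
        return None
    year, _separator, month, day = m.groups()
    try:
        # date() raises ValueError for impossible dates such as Feb 30.
        return date(int(year), int(month), int(day))
    except ValueError:
        return None

print(parse_date("2024-01-15"))  # 2024-01-15
print(parse_date("2024/01/15"))  # 2024-01-15
print(parse_date("2024-01/15"))  # None
print(parse_date("2024-02-30"))  # None
```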

Phone numbers. Handles US formats with optional country code, parentheses, spaces, dashes, and dots.

(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}

# matches:   555-123-4567
#            (555) 123-4567
#            +1 555.123.4567
#            1-555-123-4567
#            5551234567
# no match:  555-12-4567   12345
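
A Python sketch that checks the phone samples above and then reduces a match to its ten digits. The names is_us_phone and normalize are made up. Note that the pattern does not require parentheses to balance, which the last line demonstrates.

```python
import re

PHONE = re.compile(r"(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def is_us_phone(text):
    """Hypothetical helper: True if the whole string fits the pattern."""
    return PHONE.fullmatch(text) is not None

def normalize(text):
    """Hypothetical helper: strip non-digits and drop a leading country code."""
    digits = re.sub(r"\D", "", text)
    return digits[-10:]

for candidate in ("555-123-4567", "(555) 123-4567", "+1 555.123.4567",
                  "1-555-123-4567", "5551234567", "555-12-4567", "12345"):
    print(candidate, is_us_phone(candidate))

print(normalize("+1 555.123.4567"))  # 5551234567
print(is_us_phone("(555 123-4567"))  # True (unbalanced parenthesis is accepted)
```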

Password strength. Uses multiple lookaheads at the start to enforce several rules simultaneously. Adjust the minimum length in {8,}.

# at least 8 chars, 1 uppercase, 1 lowercase, 1 digit, 1 special char
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=]).{8,}$

# breakdown:
# (?=.*[a-z])       at least one lowercase letter
# (?=.*[A-Z])       at least one uppercase letter
# (?=.*\d)          at least one digit
# (?=.*[!@#$...])   at least one special character
# .{8,}$            total length 8 or more

# matches:   P@ssw0rd!   MyS3cure#Pass
# no match:  password   PASSWORD1   Pass1!
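
The single regex answers only yes or no. If the reason for a failure matters, the same rules can be checked one at a time. A sketch in Python, where RULES and password_problems are made-up names; for the samples above it agrees with the combined pattern.

```python
import re

# The single-regex version from the notes above.
STRONG = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=]).{8,}$")

# The same rules, one small pattern each (labels are illustrative).
RULES = {
    "a lowercase letter": r"[a-z]",
    "an uppercase letter": r"[A-Z]",
    "a digit": r"\d",
    "a special character": r"[!@#$%^&*()_+\-=]",
}

def password_problems(password, min_length=8):
    """Hypothetical helper: list the reasons a password fails (empty if it passes)."""
    problems = [f"needs {label}" for label, pattern in RULES.items()
                if not re.search(pattern, password)]
    if len(password) < min_length:
        problems.append(f"needs at least {min_length} characters")
    return problems

for pw in ("P@ssw0rd!", "MyS3cure#Pass", "password", "PASSWORD1", "Pass1!"):
    print(pw, bool(STRONG.match(pw)), password_problems(pw))
```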