Common patterns for emails, URLs, dates, and the lookahead/lookbehind syntax I always forget.
The dot matches any single character except a newline (unless the s / DOTALL flag is set).
. # any character except newline
.. # any two characters
# input: "cat cot cut"
# pattern: c.t
# matches: "cat", "cot", "cut"
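To try the dot in a real engine, here is a minimal sketch using Python's standard `re` module. Python and the `a\nb` sample string are assumptions for illustration, not part of the notes above.

```python
import re

# . matches any single character except a newline
three_letter = re.findall(r"c.t", "cat cot cut")

# By default the dot refuses to match "\n" ...
no_newline = re.search(r"a.b", "a\nb")

# ... unless DOTALL (the "s" flag in other engines) is set.
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)

print(three_letter)             # ['cat', 'cot', 'cut']
print(no_newline)               # None
print(with_dotall is not None)  # True
```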
Anchors don't consume characters. They assert a position: start of string, end of string, or a word boundary.
^ # start of string (or line with m flag)
$ # end of string (or line with m flag)
\b # word boundary
# input: "error: file not found"
# pattern: ^error
# matches: "error" (only at the start)
# input: "cat concatenate"
# pattern: \bcat\b
# matches: "cat" (not the "cat" inside "concatenate")
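The anchor behavior above can be checked with a short Python sketch. The multi-line `log` string is a made-up sample, used to show how the m flag (`re.MULTILINE` in Python) changes what `^` means.

```python
import re

log = "error: file not found\nwarning: retry\nerror: timeout"

# Without MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^error", log)

# With MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^error", log, flags=re.MULTILINE)

# \b keeps "cat" from matching inside "concatenate".
whole_words = re.findall(r"\bcat\b", "cat concatenate")

print(start_only)   # ['error']
print(every_line)   # ['error', 'error']
print(whole_words)  # ['cat']
```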
Character classes match one character from a set. Ranges use a hyphen; a caret inside the brackets negates the class.
[abc] # matches "a", "b", or "c"
[a-z] # any lowercase letter
[A-Z] # any uppercase letter
[0-9] # any digit
[a-zA-Z] # any letter
[^abc] # any character EXCEPT "a", "b", or "c"
[^0-9] # any non-digit character
# input: "a1b2c3"
# pattern: [a-c]
# matches: "a", "b", "c"
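A small Python sketch of the same classes. The `tel: 555-1234` string is an invented sample; it shows that a hyphen placed first (or last) inside the brackets is a literal hyphen rather than a range operator.

```python
import re

in_range = re.findall(r"[a-c]", "a1b2c3")
non_digits = re.findall(r"[^0-9]", "a1b2c3")

# Leading hyphen is literal, so this class means "hyphen or digit".
phone_like = re.findall(r"[-0-9]+", "tel: 555-1234")

print(in_range)    # ['a', 'b', 'c']
print(non_digits)  # ['a', 'b', 'c']
print(phone_like)  # ['555-1234']
```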
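Shorthand character classes cover the most common sets. The uppercase version is the negation. The bracket equivalents below are the ASCII definitions; Unicode-aware engines (Python 3 on str patterns, for example) let \d, \w, and \s match non-ASCII digits, letters, and spaces too.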
\d # digit [0-9]
\D # non-digit [^0-9]
\w # word character [a-zA-Z0-9_]
\W # non-word [^a-zA-Z0-9_]
\s # whitespace [ \t\n\r\f\v]
\S # non-whitespace [^ \t\n\r\f\v]
# input: "Order #42 shipped"
# pattern: \d+
# matches: "42"
# input: "hello world"
# pattern: \S+
# matches: "hello", "world"
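The shorthand examples, run through Python's `re`. The last two lines are an extra illustration under the assumption that you are on Python 3, where `\w` is Unicode-aware for str patterns unless `re.ASCII` is passed; the accented sample text is invented.

```python
import re

digits = re.findall(r"\d+", "Order #42 shipped")
chunks = re.findall(r"\S+", "hello world")

sample = "na\u00efve caf\u00e9"  # "naive cafe" with accented i and e

# Default: \w matches accented letters as well.
unicode_words = re.findall(r"\w+", sample)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the accented letters split the words.
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)

print(digits)       # ['42']
print(chunks)       # ['hello', 'world']
print(ascii_words)  # ['na', 've', 'caf']
```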
The basic quantifiers: * (zero or more), + (one or more), ? (zero or one).
* # zero or more of the preceding token
+ # one or more of the preceding token
? # zero or one (makes the preceding token optional)
# input: "colour color"
# pattern: colou?r
# matches: "colour", "color"
# input: "ac abc abbc abbbc"
# pattern: ab+c
# matches: "abc", "abbc", "abbbc" (not "ac")
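Running all three quantifiers against the same input makes the difference easy to see. A minimal Python sketch; the `ab*c` and `ab?c` variants are added here for comparison and are not in the examples above.

```python
import re

samples = "ac abc abbc abbbc"

star = re.findall(r"ab*c", samples)      # zero or more b
plus = re.findall(r"ab+c", samples)      # one or more b
optional = re.findall(r"ab?c", samples)  # zero or one b

spelling = re.findall(r"colou?r", "colour color")

print(star)      # ['ac', 'abc', 'abbc', 'abbbc']
print(plus)      # ['abc', 'abbc', 'abbbc']
print(optional)  # ['ac', 'abc']
print(spelling)  # ['colour', 'color']
```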
Parentheses create a capturing group. The matched text is saved and can be referenced later by its group number.
(abc) # capturing group — matches "abc", stores as group 1
# input: "2024-12-25"
# pattern: (\d{4})-(\d{2})-(\d{2})
# group 1: "2024"
# group 2: "12"
# group 3: "25"
Non-capturing groups let you group tokens for quantifiers or alternation without saving the match.
(?:abc) # non-capturing group — groups but doesn't store
# input: "http://example.com https://example.com"
# pattern: (?:https?://)(\S+)
# group 1: "example.com" (the scheme is grouped but not captured; group 1 is everything after it)
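A Python sketch of how capturing and non-capturing groups show up in results. It relies on documented `re.findall` behavior: with one capturing group it returns that group's text, with several it returns tuples.

```python
import re

date = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", "2024-12-25")
date_parts = date.groups()

text = "http://example.com https://example.com"

# Scheme in a non-capturing group: only group 1 comes back.
hosts = re.findall(r"(?:https?://)(\S+)", text)

# Scheme in a capturing group: each result becomes a (scheme, rest) tuple.
pairs = re.findall(r"(https?://)(\S+)", text)

print(date_parts)  # ('2024', '12', '25')
print(hosts)       # ['example.com', 'example.com']
print(pairs)       # [('http://', 'example.com'), ('https://', 'example.com')]
```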
The pipe character acts as OR. It has the lowest precedence, so wrap alternatives in a group when needed.
a|b # matches "a" or "b"
cat|dog # matches "cat" or "dog"
# input: "I have a cat and a dog"
# pattern: cat|dog
# matches: "cat", "dog"
# use a group to limit the alternation scope
# input: "gray grey"
# pattern: gr(a|e)y
# matches: "gray", "grey"
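The low precedence of the pipe is easiest to see when the group is left out. A minimal Python sketch; the `hotdog` input is an invented sample showing how anchors bind to only one side of an ungrouped alternation.

```python
import re

# Ungrouped: the alternatives are "gra" and "ey", not "a" and "e".
ungrouped = re.findall(r"gra|ey", "gray grey")

# Grouped: only the vowel alternates.
grouped = re.findall(r"gr(?:a|e)y", "gray grey")

# ^cat|dog$ means (^cat) OR (dog$), so "hotdog" passes.
loose = re.search(r"^cat|dog$", "hotdog") is not None

# Grouping applies both anchors to both alternatives.
strict = re.search(r"^(?:cat|dog)$", "hotdog") is not None

print(ungrouped)  # ['gra', 'ey']
print(grouped)    # ['gray', 'grey']
print(loose)      # True
print(strict)     # False
```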
Backreferences refer to a previously captured group within the same pattern. Useful for finding repeated words or matched delimiters.
\1 # refers to group 1
\2 # refers to group 2
# find duplicate words
# input: "the the quick brown fox"
# pattern: \b(\w+)\s+\1\b
# matches: "the the"
# match matching quotes
# input: 'He said "hello" and \'goodbye\''
# pattern: (["'])(.*?)\1
# matches: "hello", 'goodbye'
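Both backreference patterns above, sketched in Python. The `re.sub` call at the end is an added illustration: the replacement string can reuse `\1` to collapse the duplicate word.

```python
import re

sentence = "the the quick brown fox"
dupe = re.search(r"\b(\w+)\s+\1\b", sentence)

# Reuse group 1 in the replacement to keep a single copy of the word.
fixed = re.sub(r"\b(\w+)\s+\1\b", r"\1", sentence)

text = "He said \"hello\" and 'goodbye'"
# Group 1 is the opening quote, \1 demands the same closing quote,
# and group 2 is the text in between.
quoted = [m.group(2) for m in re.finditer(r'''(["'])(.*?)\1''', text)]

print(dupe.group(0))  # the the
print(fixed)          # the quick brown fox
print(quoted)         # ['hello', 'goodbye']
```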
Named groups make complex patterns more readable. The syntax varies by engine.
# Python style
(?P<name>...) # define named group
(?P=name) # backreference by name
# PCRE / JavaScript style
(?<name>...) # define named group
\k<name> # backreference by name
# input: "2024-12-25"
# pattern: (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
# group "year": "2024"
# group "month": "12"
# group "day": "25"
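In Python, named groups are available through `groupdict()` and, in replacement strings, through `\g<name>`. A short sketch; the day/month/year reordering is an invented use case.

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = DATE.fullmatch("2024-12-25")
parts = m.groupdict()

# \g<name> refers to a named group inside the replacement string.
reordered = DATE.sub(r"\g<day>/\g<month>/\g<year>", "2024-12-25")

print(parts)      # {'year': '2024', 'month': '12', 'day': '25'}
print(reordered)  # 25/12/2024
```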
Lookarounds are zero-width assertions. They check what comes before or after the current position without consuming any characters.
Positive lookahead (?=...) asserts that what follows the current position matches the pattern.
(?=...) # positive lookahead — succeeds if ... matches ahead
# match a number only if followed by "px"
# input: "width: 100px; margin: 20em"
# pattern: \d+(?=px)
# matches: "100" (not "20" because "20" is followed by "em")
Negative lookahead (?!...) asserts that what follows does NOT match.
(?!...) # negative lookahead — succeeds if ... does NOT match ahead
# match "cat" only when NOT followed by "fish"
# input: "cat catfish catalog"
# pattern: cat(?!fish)
# matches: "cat" (standalone), "cat" in "catalog" (not "cat" in "catfish")
# match a word that does not start with "un"
# pattern: \b(?!un)\w+\b
# input: "happy unhappy done undone"
# matches: "happy", "done"
Positive lookbehind (?<=...) asserts that what precedes the current position matches the pattern.
(?<=...) # positive lookbehind — succeeds if ... matches behind
# extract the price amount after a dollar sign
# input: "Price: $49.99 and 30 units"
# pattern: (?<=\$)\d+(\.\d{2})?
# matches: "49.99" (not "30" — no dollar sign before it)
# get the value after "name="
# input: 'name="sidebar" class="wide"'
# pattern: (?<=name=")([^"]+)
# matches: "sidebar"
Negative lookbehind (?<!...) asserts that what precedes does NOT match.
(?<!...) # negative lookbehind — succeeds if ... does NOT match behind
# match digits NOT preceded by a minus sign
# input: "score: -5 bonus: 10"
# pattern: (?<!-)\b\d+\b
# matches: "10" (not "5" because it's preceded by "-")
# match ".js" filenames that are NOT ".min.js"
# input: "app.js app.min.js vendor.js"
# pattern: \w+(?<!\.min)\.js
# matches: "app.js", "vendor.js"
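All four lookaround flavors, checked against the inputs above in Python. One deliberate change, flagged here as an assumption about how you would call it: the price pattern uses a non-capturing group, because `re.findall` returns the group's text instead of the whole match when a capturing group is present.

```python
import re

# Positive lookahead: digits followed by "px"
px = re.findall(r"\d+(?=px)", "width: 100px; margin: 20em")

# Negative lookahead: words that do not start with "un"
not_un = re.findall(r"\b(?!un)\w+\b", "happy unhappy done undone")

# Positive lookbehind: amount after a dollar sign
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "Price: $49.99 and 30 units")

# With the original capturing group, findall returns only the group's text.
cents_only = re.findall(r"(?<=\$)\d+(\.\d{2})?", "Price: $49.99 and 30 units")

# Negative lookbehind: digits not preceded by a minus sign
positive = re.findall(r"(?<!-)\b\d+\b", "score: -5 bonus: 10")

# Negative lookbehind: .js files that are not .min.js
scripts = re.findall(r"\w+(?<!\.min)\.js", "app.js app.min.js vendor.js")

print(px)          # ['100']
print(not_un)      # ['happy', 'done']
print(prices)      # ['49.99']
print(cents_only)  # ['.99']
print(positive)    # ['10']
print(scripts)     # ['app.js', 'vendor.js']
```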
Combine multiple lookarounds to enforce several conditions at the same position. This is the classic technique for password validation.
# require at least one digit AND one uppercase letter
# pattern: (?=.*\d)(?=.*[A-Z]).+
# input: "hello" -> no match (no digit, no uppercase)
# input: "Hello1" -> match
# input: "HELLO" -> no match (no digit)
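A minimal Python check of the stacked lookaheads. `HAS_DIGIT_AND_UPPER` is a hypothetical name; `match()` is used so both lookaheads are evaluated at position 0.

```python
import re

HAS_DIGIT_AND_UPPER = re.compile(r"(?=.*\d)(?=.*[A-Z]).+")

# Each lookahead scans ahead from the same position without consuming anything,
# so the conditions are independent of each other and of their order.
results = {s: HAS_DIGIT_AND_UPPER.match(s) is not None
           for s in ["hello", "Hello1", "HELLO"]}

print(results)  # {'hello': False, 'Hello1': True, 'HELLO': False}
```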
Curly braces specify exact repetition counts: exactly n, at least n, or between n and m.
{n} # exactly n times
{n,} # n or more times
{n,m} # between n and m times (inclusive)
# input: "aaa ab aabb aaabbb"
# pattern: a{2}
# matches: "aa" in "aaa" (first two), "aa" in "aabb", "aa" in "aaabbb"
# input: "100 1000 10000"
# pattern: \d{4,5}
# matches: "1000", "10000"
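The counted quantifiers in Python. The `1234567` input is an added sample: without word boundaries, `\d{4,5}` happily matches the first five digits of a longer run.

```python
import re

pairs = re.findall(r"a{2}", "aaa ab aabb aaabbb")
four_or_five = re.findall(r"\d{4,5}", "100 1000 10000")

# No boundaries: matches a 5-digit prefix of the 7-digit number.
inside_longer = re.findall(r"\d{4,5}", "1234567")

# \b on both sides rejects numbers that are too long.
bounded = re.findall(r"\b\d{4,5}\b", "1234567")

print(pairs)          # ['aa', 'aa', 'aa']
print(four_or_five)   # ['1000', '10000']
print(inside_longer)  # ['12345']
print(bounded)        # []
```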
Greedy vs lazy. By default, quantifiers are greedy (match as much as possible). Adding ? makes them lazy (match as little as possible).
# greedy (default)
* # zero or more, greedy
+ # one or more, greedy
? # zero or one, greedy
# lazy (add ? after the quantifier)
*? # zero or more, lazy
+? # one or more, lazy
?? # zero or one, lazy
{n,m}? # between n and m, lazy
# input: "<b>bold</b> and <b>more</b>"
# greedy: <.+> matches "<b>bold</b> and <b>more</b>" (entire string)
# lazy: <.+?> matches "<b>", "</b>", "<b>", "</b>" (each tag)
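The tag example above, run in Python. The third pattern, `<[^>]+>`, is an added alternative that gets the same per-tag result with a negated class instead of a lazy quantifier.

```python
import re

html = "<b>bold</b> and <b>more</b>"

greedy = re.findall(r"<.+>", html)     # runs to the last ">"
lazy = re.findall(r"<.+?>", html)      # stops at the first ">"
negated = re.findall(r"<[^>]+>", html) # can never run past a ">"

print(greedy)   # ['<b>bold</b> and <b>more</b>']
print(lazy)     # ['<b>', '</b>', '<b>', '</b>']
print(negated)  # ['<b>', '</b>', '<b>', '</b>']
```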
The greedy vs lazy distinction matters most when matching delimited content like HTML tags, quoted strings, or bracketed text.
# extract individual quoted strings
# input: 'say "hello" and "world"'
# greedy — captures too much
"(.+)" # matches: "hello" and "world" (one big match)
# lazy — stops at the first closing quote
"(.+?)" # matches: "hello", "world" (two separate matches)
# alternative: negated character class (often faster)
"([^"]+)" # matches: "hello", "world"
Possessive quantifiers are greedy and never backtrack, so they fail fast when a match isn't possible. Supported in Java, PCRE, Python 3.11+, and some other engines (not standard JavaScript).
# possessive (add + after the quantifier)
*+ # zero or more, possessive
++ # one or more, possessive
?+ # zero or one, possessive
{n,m}+ # between n and m, possessive
# greedy: \d+\d on "12345" — \d+ grabs all 5 digits,
# backtracks to 4, second \d matches "5" -> success
# possessive: \d++\d on "12345" — \d++ grabs all 5 digits,
# refuses to give any back -> fail (no backtracking)
# use possessive quantifiers for performance when you know
# backtracking cannot help, e.g., matching structured data:
# pattern: "[^"]*+"
# on an unterminated quote, fails at once instead of retrying shorter runs
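Python's `re` accepts the possessive syntax only from version 3.11. On older versions the same no-backtracking behavior can be approximated with a lookahead plus backreference, `(?=(\d+))\1`. This is an emulation idiom, not possessive syntax, and the sketch below is offered on that assumption.

```python
import re
import sys

# Greedy: \d+ takes all five digits, then gives one back for the final \d.
greedy = re.search(r"\d+\d", "12345")

# Emulation: the lookahead captures the longest digit run and \1 consumes it.
# The engine does not backtrack into a completed lookahead, so the run is
# never shortened and the final \d has nothing left to match.
emulated = re.search(r"(?=(\d+))\1\d", "12345")

print(greedy.group(0))  # 12345
print(emulated)         # None

# Native possessive syntax, guarded because older versions reject the pattern.
if sys.version_info >= (3, 11):
    possessive = re.search(r"\d++\d", "12345")
    print(possessive)   # None
```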
Summary table of all quantifier flavors.
Greedy   Lazy     Possessive   Meaning
-------  -------  ----------   -------------------------
*        *?       *+           zero or more
+        +?       ++           one or more
?        ??       ?+           zero or one
{n}      {n}?     {n}+         exactly n
{n,}     {n,}?    {n,}+        n or more
{n,m}    {n,m}?   {n,m}+       between n and m
Email validation (simplified). Covers most real-world addresses. For production, use a well-tested validator from your language or framework rather than a hand-rolled regex.
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
# matches: user@example.com
# first.last+tag@sub.domain.org
# no match: @missing-local.com
# user@.com
# user@domain
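A Python sketch of the email pattern. `looks_like_email` is a hypothetical helper name. It uses `fullmatch` because in Python `$` also matches just before a trailing newline, and the last sample shows one way the simplified pattern is too permissive.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(text):
    """Return True if text has the shape described by the simplified pattern."""
    return EMAIL.fullmatch(text) is not None

accepted = [s for s in ["user@example.com", "first.last+tag@sub.domain.org"]
            if looks_like_email(s)]
rejected = [s for s in ["@missing-local.com", "user@.com", "user@domain"]
            if not looks_like_email(s)]

# match() with $ lets a trailing newline slip through; fullmatch() does not.
newline_via_match = EMAIL.match("user@example.com\n") is not None
newline_via_fullmatch = looks_like_email("user@example.com\n")

# Limitation: consecutive dots in the domain are accepted.
double_dot = looks_like_email("user@example..com")

print(newline_via_match)      # True
print(newline_via_fullmatch)  # False
print(double_dot)             # True
```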
URL matching for http and https URLs with optional port and path. A query string or fragment is matched only as part of the path, so https://example.com?q=1 matches only up to the host.
https?://[a-zA-Z0-9.-]+(?::\d{1,5})?(?:/[^\s]*)?
# matches: https://example.com
# http://sub.domain.org:8080/path?q=1#section
# https://api.example.com/v2/users
# no match: ftp://files.example.com
# example.com (no protocol)
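When pulling URLs out of running text, the pattern tends to swallow trailing punctuation, since both the host class and the path allow a dot. A minimal Python sketch; the sample sentence and the `rstrip` cleanup are assumptions for illustration, not a complete fix.

```python
import re

URL = re.compile(r"https?://[a-zA-Z0-9.-]+(?::\d{1,5})?(?:/[^\s]*)?")

text = ("Docs: https://example.com/docs. "
        "Mirror: http://sub.domain.org:8080/path?q=1#section "
        "and ftp://files.example.com")

# No capturing groups in the pattern, so findall returns whole matches.
raw = URL.findall(text)

# Crude cleanup: drop punctuation that most likely belongs to the sentence.
cleaned = [u.rstrip(".,;:!?)") for u in raw]

print(raw)
# ['https://example.com/docs.', 'http://sub.domain.org:8080/path?q=1#section']
print(cleaned)
# ['https://example.com/docs', 'http://sub.domain.org:8080/path?q=1#section']
```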
IPv4 address. Each octet is 0-255. The simple pattern only checks the shape (one to three digits per octet); the strict pattern also validates the numeric range.
# simple (doesn't validate 0-255 range)
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
# strict (validates each octet is 0-255)
\b(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b
# matches: 192.168.1.1 0.0.0.0 255.255.255.255
# no match: 256.1.1.1 999.0.0.0 192.168.1
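The strict pattern repeats the same octet expression four times. In Python it can be assembled from one fragment, which is easier to read and to fix. `OCTET`, `IPV4`, and `is_ipv4` are hypothetical names in this sketch.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros allowed, as in the original).
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)"

# First octet, then three more, each preceded by a dot. {{3}} is a literal {3}.
IPV4 = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

def is_ipv4(text):
    """Return True if the whole string is a dotted quad with octets in 0-255."""
    return IPV4.fullmatch(text) is not None

valid = [s for s in ["192.168.1.1", "0.0.0.0", "255.255.255.255"] if is_ipv4(s)]
invalid = [s for s in ["256.1.1.1", "999.0.0.0", "192.168.1"] if not is_ipv4(s)]

print(valid)    # ['192.168.1.1', '0.0.0.0', '255.255.255.255']
print(invalid)  # ['256.1.1.1', '999.0.0.0', '192.168.1']
```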
Date format (YYYY-MM-DD). Validates month 01-12 and day 01-31. Does not check calendar correctness (e.g., Feb 30).
\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
# matches: 2024-01-15 1999-12-31 2000-06-01
# no match: 2024-13-01 2024-00-15 24-1-5
# also accept "/" or "." as separator (mixed separators such as 2024-01/15 also pass)
\d{4}[-/.](?:0[1-9]|1[0-2])[-/.](?:0[1-9]|[12]\d|3[01])
# matches: 2024/01/15 2024.12.31
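Two gaps in the date patterns can be closed in Python: a backreference forces the second separator to equal the first, and `datetime.date` rejects impossible days such as Feb 30. `DATE` and `parse_date` are hypothetical names in this sketch.

```python
import re
from datetime import date

# Group 2 captures the separator; \2 requires the same one again.
DATE = re.compile(r"(\d{4})([-/.])(0[1-9]|1[0-2])\2(0[1-9]|[12]\d|3[01])")

def parse_date(text):
    """Return a date for a well-formed, real calendar date, else None."""
    m = DATE.fullmatch(text)
    if m is None:
        return None
    year, _sep, month, day = m.groups()
    try:
        return date(int(year), int(month), int(day))
    except ValueError:  # shape is fine but the day does not exist
        return None

print(parse_date("2024-01-15"))  # 2024-01-15
print(parse_date("2024.12.31"))  # 2024-12-31
print(parse_date("2024/01-15"))  # None
print(parse_date("2024-02-30"))  # None
```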
Phone numbers. Handles US formats with optional country code, parentheses, spaces, dashes, and dots.
(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
# matches: 555-123-4567
# (555) 123-4567
# +1 555.123.4567
# 1-555-123-4567
# 5551234567
# no match: 555-12-4567 12345
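A Python sketch that validates with the phone pattern and then normalizes to ten digits. `normalize_us_phone` is a hypothetical helper. The last call shows a known looseness of the pattern: the parentheses are independently optional, so an unbalanced one still passes.

```python
import re

PHONE = re.compile(r"(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def normalize_us_phone(text):
    """Return the 10-digit number, or None if text does not fit the pattern."""
    if PHONE.fullmatch(text) is None:
        return None
    digits = re.sub(r"\D", "", text)  # strip everything except digits
    return digits[-10:]               # drop the optional leading country code

samples = ["555-123-4567", "(555) 123-4567", "+1 555.123.4567",
           "1-555-123-4567", "5551234567"]
normalized = {normalize_us_phone(s) for s in samples}

print(normalized)                          # {'5551234567'}
print(normalize_us_phone("555-12-4567"))   # None
print(normalize_us_phone("(555 123-4567")) # 5551234567
```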
Password strength. Uses multiple lookaheads at the start to enforce several rules simultaneously. Adjust the min length in {8,}.
# at least 8 chars, 1 uppercase, 1 lowercase, 1 digit, 1 special char
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=]).{8,}$
# breakdown:
# (?=.*[a-z]) at least one lowercase letter
# (?=.*[A-Z]) at least one uppercase letter
# (?=.*\d) at least one digit
# (?=.*[!@#$...]) at least one special character
# .{8,}$ total length 8 or more
# matches: P@ssw0rd! MyS3cure#Pass
# no match: password PASSWORD1 Pass1!
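The single-regex version answers only yes or no. If you need to tell the user which rule failed, one option is to test each rule separately. A sketch under that assumption; `RULES` and `missing_rules` are hypothetical names, and the character sets are copied from the pattern above.

```python
import re

STRONG = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=]).{8,}$"
)

RULES = {
    "lowercase": r"[a-z]",
    "uppercase": r"[A-Z]",
    "digit": r"\d",
    "special": r"[!@#$%^&*()_+\-=]",
}

def missing_rules(password, min_length=8):
    """Return the names of the rules the password fails, in a fixed order."""
    missing = [name for name, pattern in RULES.items()
               if re.search(pattern, password) is None]
    if len(password) < min_length:
        missing.append("length")
    return missing

print(missing_rules("P@ssw0rd!"))  # []
print(missing_rules("password"))   # ['uppercase', 'digit', 'special']
print(missing_rules("PASSWORD1"))  # ['lowercase', 'special']
print(missing_rules("Pass1!"))     # ['length']
```

For these samples the two approaches agree: `STRONG` matches exactly when `missing_rules` returns an empty list.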