How to Write and Test Regular Expressions
A regular expression (regex) is a pattern that matches text. Regexes are used to validate input (email, phone, password), search and replace text, extract data from logs, and parse structured content. The core syntax is largely the same across Python, JavaScript, Java, and most other languages, though details such as flags and Unicode handling differ. Once you understand the core building blocks, you can construct complex patterns from simple parts.
Last updated: March 31, 2026
The Formula
Anchors: ^ (start), $ (end)
Character classes: [abc] matches a, b, or c | [a-z] range | \d digit | \w word char | \s whitespace
Quantifiers: * (0+), + (1+), ? (0 or 1), {n} exactly n, {n,m} between n and m
Groups: (abc) capturing group | (?:abc) non-capturing | a|b alternation
Lookaheads: (?=...) positive | (?!...) negative
Variable Definitions
| Symbol | Name | Description |
|---|---|---|
| . | Wildcard | Matches any single character except newline |
| \d, \w, \s | Shorthand Classes | \d = [0-9], \w = [a-zA-Z0-9_], \s = [ \t\n\r\f\v] (ASCII definitions; Unicode-aware engines may match more) |
| ^, $ | Anchors | ^ matches start of string/line, $ matches end |
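The building blocks above can be tried out in a few lines. The following is a minimal sketch using Python's standard re module; the sample strings and variable names are illustrative assumptions, not part of the original reference.

```python
import re

# Anchors plus a quantified shorthand class: the whole string must be 3 to 5 digits.
assert re.search(r"^\d{3,5}$", "1234") is not None
assert re.search(r"^\d{3,5}$", "123456") is None

# Non-capturing group with alternation, followed by a capturing group.
m = re.search(r"(?:cat|dog)-(\d+)", "id: dog-42")
animal_id = m.group(1)  # only the capturing group (\d+) is numbered

# Negative lookahead: match "foo" only when it is not followed by "bar".
not_followed = re.findall(r"foo(?!bar)", "foobar foobaz")

# Wildcard: "." does not match a newline by default, so this finds nothing.
dot_match = re.search(r"a.c", "a\nc")

print(animal_id, not_followed, dot_match)
```

Each pattern is written as a raw string (r"...") so that the backslashes reach the regex engine unchanged.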
Step-by-Step Example
Write a regex to validate a basic email address like user@example.com.
Given
Solution
1. Match the local part (letters, digits, and the characters . _ % + -):
   [a-zA-Z0-9._%+-]+
2. Match the @ symbol literally:
   @
3. Match the domain name:
   [a-zA-Z0-9.-]+
4. Match a dot before the TLD:
   \.
5. Match the TLD (2-6 letters):
   [a-zA-Z]{2,6}
6. Add anchors for a full-string match:
   ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}$
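Pattern: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}$/i matches user@example.com ✓ and rejects user@ and @domain.com ✗. The i flag is redundant here because each class already lists both cases, and the {2,6} limit rejects valid TLDs longer than six letters, such as .photography.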
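To check the finished pattern against the sample inputs, it can be wrapped in a small test helper. This is a sketch in Python; the helper name is_basic_email is hypothetical, and re.fullmatch is used here as an assumption about how strict the check should be (it also rejects a trailing newline, which $ alone would allow in Python).

```python
import re

# Pattern copied from the walkthrough; re.IGNORECASE mirrors the /i flag.
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}$",
    re.IGNORECASE,
)


def is_basic_email(text: str) -> bool:
    """Return True if the entire string matches the basic email pattern.

    Hypothetical helper for illustration; this is a format check only and
    does not prove that the address exists.
    """
    return EMAIL_RE.fullmatch(text) is not None


# Try the examples from the walkthrough.
for candidate in ["user@example.com", "user@", "@domain.com"]:
    print(candidate, is_basic_email(candidate))
```

Keeping the pattern in one compiled constant makes it easy to extend the list of test inputs as new edge cases come up.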
Common Mistakes to Avoid
Forgetting to escape the dot (.): an unescaped . matches any character. Use \. when you mean a literal dot.
Using greedy quantifiers when lazy is needed: .* is greedy and matches as much as possible. .*? is lazy and matches as little as possible, stopping at the first point where the rest of the pattern can match.
Anchoring with ^ and $ but enabling the multiline flag unintentionally: in multiline mode, ^ and $ match at every line boundary, so a validation pattern can pass when only one line of the input matches.
Not escaping backslashes in string literals: in many languages, \d must be written as \\d inside a string literal, unless the language offers raw strings (such as r"\d" in Python).
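The four mistakes above can each be reproduced in a line or two. The sketch below uses Python's re module; the sample inputs and variable names are assumptions chosen for illustration.

```python
import re

# 1. Unescaped dot: "." matches any character, so "3x14" slips through.
loose = re.fullmatch(r"3.14", "3x14")    # a match object
strict = re.fullmatch(r"3\.14", "3x14")  # None

# 2. Greedy vs. lazy on tag-like input.
html = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)  # one match spanning both tags
lazy = re.findall(r"<b>.*?</b>", html)   # two separate matches

# 3. The multiline flag changes what ^ and $ mean.
text = "bad line\n123"
whole = re.search(r"^\d+$", text)                    # None: string does not start with digits
per_line = re.search(r"^\d+$", text, re.MULTILINE)   # matches "123" on the second line

# 4. Raw strings keep the backslash for the regex engine.
raw = re.findall(r"\bcat\b", "cat concat")     # \b is a word boundary
cooked = re.findall("\bcat\b", "cat concat")   # "\b" is a backspace character here

print(bool(loose), strict, greedy, lazy, whole, per_line.group(), raw, cooked)
```

In case 4 the non-raw string fails silently: Python turns "\b" into a backspace character before the regex engine ever sees it, so the pattern simply never matches.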