What Are Regular Expressions?
Regular expressions (regex) are sequences of characters that define search patterns. They are used to find, match, and manipulate text based on patterns rather than exact strings. Almost every programming language supports them, and they are invaluable for tasks like:
- Validating user input (emails, phone numbers, passwords)
- Finding and replacing text in documents
- Parsing log files and data
- Extracting specific information from text
- Data cleaning and transformation
While the syntax might look cryptic at first, once you understand the building blocks, you will be able to read and write regex patterns confidently.
Basic Pattern Matching
Let us start with the fundamentals. In its simplest form, a regex pattern matches literal characters exactly as written.
Literal Characters
The pattern cat matches the exact text "cat" wherever it appears. It would match in "cat", "category", and "concatenate".
Case Sensitivity
By default, regex is case-sensitive. The pattern Cat would not match "cat" or "CAT". Most regex implementations offer a flag (usually i) to make matching case-insensitive.
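As a minimal sketch of literal matching and case sensitivity, the snippet below uses Python's built-in re module. The word list is made up for illustration, and other languages expose the case-insensitive flag in their own way.

```python
import re

# Illustrative word list (not from the article).
words = ["cat", "category", "concatenate", "dog", "Cat"]

# Literal pattern: matches the exact text "cat" anywhere in the string.
literal_hits = [w for w in words if re.search(r"cat", w)]

# Same pattern with the case-insensitive flag, which also accepts "Cat".
ignore_case_hits = [w for w in words if re.search(r"cat", w, re.IGNORECASE)]

print(literal_hits)      # ['cat', 'category', 'concatenate']
print(ignore_case_hits)  # ['cat', 'category', 'concatenate', 'Cat']
```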
Special Characters (Metacharacters)
The real power of regex comes from special characters that have specific meanings:
The Dot (.)
The dot matches any single character except a newline. The pattern c.t matches "cat", "cot", "cut", and even "c9t".
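A small Python sketch of the dot, assuming the built-in re module. It uses re.fullmatch so the whole string has to fit the pattern, and the candidate strings are invented for the example.

```python
import re

# Illustrative candidates: the last three should NOT match c.t as a whole.
candidates = ["cat", "cot", "c9t", "ct", "c\nt", "coat"]

# fullmatch requires the entire string to match the pattern.
dot_hits = [s for s in candidates if re.fullmatch(r"c.t", s)]

print(dot_hits)  # ['cat', 'cot', 'c9t']
```

"ct" fails because the dot needs exactly one character, "c\nt" fails because the dot skips newlines by default, and "coat" has one character too many.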
Anchors (^ and $)
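- ^ matches the start of the string (or of each line when the multiline flag is set)
- $ matches the end of the string (or of each line when the multiline flag is set)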
The pattern ^Hello matches "Hello world" but not "Say Hello". The pattern end$ matches "The end" but not "endless".
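The anchor examples above can be checked with a short Python sketch (built-in re module; the variable names are illustrative).

```python
import re

# ^ pins the match to the start of the text.
starts_with_hello = [s for s in ["Hello world", "Say Hello"] if re.search(r"^Hello", s)]

# $ pins the match to the end of the text.
ends_with_end = [s for s in ["The end", "endless"] if re.search(r"end$", s)]

print(starts_with_hello)  # ['Hello world']
print(ends_with_end)      # ['The end']
```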
Escaping Special Characters
To match a special character literally, escape it with a backslash. To match an actual dot, write \. (a backslash followed by the dot). For example, file\.txt matches "file.txt" but not "fileatxt".
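A hedged Python illustration of escaping: the unescaped dot accepts both file names, while the escaped one accepts only the real one. re.escape is Python's helper for escaping arbitrary text; other languages may or may not offer an equivalent.

```python
import re

names = ["file.txt", "fileatxt"]

# Unescaped dot: "any character", so both names match.
unescaped_hits = [n for n in names if re.fullmatch(r"file.txt", n)]

# Escaped dot: only a literal "." is accepted.
escaped_hits = [n for n in names if re.fullmatch(r"file\.txt", n)]

# re.escape builds a pattern that matches the given text literally.
literal_pattern = re.escape("file.txt")
literal_hits = [n for n in names if re.fullmatch(literal_pattern, n)]

print(unescaped_hits)  # ['file.txt', 'fileatxt']
print(escaped_hits)    # ['file.txt']
print(literal_hits)    # ['file.txt']
```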
Character Classes
Character classes let you match one character from a set of characters.
Square Brackets
Characters inside square brackets form a character class. The pattern [aeiou] matches any single vowel. The pattern gr[ae]y matches both "gray" and "grey".
Ranges
Use a hyphen to specify a range:
- [a-z] matches any lowercase letter
- [A-Z] matches any uppercase letter
- [0-9] matches any digit
- [a-zA-Z0-9] matches any alphanumeric character
Negation
A caret inside brackets negates the class. The pattern [^0-9] matches any character that is NOT a digit.
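The three kinds of character class described above (explicit sets, ranges, and negation) can be sketched in Python as follows. The sample strings are made up for the example.

```python
import re

# Explicit set: either "a" or "e" in the third position.
spellings = re.findall(r"gr[ae]y", "gray grey groy")

# Range: lowercase ASCII letters only.
lowercase = re.findall(r"[a-z]", "aB3c")

# Negated class: every character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(spellings)   # ['gray', 'grey']
print(lowercase)   # ['a', 'c']
print(non_digits)  # ['a', 'b']
```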
Shorthand Character Classes
Common character classes have shorthand notations:
- \d - any digit (same as [0-9])
- \D - any non-digit
- \w - any word character (letters, digits, underscore)
- \W - any non-word character
- \s - any whitespace (space, tab, newline)
- \S - any non-whitespace
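Here is a brief Python sketch of the shorthand classes applied to one made-up sample string. One caveat specific to Python 3: with text patterns, \d and \w are Unicode-aware unless the re.ASCII flag is passed, so they can match more than [0-9] and [a-zA-Z0-9_].

```python
import re

sample = "Order 42, item_7!"  # illustrative text

digits = re.findall(r"\d", sample)      # individual digits
words = re.findall(r"\w+", sample)      # runs of word characters
spaces = re.findall(r"\s", sample)      # whitespace characters
non_word = re.findall(r"\W", sample)    # everything that is not a word character

print(digits)    # ['4', '2', '7']
print(words)     # ['Order', '42', 'item_7']
print(spaces)    # [' ', ' ']
print(non_word)  # [' ', ',', ' ', '!']
```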
Quantifiers
Quantifiers specify how many times a pattern should match.
Basic Quantifiers
- * - zero or more times
- + - one or more times
- ? - zero or one time (optional)
Examples:
- colou?r matches both "color" and "colour"
- a+ matches "a", "aa", "aaa", etc.
- .* matches any number of any characters
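The basic quantifiers can be tried out with a short Python sketch. The test strings are invented for illustration.

```python
import re

# ? makes the "u" optional, but allows at most one.
colour_hits = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

# + matches runs of one or more "a".
a_runs = re.findall(r"a+", "a aa baaad")

# * allows zero occurrences, so "a" alone satisfies ab*.
star_allows_zero = re.fullmatch(r"ab*", "a") is not None

print(colour_hits)       # ['color', 'colour']
print(a_runs)            # ['a', 'aa', 'aaa']
print(star_allows_zero)  # True
```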
Specific Quantities
Use curly braces for precise control:
- {n} - exactly n times
- {n,} - n or more times
- {n,m} - between n and m times
The pattern \d{3}-\d{4} matches strings like "555-1234" (3 digits, hyphen, 4 digits).
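A small Python sketch of the curly-brace quantifiers, using made-up input. Note that quantifiers are greedy by default, so {2,4} takes four digits when four are available.

```python
import re

# Exactly 3 digits, a hyphen, exactly 4 digits.
seven_digit = re.findall(r"\d{3}-\d{4}", "Call 555-1234 or 55-1234 or 555-123")

numbers = "1 22 333 4444 55555"
two_to_four = re.findall(r"\d{2,4}", numbers)    # between 2 and 4 digits
three_or_more = re.findall(r"\d{3,}", numbers)   # 3 or more digits

print(seven_digit)    # ['555-1234']
print(two_to_four)    # ['22', '333', '4444', '5555']
print(three_or_more)  # ['333', '4444', '55555']
```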
Groups and Alternation
Parentheses for Grouping
Parentheses group parts of a pattern together. This is useful for:
- Applying quantifiers to multiple characters
- Capturing matched text for later use
- Creating alternations
The pattern (ab)+ matches "ab", "abab", "ababab", etc. Without parentheses, ab+ would only repeat the "b".
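To illustrate grouping and capturing, here is a minimal Python sketch. The phone-like string in the capture example is only a placeholder.

```python
import re

# With parentheses, + repeats the whole "ab" unit.
grouped_matches = re.fullmatch(r"(ab)+", "ababab") is not None

# Without them, + applies only to "b".
ungrouped_matches = re.fullmatch(r"ab+", "ababab") is not None
ungrouped_b_run = re.fullmatch(r"ab+", "abbb") is not None

# Parentheses also capture the text they matched for later use.
match = re.search(r"(\d{3})-(\d{4})", "555-1234")
prefix, line_number = match.group(1), match.group(2)

print(grouped_matches, ungrouped_matches, ungrouped_b_run)  # True False True
print(prefix, line_number)                                  # 555 1234
```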
Alternation (OR)
The pipe character | means "or". The pattern cat|dog matches either "cat" or "dog".
Combined with grouping: (Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day matches any day of the week.
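Alternation, alone and combined with grouping, might look like this in Python. The sample sentences are invented. finditer is used for the grouped pattern because Python's findall would return only the captured prefix (for example "Tues") rather than the whole match.

```python
import re

# Plain alternation: either word.
pets = re.findall(r"cat|dog", "a cat and a dog")

# Grouped alternation followed by the shared suffix "day".
day_pattern = re.compile(r"(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")
text = "Meet on Tuesday or Friday, not Someday."
days = [m.group(0) for m in day_pattern.finditer(text)]

print(pets)  # ['cat', 'dog']
print(days)  # ['Tuesday', 'Friday']
```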
Practical Examples
Let us look at some real-world regex patterns:
Email Validation
A basic email pattern:
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
This matches: one or more allowed characters, then @, then the domain name, then a dot, then a TLD of two or more letters.
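As a rough sketch, the email pattern could be wrapped in a small Python helper. The function name looks_like_email and the sample addresses are hypothetical, and the check is deliberately basic: it tests the shape of the text, not whether the address exists.

```python
import re

EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def looks_like_email(text: str) -> bool:
    """Return True if the whole string fits the basic email pattern."""
    return EMAIL_PATTERN.fullmatch(text) is not None

print(looks_like_email("user@example.com"))                 # True
print(looks_like_email("first.last+tag@mail.example.org"))  # True
print(looks_like_email("no-at-sign.example.com"))           # False
print(looks_like_email("user@localhost"))                   # False
```

fullmatch is used instead of search so that extra text before or after the address causes a rejection.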
Phone Numbers
US phone number with optional formatting:
\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
This matches: (555) 123-4567, 555-123-4567, 555.123.4567, 5551234567, and more.
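The phone pattern can be exercised against the listed formats with a short Python sketch. The helper name is_us_phone is hypothetical. Because each parenthesis is optional on its own, the pattern also accepts an unbalanced form such as "(555 123-4567", which is worth knowing before relying on it for validation.

```python
import re

PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def is_us_phone(text: str) -> bool:
    """Return True if the whole string fits the phone pattern."""
    return PHONE_PATTERN.fullmatch(text) is not None

samples = ["(555) 123-4567", "555-123-4567", "555.123.4567", "5551234567", "555-1234"]
results = {s: is_us_phone(s) for s in samples}

print(results)
# Only "555-1234" is rejected: it has seven digits instead of ten.

# Caveat: the opening parenthesis is not required to be closed.
print(is_us_phone("(555 123-4567"))  # True
```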
URLs
Basic URL pattern:
https?:\/\/[^\s]+
This matches: http or https, followed by ://, then any non-whitespace characters.
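A hedged Python sketch of pulling URLs out of text with this pattern. The slashes are written unescaped here because Python patterns are ordinary strings; the \/ form is only needed where the pattern itself is delimited by slashes. The sample text uses placeholder domains.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s]+")

text = "Docs at https://example.com/guide and http://example.org/a?b=1 (not ftp://x.y)."
urls = URL_PATTERN.findall(text)
print(urls)  # ['https://example.com/guide', 'http://example.org/a?b=1']

# Caveat: "any non-whitespace" also swallows trailing punctuation.
trailing = URL_PATTERN.findall("See https://example.com.")
print(trailing)  # ['https://example.com.']
```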
Dates
YYYY-MM-DD format:
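\d{4}-\d{2}-\d{2}
This matches: 2026-02-05, 1999-12-31, etc. Note that it checks only the shape of the text, so an impossible date such as 2026-99-99 also matches.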
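The date pattern verifies only the layout of digits and hyphens. One possible way to go further, sketched here in Python, is to let the regex check the shape and then hand the text to the standard datetime parser. The helper name is_real_date is hypothetical and this pairing is just one option, not a prescribed approach.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_real_date(text: str) -> bool:
    """Shape check with the regex, then a calendar check with datetime."""
    if DATE_PATTERN.fullmatch(text) is None:
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(DATE_PATTERN.fullmatch("2026-99-99") is not None)  # True: the shape fits
print(is_real_date("2026-99-99"))                        # False: no such date
print(is_real_date("2026-02-05"))                        # True
```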
Quick Reference
- . - Any character
- ^ - Start of line
- $ - End of line
- \d - Digit
- \w - Word character
- \s - Whitespace
- * - 0 or more
- + - 1 or more
- ? - Optional
- [abc] - Character class
- (a|b) - Alternation
- {n} - Exactly n times
Common Regex Flags
Flags modify how the regex engine interprets your pattern:
- g (global) - find all matches, not just the first
- i (case-insensitive) - ignore case when matching
- m (multiline) - ^ and $ match line boundaries, not just string boundaries
- s (dotall) - dot matches newlines too
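In JavaScript, Perl, and many online regex tools, you write the flags after the closing slash of the pattern: /pattern/gi. Other languages usually pass flags as a separate argument instead.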
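For comparison, here is a sketch of the same flags in Python, where they are passed as arguments rather than written after the pattern. Python has no g flag: re.search returns the first match and re.findall (or re.finditer) returns all of them. The sample text is made up.

```python
import re

text = "Hello\nhello world"

# No "global" flag: search gives the first match, findall gives every match.
first_match = re.search(r"hello", text).group(0)
all_ignoring_case = re.findall(r"hello", text, re.IGNORECASE)

# Multiline: ^ matches at the start of every line, not just the string.
line_starts = re.findall(r"^hello", text, re.IGNORECASE | re.MULTILINE)
string_start_only = re.findall(r"^hello", text, re.IGNORECASE)

# Dotall: the dot is allowed to match the newline between the two words.
with_dotall = re.search(r"Hello.hello", text, re.DOTALL) is not None
without_dotall = re.search(r"Hello.hello", text) is not None

print(first_match)        # hello
print(all_ignoring_case)  # ['Hello', 'hello']
print(line_starts)        # ['Hello', 'hello']
print(string_start_only)  # ['Hello']
print(with_dotall, without_dotall)  # True False
```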
Tips for Learning Regex
- Start simple: Begin with literal matches and add complexity gradually
- Test incrementally: Build your pattern piece by piece, testing each addition
- Use a tester: Visual tools that highlight matches help you understand what your pattern does
- Read patterns aloud: Describe what each part matches to verify your understanding
- Keep a reference handy: You do not need to memorize everything - look things up
- Practice regularly: Like any skill, regex improves with practice
Conclusion
Regular expressions are incredibly powerful once you understand the basics. Start with simple patterns and gradually incorporate more advanced features as you become comfortable. The patterns covered in this guide will handle the majority of common text-matching tasks.
Remember that regex is a tool - sometimes a simple string method is clearer and more maintainable than a complex regex. Use regex when it provides genuine value, and keep your patterns as simple as possible while still accomplishing the task.
The best way to learn is by doing. Start testing patterns with real text and see how different expressions work. Before long, you will be writing regex patterns with confidence.