Regular Expressions in Python — Deep Dive (41/100 Days of Python)

4 min read · Feb 11, 2023

Day 41 of the “100 Days of Python” blog post series covering regular expressions

Regular expressions (regex) are a powerful tool for data processing and analysis. They allow you to search, match, and extract patterns in text, making them a valuable addition to any Python developer’s toolkit. In this comprehensive guide, we’ll cover the basics of regular expressions in Python, including metacharacters, special sequences, and quantifiers, along with real-world examples to help you understand how to apply them in practice.

What are Regular Expressions in Python?

A regular expression is a sequence of characters that defines a search pattern. Regular expressions are often used to perform operations on strings, such as searching for specific patterns, replacing substrings, and validating data. Python's built-in re module provides functions for working with them.
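As a quick illustration of those three operations, here is a minimal sketch using re.search, re.sub, and re.fullmatch. The sample string and the date format are made up for this example and are not part of the original post:

```python
import re

# Hypothetical sample text, invented for illustration
text = "Order 66 shipped on 2023-02-11"

# Searching: find the first run of digits in the text
first_number = re.search(r"\d+", text)
print(first_number.group())  # 66

# Replacing: mask every digit with '#'
masked = re.sub(r"\d", "#", text)
print(masked)  # Order ## shipped on ####-##-##

# Validating: check that a whole string looks like YYYY-MM-DD
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2023-02-11") is not None
print(is_date)  # True
```

Note that re.search and re.fullmatch return None when nothing matches, so real code should check the result before calling group().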

Using Metacharacters in Python

Metacharacters are characters that have a special meaning in regular expressions instead of matching themselves literally. They are used to match specific patterns in text. The list below also includes a few backslash sequences (\w, \d, \s, \b), which are strictly speaking special sequences and are covered again later in this post. Some of the most commonly used metacharacters in Python include:

.: Matches any character except a newline
^: Matches the start of a string
$: Matches the end of a string (or just before a trailing newline)
[]: Matches any single character listed within the square brackets
[^ ]: Matches any single character not listed within the square brackets
\w: Matches any word character (letters, digits, and the underscore)
\d: Matches any decimal digit
\s: Matches any whitespace character
\b: Matches a word boundary
( ): Groups the expression within the parentheses
|: Matches either the expression before or after the | symbol

Here’s an example of how metacharacters can be used in real-world applications:

import re

text = 'Hello 123 World 456 Hello World'

# match any character except a newline
if re.search('. World', text):
    print('Match found!')
else:
    print('No match found.')

# match the start of a string
if re.search('^Hello', text):
    print('Match found!')
else:
    print('No match found.')

# match the end of a string
if re.search('Hello World$', text):
    print('Match found!')
else:
    print('No match found.')

# match any character within the square brackets
if re.search('[0123456789]', text):
    print('Match found!')
else:
    print('No match found.')

# match any character not within the square brackets
if re.search('[^0123456789]', text):
    print('Match found!')
else:
    print('No match found.')

# match any word character (letter, digit, or underscore)
if re.search(r'\w+', text):
    print('Match found!')
else:
    print('No match found.')

# match any decimal digit
if re.search(r'\d+', text):
    print('Match found!')
else:
    print('No match found.')

# match any whitespace character
if re.search(r'\s+', text):
    print('Match found!')
else:
    print('No match found.')

# match a word boundary
if re.search(r'\bHello\b', text):
    print('Match found!')
else:
    print('No match found.')


# match either the expression before or after the | symbol
if re.search('Hello World|Hello 123', text):
    print('Match found!')
else:
    print('No match found.')

As you can see, metacharacters provide a powerful way to search, match, and extract patterns in text. Every pattern in the examples above occurs somewhere in text, so each snippet prints Match found!.
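The checks above only report whether a match exists. To see what was actually matched, you can inspect the Match object returned by re.search or collect all matches with re.findall. The following is a small sketch that reuses the same text and patterns:

```python
import re

text = 'Hello 123 World 456 Hello World'

# re.search returns a Match object (or None); group() is the matched text
match = re.search(r'. World', text)
print(match.group())  # 3 World
print(match.span())   # (8, 15)

# re.findall returns every non-overlapping match as a list of strings
digit_runs = re.findall(r'[0123456789]+', text)
print(digit_runs)  # ['123', '456']

# With |, each match is whichever alternative fits at that position
greetings = re.findall(r'Hello World|Hello 123', text)
print(greetings)  # ['Hello 123', 'Hello World']
```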

Special Sequences in Python

Special sequences consist of a backslash followed by a character, and the pair has a special meaning in regular expressions. Some of the most commonly used special sequences include:

\A: Matches the start of the string
\b: Matches a word boundary
\B: Matches a position that is not a word boundary
\d: Matches any decimal digit
\D: Matches any non-digit character
\s: Matches any whitespace character
\S: Matches any non-whitespace character
\w: Matches any word character (letters, digits, and the underscore)
\W: Matches any non-word character

Here’s an example of how special sequences can be used:

text = 'Hello 123 World 456 Hello World'

# match the start of the string
if re.search(r'\AHello', text):
    print('Match found!')
else:
    print('No match found.')

# match a word boundary
if re.search(r'\bHello\b', text):
    print('Match found!')
else:
    print('No match found.')

# match a non-word boundary
# (prints 'No match found.': every 'Hello' in text starts and ends at a word boundary)
if re.search(r'\BHello\B', text):
    print('Match found!')
else:
    print('No match found.')

# match any decimal digit
if re.search(r'\d+', text):
    print('Match found!')
else:
    print('No match found.')

# match any non-digit character
if re.search(r'\D+', text):
    print('Match found!')
else:
    print('No match found.')

# match any whitespace character
if re.search(r'\s+', text):
    print('Match found!')
else:
    print('No match found.')

# match any non-whitespace character
if re.search(r'\S+', text):
    print('Match found!')
else:
    print('No match found.')

# match any word character (letter, digit, or underscore)
if re.search(r'\w+', text):
    print('Match found!')
else:
    print('No match found.')

# match any non-word character
if re.search(r'\W+', text):
    print('Match found!')
else:
    print('No match found.')
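To make the difference between these sequences more visible, the sketch below prints what each pattern matches instead of only reporting success. It reuses the same text; the 'snake_case' string is an extra sample added for illustration:

```python
import re

text = 'Hello 123 World 456 Hello World'

# \d+ collects runs of digits, \w+ collects runs of word characters
numbers = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)
print(numbers)  # ['123', '456']
print(words)    # ['Hello', '123', 'World', '456', 'Hello', 'World']

# \D+ stops at the first digit, so the match includes the trailing space
non_digits = re.search(r'\D+', text).group()
print(repr(non_digits))  # 'Hello '

# \B requires a position inside a word, so a whole word cannot match...
whole_word = re.search(r'\BHello\B', text)
print(whole_word)  # None

# ...but a fragment in the middle of a word can
inner = re.search(r'\Bell\B', text).group()
print(inner)  # ell

# \w also matches the underscore, not only letters and digits
print(re.findall(r'\w+', 'snake_case'))  # ['snake_case']
```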

Quantifiers in Python

Quantifiers in Python allow you to specify the number of times a character or pattern should be matched. Some of the most commonly used quantifiers include:

*: Matches zero or more occurrences of the preceding character or pattern:

text = 'Hello 123 World 456 Hello World'

# match zero or more occurrences of the preceding character (here, the space)
if re.search(r'Hello *World', text):
    print('Match found!')
else:
    print('No match found.')

+: Matches one or more occurrences of the preceding character or pattern:

text = 'Hello 123 World 456 Hello World'

# match one or more occurrences of the preceding character (here, the space)
if re.search(r'Hello +World', text):
    print('Match found!')
else:
    print('No match found.')

?: Matches zero or one occurrence of the preceding character or pattern:

text = 'Hello 123 World 456 Hello World'

# match zero or one occurrence of the preceding character (here, the final 'o' of Hello)
if re.search(r'Hello? World', text):
    print('Match found!')
else:
    print('No match found.')
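One detail worth checking is what "the preceding character or pattern" means in practice. In the sketch below, ? applies only to the final 'o', while parentheses make a whole group optional. The test strings are invented for illustration, and re.fullmatch is used so that the entire string has to match:

```python
import re

# The ? applies only to the character right before it: the final 'o'
optional_o = r'Hello? World'
print(bool(re.fullmatch(optional_o, 'Hello World')))  # True
print(bool(re.fullmatch(optional_o, 'Hell World')))   # True
print(bool(re.fullmatch(optional_o, 'World')))        # False

# Wrapping 'Hello ' in parentheses makes the whole group optional
optional_word = r'(Hello )?World'
print(bool(re.fullmatch(optional_word, 'Hello World')))  # True
print(bool(re.fullmatch(optional_word, 'World')))        # True
```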

{m,n}: Matches from m to n occurrences of the preceding character or pattern:

text = 'Hello 123 World 456 Hello World'

# match from m to n occurrences of the preceding character (here, one to three 'o' characters)
if re.search(r'Hello{1,3} World', text):
    print('Match found!')
else:
    print('No match found.')
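To compare the quantifiers side by side, here is a rough sketch that applies each one to the space between Hello and World. The three sample strings (with 0, 1, and 3 spaces) and the {2,3} range are chosen for illustration only:

```python
import re

# Three test strings that differ only in the number of spaces: 0, 1, and 3
samples = ['Hello' + ' ' * n + 'World' for n in (0, 1, 3)]

patterns = [r'Hello *World', r'Hello +World', r'Hello ?World', r'Hello {2,3}World']

# For each pattern, record how many spaces each fully matching sample contains
results = {}
for pattern in patterns:
    results[pattern] = [s.count(' ') for s in samples if re.fullmatch(pattern, s)]
    print(pattern, '->', results[pattern])

# Hello *World -> [0, 1, 3]
# Hello +World -> [1, 3]
# Hello ?World -> [0, 1]
# Hello {2,3}World -> [3]
```

re.fullmatch is used here rather than re.search so that each sample either matches completely or not at all, which makes the effect of each quantifier easier to read off.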

What’s next?

Hands-on Practice: Free Python Course
Full series: 100 Days of Python
Previous topic: What is Stack Overflow Really?
Next topic: Regular Expressions — Grouping and Backreferences

Written by Martin Mirakyan
