Regular Expressions in Python — Deep Dive (41/100 Days of Python)
Regular expressions (regex) are a powerful tool for data processing and analysis. They allow you to search, match, and extract patterns in text, making them a valuable addition to any Python developer’s toolkit. In this comprehensive guide, we’ll cover the basics of regular expressions in Python, including metacharacters, special sequences, and quantifiers, along with real-world examples to help you understand how to apply them in practice.
What are Regular Expressions in Python?
A regular expression is a sequence of characters that defines a search pattern. Regular expressions are often used to perform operations on strings, such as searching for specific patterns, replacing substrings, and validating data. Python's standard library includes a module called re that provides functions for working with regular expressions.
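As a minimal sketch of the three operations mentioned above (searching, replacing, and validating), the snippet below uses re.search, re.sub, and re.fullmatch. The sample string and the date pattern are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order 66 shipped on 2023-02-10"

# Searching: re.search returns a Match object, or None if nothing matches.
match = re.search(r"\d{4}-\d{2}-\d{2}", text)
found_date = match.group() if match else None

# Replacing: re.sub returns a new string with every match replaced.
masked = re.sub(r"\d", "#", text)

# Validating: re.fullmatch succeeds only if the whole string fits the pattern.
is_valid_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2023-02-10") is not None

print(found_date)     # 2023-02-10
print(masked)         # Order ## shipped on ####-##-##
print(is_valid_date)  # True
```

Note that the pattern only checks the shape of the date (digits and dashes), not whether the date actually exists on the calendar.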
Using Metacharacters in Python
Metacharacters are characters that carry a special meaning in a regular expression instead of matching themselves. Some of the most commonly used metacharacters and escape sequences in Python include:
- . : Matches any character except a newline
- ^ : Matches the start of a string
- $ : Matches the end of a string
- [] : Matches any character within the square brackets
- [^ ] : Matches any character not within the square brackets
- \w : Matches any word character (alphanumeric or underscore)
- \d : Matches any decimal digit
- \s : Matches any whitespace character
- \b : Matches a word boundary
- ( ) : Matches the expression within the parentheses and treats it as a group
- | : Matches either the expression before or the expression after the | symbol
Here’s an example of how metacharacters can be used in real-world applications:
import re

text = 'Hello 123 World 456 Hello World'

# match any character except a newline
if re.search(r'. World', text):
    print('Match found!')
else:
    print('No match found.')

# match the start of a string
if re.search(r'^Hello', text):
    print('Match found!')
else:
    print('No match found.')

# match the end of a string
if re.search(r'Hello World$', text):
    print('Match found!')
else:
    print('No match found.')

# match any character within the square brackets
if re.search(r'[0123456789]', text):
    print('Match found!')
else:
    print('No match found.')

# match any character not within the square brackets
if re.search(r'[^0123456789]', text):
    print('Match found!')
else:
    print('No match found.')

# match any word character (alphanumeric or underscore);
# raw strings keep Python from interpreting the backslash itself
if re.search(r'\w+', text):
    print('Match found!')
else:
    print('No match found.')

# match any decimal digit
if re.search(r'\d+', text):
    print('Match found!')
else:
    print('No match found.')

# match any whitespace character
if re.search(r'\s+', text):
    print('Match found!')
else:
    print('No match found.')

# match a word boundary
if re.search(r'\bHello\b', text):
    print('Match found!')
else:
    print('No match found.')

# match either the expression before or after the | symbol
if re.search(r'Hello World|Hello 123', text):
    print('Match found!')
else:
    print('No match found.')
As you can see, metacharacters provide a powerful way to search, match, and extract patterns in text. Every search in the examples above finds a match in the sample text, so each snippet prints Match found!
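The examples above only report whether a match exists. To see what was actually matched, you can read the text from the Match object with group(). The following is a small sketch over the same sample text; the patterns list and the first_matches dictionary are hypothetical names introduced just for this illustration.

```python
import re

text = 'Hello 123 World 456 Hello World'

# Hypothetical selection of patterns taken from the examples above.
patterns = [r'. World', r'^Hello', r'Hello World$', r'[0123456789]', r'\w+', r'\s+']

# Map each pattern to the text of its first match.
# Every pattern here is known to match, so .group() is safe to call.
first_matches = {p: re.search(p, text).group() for p in patterns}

for pattern, matched in first_matches.items():
    print(f'{pattern!r:20} -> {matched!r}')

# re.findall returns every non-overlapping match instead of only the first.
all_numbers = re.findall(r'\d+', text)
print(all_numbers)  # ['123', '456']
```

For instance, the first match of '. World' is '3 World', because the dot consumes the digit just before the space.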
Special Sequences in Python
A special sequence is a backslash followed by a character, and the pair has a special meaning in a regular expression. Some of the most commonly used special sequences include:
- \A : Matches the start of the string
- \b : Matches a word boundary
- \B : Matches a non-word boundary
- \d : Matches any decimal digit
- \D : Matches any non-digit character
- \s : Matches any whitespace character
- \S : Matches any non-whitespace character
- \w : Matches any word character (alphanumeric or underscore)
- \W : Matches any non-word character
Here’s an example of how special sequences can be used:
import re

text = 'Hello 123 World 456 Hello World'

# match the start of the string
if re.search(r'\AHello', text):
    print('Match found!')
else:
    print('No match found.')

# match a word boundary
if re.search(r'\bHello\b', text):
    print('Match found!')
else:
    print('No match found.')

# match a non-word boundary
# (prints 'No match found.' here: every 'Hello' in text is a whole word,
# so it always sits between word boundaries)
if re.search(r'\BHello\B', text):
    print('Match found!')
else:
    print('No match found.')

# match any decimal digit
if re.search(r'\d+', text):
    print('Match found!')
else:
    print('No match found.')

# match any non-digit character
if re.search(r'\D+', text):
    print('Match found!')
else:
    print('No match found.')

# match any whitespace character
if re.search(r'\s+', text):
    print('Match found!')
else:
    print('No match found.')

# match any non-whitespace character
if re.search(r'\S+', text):
    print('Match found!')
else:
    print('No match found.')

# match any word character (alphanumeric or underscore)
if re.search(r'\w+', text):
    print('Match found!')
else:
    print('No match found.')

# match any non-word character
if re.search(r'\W+', text):
    print('Match found!')
else:
    print('No match found.')
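Two of these sequences are easy to misread, so here is a short sketch with inputs chosen (as assumptions, for illustration only) to make the differences visible: \B only matches inside a word, and \A differs from ^ once the re.MULTILINE flag is set.

```python
import re

# \B matches only where there is NO word boundary, i.e. inside a word.
inside = re.search(r'\Bell\B', 'Hello World')        # 'ell' sits inside 'Hello'
standalone = re.search(r'\BHello\B', 'Hello World')  # 'Hello' is a whole word

# \A anchors only at the very start of the string, while ^ also anchors
# at the start of each line when re.MULTILINE is set.
lines = 'first line\nHello again'
with_caret = re.search(r'^Hello', lines, re.MULTILINE)
with_backslash_a = re.search(r'\AHello', lines, re.MULTILINE)

# Uppercase sequences are the complements of the lowercase ones.
non_digits = re.findall(r'\D+', 'a1b22c')

print(inside.group())            # ell
print(standalone)                # None
print(with_caret is not None)    # True
print(with_backslash_a)          # None
print(non_digits)                # ['a', 'b', 'c']
```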
Quantifiers in Python
Quantifiers in Python allow you to specify the number of times a character or pattern should be matched. Some of the most commonly used quantifiers include:
* : Matches zero or more occurrences of the preceding character or pattern:
import re

text = 'Hello 123 World 456 Hello World'

# match zero or more occurrences of the preceding character (here, the space)
if re.search(r'Hello *World', text):
    print('Match found!')
else:
    print('No match found.')
+ : Matches one or more occurrences of the preceding character or pattern:
import re

text = 'Hello 123 World 456 Hello World'

# match one or more occurrences of the preceding character (here, the space)
if re.search(r'Hello +World', text):
    print('Match found!')
else:
    print('No match found.')
? : Matches zero or one occurrence of the preceding character or pattern:
import re

text = 'Hello 123 World 456 Hello World'

# match zero or one occurrence of the preceding character
# (the ? applies only to the final 'o', so 'Hell World' would also match)
if re.search(r'Hello? World', text):
    print('Match found!')
else:
    print('No match found.')
{m,n} : Matches from m to n occurrences of the preceding character or pattern:
import re

text = 'Hello 123 World 456 Hello World'

# match from m to n occurrences of the preceding character
# (here, one to three 'o' characters after 'Hell')
if re.search(r'Hello{1,3} World', text):
    print('Match found!')
else:
    print('No match found.')
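A quantifier applies to the single item directly before it, which is a common source of confusion. The sketch below, using made-up input strings, contrasts a quantifier on one character with a quantifier on a parenthesized group, and shows where * and + behave differently.

```python
import re

# A quantifier binds to the single preceding item: here only the final 'o'.
# It is greedy, so it takes as many as allowed (three of the five 'o's).
single_char = re.search(r'Hello{1,3}', 'Hellooooo').group()

# Wrapping a pattern in parentheses makes the quantifier apply to the group.
whole_group = re.search(r'(ab){2,3}', 'abababab').group()

# {m,n} on a special sequence: runs of two to three digits.
digit_runs = re.findall(r'\d{2,3}', '7 42 123 98765')

# * allows zero occurrences, + requires at least one.
star = re.search(r'Hello *World', 'HelloWorld') is not None
plus = re.search(r'Hello +World', 'HelloWorld') is not None

print(single_char)  # Hellooo
print(whole_group)  # ababab
print(digit_runs)   # ['42', '123', '987', '65']
print(star, plus)   # True False
```

The lone 7 is skipped because it is shorter than the minimum of two digits, while 98765 is split into 987 and 65 because each match takes at most three.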