Regular Expressions in Python
This article explains regular expressions in Python.
We will cover a wide range of topics, from the basic usage of the re module to complex regular expression pattern matching.
The re module is used for searching and manipulating strings using regular expressions.
What is the re module?
The re module is included in Python's standard library and provides functionality for string manipulation using regular expressions. Regular expressions are used to efficiently find, extract, or replace specific string patterns.
Basic Regular Expression Patterns
Regular expressions define patterns using special symbols. Below are some basic patterns.
- .: Any single character (except a newline by default)
- ^: Start of a string
- $: End of a string
- \d: Any digit (0-9)
- \w: Any word character (a-z, A-Z, 0-9, _)
- \s: Any whitespace character
- *: Zero or more repetitions
- +: One or more repetitions
- []: Character class (e.g., [a-z] matches lowercase letters)
import re

pattern = r"\d{3}-\d{4}"
text = "My postal code is 123-4567."
result = re.search(pattern, text)
if result:
    print(result.group())  # Output: 123-4567

# Define a pattern using basic regular expression symbols
pattern = r"^\w+\s\d+\s[a-zA-Z]+$"

# Example text to test the pattern
text = "Room 23 Tokyo"

# Check if the pattern matches the text
result = re.match(pattern, text)
if result:
    print("Matched:", result.group())
else:
    print("No match")

- This code first searches the string for a postal code pattern (three digits, a hyphen, four digits). Next, it checks whether the entire string (from start to end) matches a pattern composed of a word, whitespace, digits, whitespace, and an English word. This shows how the basic elements of regular expressions can be combined.
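The individual symbols from the list above can also be tried one at a time. The following is a minimal sketch; the sample strings are made up for illustration and are not part of the original examples.

```python
import re

# . matches any single character between "c" and "t"
dot_matches = re.findall(r"c.t", "cat cot c-t ct")

# \d+ matches runs of one or more digits
digit_runs = re.findall(r"\d+", "Room 23, floor 4")

# * allows zero repetitions of "b", + requires at least one
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

# [a-z]+ matches runs of lowercase letters only
lower_runs = re.findall(r"[a-z]+", "Room 23 Tokyo")

# ^ and $ anchor the pattern to the start and end of the string
starts_with_room = bool(re.search(r"^Room", "Room 23 Tokyo"))
ends_with_tokyo = bool(re.search(r"Tokyo$", "Room 23 Tokyo"))

print(dot_matches)   # ['cat', 'cot', 'c-t']
print(digit_runs)    # ['23', '4']
print(star)          # ['a', 'ab', 'abbb']
print(plus)          # ['ab', 'abbb']
print(lower_runs)    # ['oom', 'okyo']
print(starts_with_room, ends_with_tokyo)  # True True
```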
How to Use Matching Functions
re.match()
re.match() checks if the start of the string matches the specified pattern.
import re

pattern = r"\w+"
text = "Python is powerful"
match = re.match(pattern, text)
if match:
    print(match.group())  # Output: Python

- This code checks whether the string starts with one or more word characters (alphanumeric or underscore). The first word 'Python' matches the pattern and is output.
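Because re.match() only looks at the start of the string, it is easy to confuse with re.search() and re.fullmatch(). The sketch below contrasts the three on the same text; the variable names are illustrative only.

```python
import re

text = "Python is powerful"

# re.match() only succeeds if the pattern matches at index 0
at_start = re.match(r"is", text)

# re.search() finds the pattern anywhere in the string
anywhere = re.search(r"is", text)

# re.fullmatch() requires the pattern to cover the whole string
words_only = re.fullmatch(r"\w+", text)        # fails because of the spaces
words_and_spaces = re.fullmatch(r"[\w ]+", text)

print(at_start)                  # None
print(anywhere.start())          # 7
print(words_only)                # None
print(words_and_spaces.group())  # Python is powerful
```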
re.search()
re.search() scans the whole string and returns the first match.
import re

pattern = r"powerful"
text = "Python is powerful and versatile"
search = re.search(pattern, text)
if search:
    print(search.group())  # Output: powerful

- This code searches the entire string for the word 'powerful' and returns the first match. As a result, re.search() outputs the matched string 'powerful'.
re.findall()
re.findall() returns all matches of the pattern as a list.
import re

pattern = r"\b\w{6}\b"
text = "Python is powerful and versatile"
matches = re.findall(pattern, text)
print(matches)  # Output: ['Python']

- This code finds all words that are exactly six characters long and returns them as a list. Only 'Python' in the string meets the condition, so the list ['Python'] is output.
re.finditer()
re.finditer() returns all matches as an iterator, allowing you to retrieve detailed information for each match.
import re

pattern = r"\b\w{6}\b"
text = "Python is powerful and versatile"
matches = re.finditer(pattern, text)
for match in matches:
    print(match.group())  # Output: Python

- This code searches for all six-letter words sequentially and processes each match via an iterator. 'Python' matches the pattern here and is output.
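The "detailed information" that re.finditer() exposes comes from the match objects it yields, for example the position of each match. A short sketch, using a slightly different pattern (six or more characters) so that several words match:

```python
import re

text = "Python is powerful and versatile"

# Collect the matched text together with its start and end index
found = []
for m in re.finditer(r"\b\w{6,}\b", text):
    found.append((m.group(), m.start(), m.end()))

for word, start, end in found:
    print(f"{word!r} at {start}-{end}")
# 'Python' at 0-6
# 'powerful' at 10-18
# 'versatile' at 23-32
```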
Replacement and Splitting
re.sub()
re.sub() replaces parts of the string that match the regular expression with another string.
import re

pattern = r"\d+"
text = "There are 100 apples"
new_text = re.sub(pattern, "many", text)
print(new_text)  # Output: There are many apples

- This code replaces each run of digits in the string with 'many'.
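re.sub() also accepts a count argument, backreferences such as \1 in the replacement string, and a function as the replacement. The following sketch illustrates these options; the sample sentence is an assumption made for this example.

```python
import re

text = "There are 100 apples and 25 oranges"

# count limits how many matches are replaced
limited = re.sub(r"\d+", "many", text, count=1)

# \1 in the replacement refers to the first captured group
bracketed = re.sub(r"(\d+)", r"[\1]", text)

# A function receives each match object and returns the replacement text
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

print(limited)    # There are many apples and 25 oranges
print(bracketed)  # There are [100] apples and [25] oranges
print(doubled)    # There are 200 apples and 50 oranges
```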
re.split()
re.split() splits the string at parts that match the regular expression.
import re

pattern = r"\s+"
text = "Python is powerful"
parts = re.split(pattern, text)
print(parts)  # Output: ['Python', 'is', 'powerful']

- This code splits the string on one or more whitespace characters. As a result, the string is split into words, yielding ['Python', 'is', 'powerful'].
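Two further behaviors of re.split() are often useful: maxsplit limits the number of splits, and a capturing group in the pattern keeps the delimiters in the result. A minimal sketch with made-up input strings:

```python
import re

text = "Python is powerful and versatile"

# maxsplit=1 splits only at the first run of whitespace
head_and_rest = re.split(r"\s+", text, maxsplit=1)

# Split on commas or semicolons, ignoring surrounding whitespace
fields = re.split(r"\s*[,;]\s*", "a, b;c ,d")

# A capturing group keeps the delimiters in the resulting list
with_delims = re.split(r"([,;])", "a,b;c")

print(head_and_rest)  # ['Python', 'is powerful and versatile']
print(fields)         # ['a', 'b', 'c', 'd']
print(with_delims)    # ['a', ',', 'b', ';', 'c']
```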
Groups and Captures
Grouping makes it easy to extract parts of a match. Enclosing part of a pattern in parentheses () captures the text it matches as a group.
import re

pattern = r"(\d{3})-(\d{4})"
text = "My postal code is 123-4567."
match = re.search(pattern, text)
if match:
    print(match.group(1))  # Output: 123
    print(match.group(2))  # Output: 4567

- This code extracts the 3-digit and 4-digit numbers as separate groups from a postal code in the form '123-4567'. group(1) returns the first three digits '123', and group(2) returns the last four digits '4567'.
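Groups also change what other functions return. In the sketch below, group(0) gives the whole match, groups() gives all captured groups as a tuple, and re.findall() returns one tuple per match when the pattern has more than one group. The second postal code in the sample text is invented for illustration.

```python
import re

text = "Postal codes: 123-4567 and 890-1234"
pattern = r"(\d{3})-(\d{4})"

match = re.search(pattern, text)
whole = match.group(0)   # the entire matched text
parts = match.groups()   # all captured groups as a tuple

# With two groups in the pattern, findall returns a list of tuples
all_pairs = re.findall(pattern, text)

print(whole)      # 123-4567
print(parts)      # ('123', '4567')
print(all_pairs)  # [('123', '4567'), ('890', '1234')]
```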
Named groups
Using named groups lets you retrieve values by meaningful names instead of relying on indices. Below is a concrete example that extracts the date and level from a log.
import re

log = "2025-10-25 14:00:01 [ERROR] Something failed"

pattern = r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>[A-Z]+)\] (?P<msg>.*)"
m = re.search(pattern, log)
if m:
    print(m.group("date"), m.group("time"), m.group("level"))
    print("message:", m.group("msg"))

- This code uses named groups to extract the date, time, level, and message from a log string. It retrieves the values using meaningful names rather than indices.
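Named groups can also be collected in one step with groupdict(), which returns a dictionary keyed by group name. The sketch below applies the same pattern to several lines; the extra log lines are hypothetical and only serve to show that non-matching lines are skipped.

```python
import re

log_pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\] (?P<msg>.*)"
)

# Hypothetical input: two log lines and one line that does not match
lines = [
    "2025-10-25 14:00:01 [ERROR] Something failed",
    "not a log line",
    "2025-10-25 14:00:05 [INFO] Recovered",
]

records = []
for line in lines:
    m = log_pattern.search(line)
    if m:  # skip lines that do not look like log entries
        records.append(m.groupdict())

for record in records:
    print(record["level"], record["msg"])
# ERROR Something failed
# INFO Recovered
```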
Using Flag Options
The re module has several flag options to control search behavior.
- re.IGNORECASE (re.I): Makes matching case-insensitive.
- re.MULTILINE (re.M): Makes ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
- re.DOTALL (re.S): Makes the dot . match newline characters as well.
import re

pattern = r"python"
text = "Python is powerful"
match = re.search(pattern, text, re.IGNORECASE)
if match:
    print(match.group())  # Output: Python

- This code searches for the word 'python' case-insensitively. By using the re.IGNORECASE flag, it also matches 'Python'.
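The other two flags are easiest to understand on a string that contains a newline. The following sketch compares the results with and without re.MULTILINE and re.DOTALL, and combines flags with the | operator; the two-line sample text is an assumption for this example.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the start of the whole string
start_default = re.findall(r"^\w+", text)
start_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# Without re.DOTALL, . does not match the newline between the lines
dot_default = re.search(r"first.*second", text)
dot_all = re.search(r"first.*second", text, re.DOTALL)

# Flags can be combined with the | operator
combined = re.findall(r"LINE$", text, re.IGNORECASE | re.MULTILINE)

print(start_default)    # ['first']
print(start_multiline)  # ['first', 'second']
print(dot_default)      # None
print(repr(dot_all.group()))  # 'first line\nsecond'
print(combined)         # ['line', 'line']
```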
Advantages of regular expression objects (re.compile)
Compiling a pattern with re.compile gives you a reusable pattern object, which keeps the pattern definition in one place and improves readability. You can also set flags here.
The following example uses a compiled pattern to find every match in a multi-line string. The re module caches recently used patterns, so the speed gain is usually modest, but compiling once avoids repeated cache lookups when the same pattern is used frequently.
import re

pattern = re.compile(r"User: (\w+), ID: (\d+)")
text = "User: alice, ID: 1\nUser: bob, ID: 2"

for m in pattern.finditer(text):
    print(m.groups())

- This code uses a compiled regular expression to extract usernames and IDs from multiple lines. Reusing a compiled pattern can be more efficient and also makes the code more readable.
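A compiled pattern object offers the same operations as the module-level functions (search, findall, sub, and so on) as methods, and flags are passed once at compile time. A minimal sketch, where the name user_re and the mixed-case sample text are illustrative assumptions:

```python
import re

# Flags are given once, when the pattern is compiled
user_re = re.compile(r"user: (\w+), id: (\d+)", re.IGNORECASE)

text = "User: alice, ID: 1\nUSER: bob, id: 2"

# The same pattern object is reused for different operations
first_user = user_re.search(text).group(1)
pairs = user_re.findall(text)
masked = user_re.sub(r"User: \1, ID: ***", text)

print(first_user)  # alice
print(pairs)       # [('alice', '1'), ('bob', '2')]
print(masked)
# User: alice, ID: ***
# User: bob, ID: ***
```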
Applications
For example, consider a script that extracts all email addresses from a piece of text.
import re

pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
text = "Contact us at info@example.com or support@service.com."
emails = re.findall(pattern, text)
print(emails)  # Output: ['info@example.com', 'support@service.com']

- This code finds all email addresses in the string and extracts them as a list. The pattern is a practical approximation and does not cover every valid address format.
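To apply the same idea to a file, the pattern can be run line by line. The sketch below is one possible approach, not a definitive implementation: extract_emails is a hypothetical helper, and io.StringIO stands in for a real file object so the example runs without touching the disk.

```python
import io
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def extract_emails(lines):
    """Return the unique addresses found in an iterable of lines, in first-seen order."""
    found = []
    for line in lines:
        found.extend(EMAIL_RE.findall(line))
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(found))

# Stand-in for an opened text file (hypothetical content)
fake_file = io.StringIO(
    "Contact us at info@example.com\n"
    "or support@service.com.\n"
    "Again: info@example.com\n"
)

emails = extract_emails(fake_file)
print(emails)  # ['info@example.com', 'support@service.com']
```

With a real file, the same helper could be called inside a with open(...) block, since file objects also yield one line at a time.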
Conclusion
The re module is a powerful tool for string manipulation in Python. Here, we covered a wide range of topics, from basic usage to grouping, flag options, and compiled patterns. Regular expressions are an indispensable tool for text processing.