19 Python Regular Expression Exercises and Solutions

Regular expressions describe patterns of text using special symbols and characters. They are especially useful in NLP, where you work with text data, for searching, replacing, or extracting information.

In this article, I will share Python practice exercises on regular expressions, each with its solution code. Try to solve each problem yourself first, and look at the solution only if you get stuck.

Python has a built-in module called re that provides support for regular expressions. You can import it using import re at the beginning of your code. It has various functions that allow you to work with regular expressions for searching, matching, and manipulating strings.

I will start with the most commonly used functions of the re module. After that, I will share some more advanced regular expression exercises. This way, everyone from beginners to intermediate readers can follow along. Okay, let’s get started.

Exercise 1: Using re.search()

In this exercise, we will use the re.search() function to find the first occurrence of a pattern (in this case, a run of digits) in a given string. It prints the matched content if found; otherwise, it prints a message indicating no match.

import re

pattern = r'\d+'
text = 'There are 123 apples and 23 bananas.'
match = re.search(pattern, text)

if match:
    print(f'Match found: {match.group()}')
else:
    print('No match found.')
Output:
Match found: 123

Exercise 2: Using re.match()

This exercise uses the re.match() function to check whether a pattern (here, the word “There”) matches at the beginning of a string. It prints the matched content if found at the beginning; otherwise, it prints a message indicating no match.

import re

pattern = r'There'
text = 'There are 123 apples and 23 bananas.'
match = re.match(pattern, text)

if match:
    print(f'Match found at the beginning: {match.group()}')
else:
    print('No match found at the beginning.')
Output:
Match found at the beginning: There
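
The difference between re.match() and re.search() is easiest to see when the pattern does not sit at the start of the string. A minimal sketch, reusing the sample sentence from above with a digit pattern:

```python
import re

text = 'There are 123 apples and 23 bananas.'

# re.match() only tries the pattern at position 0 of the string
at_start = re.match(r'\d+', text)

# re.search() scans forward until it finds the first match
anywhere = re.search(r'\d+', text)

print(at_start)            # no digits at the very beginning
print(anywhere.group())    # first run of digits found anywhere
```

Here re.match() returns None because the string starts with a letter, while re.search() still finds the first number.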

Exercise 3: Using re.findall()

In this exercise, we will use the re.findall() function to find all occurrences of a pattern (here, runs of digits) in a given string. It prints the list of matches if any are found; otherwise, it prints a message indicating no matches.

import re

pattern = r'\d+'
text = 'There are 123 apples and 456 oranges.'
matches = re.findall(pattern, text)

if matches:
    print(f'All matches: {matches}')
else:
    print('No matches found.')
Output:
All matches: ['123', '456']

Exercise 4: Using re.finditer()

Similar to re.findall(), we can also use re.finditer() function to find all occurrences of a pattern in a given string and iterate over the match objects. It will print each matched content (in our case digits or numbers) if found.

import re

pattern = r'\d+'
text = 'There are 123 apples and 456 oranges.'
matches = re.finditer(pattern, text)

for match in matches:
    print(f'Match found: {match.group()}')
Output:
Match found: 123
Match found: 456
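
One reason to prefer re.finditer() over re.findall() is that each match object also carries its position in the string. A small sketch of this, using the same sample text (the variable name positions is only for illustration):

```python
import re

text = 'There are 123 apples and 456 oranges.'

# Collect the matched text together with its start and end offsets
positions = [(m.group(), m.start(), m.end())
             for m in re.finditer(r'\d+', text)]

for value, start, end in positions:
    # Slicing the original text with the offsets gives back the match
    print(value, start, end, text[start:end])
```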

Exercise 5: Using re.sub()

If you are working on document redaction, re.sub() can be a handy function. It replaces occurrences of a pattern in a given string with a specified replacement. The code below prints the resulting string after replacement.

import re

pattern = r'\d+'
text = 'There are 123 apples and 456 oranges.'
replaced_text = re.sub(pattern, 'XXXX', text)

print(f'Replaced text: {replaced_text}')
Output:
Replaced text: There are XXXX apples and XXXX oranges.
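
The replacement passed to re.sub() can also be a function that receives each match object and returns the replacement text. As a sketch, the hypothetical helper mask() below replaces every number with one X per digit, so the length of the redacted text is preserved:

```python
import re

text = 'There are 123 apples and 45 oranges.'

def mask(match):
    # Return as many X characters as there are digits in the match
    return 'X' * len(match.group())

masked = re.sub(r'\d+', mask, text)
print(masked)
```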

Exercise 6: Using re.split()

In NLP projects such as sentiment analysis, word embedding, or document similarity matching, we often need to split a document based on some rule. This exercise uses re.split() to split a string at occurrences of a pattern and obtain a list of substrings.

import re

pattern = r'\s+'
text = 'Split this string.'
parts = re.split(pattern, text)

print(f'Split parts: {parts}')
Output:
Split parts: ['Split', 'this', 'string.']

Here I am closing the basic regular expression exercises and solutions for beginners. Let’s now move on to some advanced and practical regex exercises that you may face in your interviews.

Exercise 7: Matching a Simple Pattern

Write a Python program to validate and extract Social Security Numbers (SSN) from a given text.

# Extract a Social Security Number (SSN) from text
import re

# Given Text
text = 'Employee ID: 123-45-6789 and Employee ID: 247-36-6788'

# Regex Pattern
pattern = r'\d{3}-\d{2}-\d{4}'

match = re.search(pattern, text)
print(match.group() if match else 'No match')
Output:
123-45-6789

In this code, we use the regular expression \d{3}-\d{2}-\d{4} to find a Social Security Number. This pattern looks for three digits, followed by a hyphen, then two digits, another hyphen, and finally four digits. The re.search() function finds the first occurrence of the pattern in the given text. The group() method returns the matched portion; if there is no match, ‘No match’ is printed. Note that the pattern checks only the format, not whether the number was actually issued.

Note: re.search() returns only the first match. To extract all matching numbers, use re.findall() or re.finditer() instead.
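
A minimal sketch of the re.findall() variant, using the same sample text. The \b word boundaries are an addition that is not in the pattern above; they keep the pattern from matching inside a longer run of digits:

```python
import re

text = 'Employee ID: 123-45-6789 and Employee ID: 247-36-6788'

# \b on both sides is an added safeguard against longer digit runs
pattern = r'\b\d{3}-\d{2}-\d{4}\b'

all_ssns = re.findall(pattern, text)
print(all_ssns)
```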

Exercise 8: Extracting Email Addresses

Create a Python program that extracts and validates email addresses from a given text.

# Extract email addresses from text
import re

# Given Text
text = 'Contact us at support@example.com or info@company.com'

pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

matches = re.findall(pattern, text)
print(matches)
Output:
['support@example.com', 'info@company.com']

This code extracts email addresses from the given text using the regular expression \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b. Let’s break down its components:

  1. \b: Word boundary anchor. It ensures that the email address is not part of a larger word.
  2. [A-Za-z0-9._%+-]+: This part matches the username of the email address. It allows one or more characters, including letters (both uppercase and lowercase), digits, and certain special characters like period (.), underscore (_), percent (%), plus (+), and hyphen (-).
  3. @: Matches the at symbol, which separates the username from the domain.
  4. [A-Za-z0-9.-]+: This part matches the domain name. It allows one or more characters, including letters (both uppercase and lowercase), digits, period (.), and hyphen (-). The character class alone does not prevent the domain from starting or ending with a period or hyphen.
  5. \.: This matches the dot (.) in the email address, separating the domain name from the top-level domain (TLD).
  6. [A-Za-z]{2,}: This part matches the TLD (like com, org, net). It requires at least two consecutive uppercase or lowercase letters. Do not write a | inside the brackets: within a character class it is a literal pipe character, not an "or".
  7. \b: Another word boundary anchor, ensuring that the TLD is not part of a larger word.

We used the re.findall() function to return a list of all matches found in the text, and the code prints the extracted email addresses.
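
One detail worth checking in email patterns is the TLD character class. Inside square brackets, | is a literal pipe character rather than alternation, so a class written as [A-Z|a-z] also accepts '|'. A minimal sketch comparing that class with a letters-only [A-Za-z] class (the sample string and the names loose and strict are made up for illustration):

```python
import re

# TLD class containing a literal | character
loose = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b')

# TLD class restricted to letters only
strict = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

sample = 'Reach me at user@example.c|m today'

loose_hits = loose.findall(sample)
strict_hits = strict.findall(sample)

print(loose_hits)   # the malformed address is accepted
print(strict_hits)  # the malformed address is rejected
```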

Exercise 9: Replacing Text

In this exercise, write Python code to replace a word in a given text.

# Replace a whole word using a regular expression
import re

pattern = r'\bapple\b'
text = 'I have an apple, and I love apples.'

replaced_text = re.sub(pattern, 'orange', text)
print(replaced_text)
Output:
I have an orange, and I love apples.

In this code, the regular expression \bapple\b matches “apple” only as a whole word, so “apples” is left unchanged. The re.sub() function replaces every whole-word occurrence of “apple” with “orange”, and the code prints the modified text.

Exercise 10: Tokenizing Text

Tokenization is one of the most common steps when pre-processing text in Natural Language Processing. In this regex exercise, write Python code to split a sentence into words.

# Tokenize a sentence into words using a regular expression
import re

pattern = r'\w+'
text = 'Python is a versatile programming language.'

tokens = re.findall(pattern, text)
print(tokens)
Output:
['Python', 'is', 'a', 'versatile', 'programming', 'language']

Here, the regular expression \w+ matches one or more word characters. We used the re.findall() function to tokenize the input text into a list of words, and the code prints the list of tokens. You can also use re.split() on whitespace to get a similar list, but punctuation then stays attached to the words (for example 'language.').
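
The two tokenizing approaches can be compared side by side. This is a small sketch on the same sentence; the variable names are only illustrative:

```python
import re

text = 'Python is a versatile programming language.'

# findall with \w+ keeps only runs of word characters
word_tokens = re.findall(r'\w+', text)

# split on whitespace keeps punctuation attached to the neighbouring word
space_tokens = re.split(r'\s+', text)

print(word_tokens)
print(space_tokens)
```

Which one fits better depends on whether punctuation should survive tokenization.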

Exercise 11: Validating Phone Numbers

In this exercise, write Python code to pick out the valid phone numbers from a given list (solution code below). A valid number here consists of exactly 10 digits and nothing else.

# Validating phone number using regular expression
import re

pattern = r'^\d{10}$'
numbers = ['1234567890', '9876543210', '123-456-7890', '987654321']

valid_numbers = [num for num in numbers if re.match(pattern, num)]
print(valid_numbers)
Output:
['1234567890', '9876543210']

In this code, the regular expression ^\d{10}$ validates 10-digit phone numbers. The re.match() function checks whether each number in the list matches the pattern. The list comprehension keeps only the valid phone numbers, and the code prints the result.
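
For whole-string validation, re.fullmatch() is an alternative to writing ^ and $ by hand. One subtle difference: $ also matches just before a trailing newline, so a value read from a file with its line ending still passes the ^\d{10}$ check. A short sketch of the difference (the input with a trailing newline is a made-up example):

```python
import re

raw_input_value = '1234567890\n'   # e.g. a line read from a file

# $ matches before the final newline, so this still succeeds
with_anchors = re.match(r'^\d{10}$', raw_input_value)

# fullmatch requires the entire string to match, newline included
with_fullmatch = re.fullmatch(r'\d{10}', raw_input_value)

print(bool(with_anchors))
print(bool(with_fullmatch))
```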

Exercise 12: Extracting Dates

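This Python code uses the regular expression \d{2}/\d{2}/\d{4} to extract dates written in the “MM/DD/YYYY” format from the given text. The pattern checks only the layout (two digits, two digits, four digits), not whether the month and day are valid. The re.findall() function returns a list of all matches, and the code prints the extracted dates.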

# Regular expression exercises with solution to extract date from a given string
import re

pattern = r'\d{2}/\d{2}/\d{4}'
text = 'Meeting on 03/15/2024 and 04/20/2024'

dates = re.findall(pattern, text)
print(dates)
Output:
['03/15/2024', '04/20/2024']
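
Because the pattern only checks the digit layout, a string such as 13/45/2024 would also be extracted. One possible follow-up step, sketched here with the standard datetime module, is to keep only the candidates that parse as real calendar dates. The helper is_real_date() and the second sample date are assumptions for illustration:

```python
import re
from datetime import datetime

pattern = r'\b\d{2}/\d{2}/\d{4}\b'
text = 'Meeting on 03/15/2024, not on 13/45/2024'

def is_real_date(candidate):
    # strptime raises ValueError for impossible months or days
    try:
        datetime.strptime(candidate, '%m/%d/%Y')
        return True
    except ValueError:
        return False

candidates = re.findall(pattern, text)
valid_dates = [d for d in candidates if is_real_date(d)]

print(candidates)
print(valid_dates)
```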

Exercise 13: Password Strength Checker

Many websites apply a check like this when you choose a password. In this exercise, write a regex that picks out the strong passwords from a given list.

# Check password strength using regular expression in python
import re

pattern = r'^(?=.*[A-Za-z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$'
passwords = ['P@ssw0rd', 'SecurePwd123', 'WeakPassword']

strong_passwords = [pwd for pwd in passwords if re.match(pattern, pwd)]
print(strong_passwords)
Output:
['P@ssw0rd']

The regular expression in the above code checks for strong passwords by requiring at least one letter (uppercase or lowercase), one digit, and one special character from the set [@$!%*?&]. Below is a detailed explanation of the pattern I used.

  • ^: Indicating that the pattern must match from the beginning.
  • (?=.*[A-Za-z]): This block is to ensure that the password contains at least one alphabet (uppercase or lowercase).
  • (?=.*\d): Ensures that the password contains at least one digit (0-9).
  • (?=.*[@$!%*?&]): This is used to check whether the password contains at least one special character from the provided set: @, $, !, %, *, ?, or &.
  • [A-Za-z\d@$!%*?&]{8,}: Specifies the allowed character set for the password and sets a minimum length of 8 characters. It allows a combination of uppercase and lowercase letters, digits, and the specified special characters.
  • $: Finally, this sign is used to indicate that the pattern must match until the end.

We used the re.match() function to validate each password, and the list comprehension keeps only the strong ones. Finally, the code prints the result.
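
A single combined pattern only says whether a password passes. If you also want to know which requirement failed, the same rules can be checked one at a time. The sketch below is one way to do that; the RULES table and the missing_rules() helper are hypothetical names, and the rules mirror the lookaheads described above:

```python
import re

# One small pattern per requirement (same rules as the combined regex)
RULES = {
    'letter': r'[A-Za-z]',
    'digit': r'\d',
    'special character': r'[@$!%*?&]',
    'at least 8 allowed characters': r'^[A-Za-z\d@$!%*?&]{8,}$',
}

def missing_rules(password):
    # Return the names of all requirements the password does not meet
    return [name for name, rule in RULES.items()
            if not re.search(rule, password)]

for pwd in ['P@ssw0rd', 'SecurePwd123', 'WeakPassword']:
    print(pwd, '->', missing_rules(pwd) or 'strong')
```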

Exercise 14: HTML Tag Removal

If you work as a Python backend developer or do some web scraping, you may find this exercise helpful. Write Python code that uses a regular expression to remove all HTML tags from a given string. The solution code is below.

# Remove HTML Tags from text in Python using regular expression
import re

pattern = r'<.*?>'
html_text = '<p>This is <b>bold</b> and <i>italic</i>.</p>'

clean_text = re.sub(pattern, '', html_text)
print(clean_text)
Output:
This is bold and italic.

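HTML tags start with < and end with >, so we used the regular expression <.*?> to match them. The ? makes .* non-greedy, so each match stops at the first > instead of running on to the last one in the string. The re.sub() function replaces every tag with an empty string, resulting in clean text without tags. This simple pattern works for well-formed snippets like the one above, but it can misfire when a > appears inside an attribute value.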
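
For markup that is less tidy than the sample, a parser is usually a safer tool than a regex. The sketch below uses the standard html.parser module as an alternative, not as the approach used in this exercise; TextExtractor and strip_tags() are hypothetical names, and the title attribute containing > is a made-up edge case:

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_tags(html_text):
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return ''.join(parser.chunks)

html_text = '<p title="a > b">This is <b>bold</b> and <i>italic</i>.</p>'

parser_result = strip_tags(html_text)
regex_result = re.sub(r'<.*?>', '', html_text)

print(parser_result)   # text only
print(regex_result)    # part of the attribute leaks into the output
```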

Exercise 15: URL Extraction

In this exercise, we will extract only the URLs from a given string using a regular expression. The pattern is simple: look for text that starts with http:// or https://.

import re

pattern = r'https?://\S+'
text = 'Visit our website at http://www.example.com or check https://blog.example.com'

urls = re.findall(pattern, text)
print(urls)
Output:
['http://www.example.com', 'https://blog.example.com']

This code uses the regular expression https?://\S+ to extract URLs from the given text. The pattern https? matches both “http” and “https” URLs, followed by “://” and one or more non-whitespace characters. The re.findall() function returns a list of all matches, and it prints the extracted URLs.
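
Since \S+ accepts any non-whitespace character, punctuation that directly follows a URL in a sentence becomes part of the match. A minimal sketch of one way to trim it afterwards; the sample sentence and the set of characters being stripped are assumptions, and a URL that legitimately ends in one of them would be shortened too:

```python
import re

pattern = r'https?://\S+'
text = 'See http://www.example.com, then https://blog.example.com/post.'

raw_urls = re.findall(pattern, text)

# Strip common sentence punctuation from the end of each match
clean_urls = [url.rstrip('.,;:!?)') for url in raw_urls]

print(raw_urls)
print(clean_urls)
```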

Exercise 16: Parsing JSON-like Data

If you work as a backend developer with Python, JSON parsing may be part of your day-to-day work. In this exercise, write Python code that uses a regular expression to extract only the required information from a JSON-like string. Here we extract only the name and city fields and skip the age field.

# Parse JSON-like data using a regular expression in Python

import re
import json

pattern = r'"(\w+)":\s*"([^"]*)"'
json_like_data = '{"name": "John Doe", "age": 30, "city": "New York"}'

matches = re.findall(pattern, json_like_data)
parsed_json = {key: value for key, value in matches}
print(json.dumps(parsed_json, indent=2))
Output:
{
  "name": "John Doe",
  "city": "New York"
}

In this code, the regular expression "(\w+)":\s*"([^"]*)" is used to parse JSON-like data. The pattern captures key-value pairs where keys are word characters (\w+) and values are strings inside double quotes. Since the age value (30) is not a quoted string, it is not extracted. The re.findall() function returns a list of (key, value) tuples, which the dictionary comprehension turns into a dictionary.
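
When the input is valid JSON, the standard json module is usually the more reliable route, because a regex like the one above does not handle nesting or escaped quotes. A minimal sketch, assuming valid JSON input, that produces the same name and city result by keeping only string values:

```python
import json

json_like_data = '{"name": "John Doe", "age": 30, "city": "New York"}'

# Parse the whole document, then keep only the string-valued fields
record = json.loads(json_like_data)
string_fields = {key: value for key, value in record.items()
                 if isinstance(value, str)}

print(json.dumps(string_fields, indent=2))
```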

Exercise 17: Extracting Twitter Handles

In this exercise, extract only Twitter handles (start with the symbol “@”) from a given string or tweet.

import re

pattern = r'@(\w+)'
tweet = 'Excited for the upcoming event! Follow us on @TwitterHandle and use #EventTag.'

twitter_handles = re.findall(pattern, tweet)
print(twitter_handles)
Output:
['TwitterHandle']

We used the regular expression @(\w+) to extract Twitter handles from a tweet. The pattern matches the ‘@’ symbol followed by one or more word characters. Because only \w+ is inside the capturing group, re.findall() returns each handle without the ‘@’. If you plan to work with Twitter data (for sentiment analysis, say), this can be a useful pattern. You can expect regular expression exercises like this in Python interviews.
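
One caveat: @(\w+) also fires on the domain part of an email address. A possible refinement, sketched here as an assumption rather than a complete handle validator, is a negative lookbehind that rejects an ‘@’ preceded by a word character or a dot (the sample tweet is made up):

```python
import re

tweet = 'Mail press@example.com or follow @TwitterHandle for updates.'

# Plain pattern: also picks up the email domain
naive = re.findall(r'@(\w+)', tweet)

# (?<![\w.]) requires that '@' is not preceded by a word character or dot
strict = re.findall(r'(?<![\w.])@(\w+)', tweet)

print(naive)
print(strict)
```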

Exercise 18: Extracting Hashtags

Similar to the exercise above, try to extract hashtags from a given string.

import re

pattern = r'#\w+'
text = 'Excited for the #PythonConference! #coding #programming Follow us on @TwitterHandle'

hashtags = re.findall(pattern, text)
print(hashtags)
Output:
['#PythonConference', '#coding', '#programming']

As in the previous exercise, we used the pattern #\w+ to extract hashtags from the given text. It matches the ‘#’ symbol followed by one or more word characters. There is no capturing group here, so each result includes the ‘#’.

Exercise 19: Extracting IP Addresses

In this exercise, write Python code to extract IP addresses from a given string.

import re

pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
text = 'Server IPs: 192.168.0.1, 10.0.0.255, and 172.16.31.128'

ip_addresses = re.findall(pattern, text)
print(ip_addresses)
Output:
['192.168.0.1', '10.0.0.255', '172.16.31.128']

In this code, we used the regular expression \b(?:\d{1,3}\.){3}\d{1,3}\b to extract IP addresses from the given text. Let me break down this complex pattern into smaller parts below:

  • \b: This makes sure the pattern starts and ends at a word boundary, preventing it from being part of a larger word.
  • (?: ... ): It’s a way to group parts of the pattern without remembering them individually. Here, it’s used to group the part that matches each segment of the IPv4 address.
  • \d{1,3}: This part looks for one to three digits, which covers each part of an IPv4 address (0 to 255). It does not enforce the upper limit, so a value such as 999 would also match.
  • \.: This matches the dot (.) that separates each segment of the IPv4 address.
  • {3}: It says that the previous part (the group \d{1,3}\.) should be repeated three times, representing the first three parts of the IPv4 address.
  • \d{1,3}: This matches the fourth part of the IPv4 address.
  • \b: Similar to the first one, it makes sure the pattern ends at a word boundary.
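
Since the pattern accepts any one-to-three-digit group, a common follow-up is to validate each candidate separately. The sketch below uses the standard ipaddress module for that step; the helper is_valid_ipv4() and the sample text with 999.10.0.1 are assumptions for illustration:

```python
import re
import ipaddress

pattern = r'\b(?:\d{1,3}\.){3}\d{1,3}\b'
text = 'Candidates: 192.168.0.1, 999.10.0.1, and 10.0.0.255'

def is_valid_ipv4(candidate):
    # IPv4Address raises a ValueError subclass for out-of-range octets
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False

candidates = re.findall(pattern, text)
valid_ips = [ip for ip in candidates if is_valid_ipv4(ip)]

print(candidates)
print(valid_ips)
```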

Conclusion

In this article, I listed some Python exercises, with solutions, for practicing regular expressions. They can help you brush up your knowledge and prepare for a job interview. You can download these regular expression Python exercises as a PDF to practice them offline.

That is it for this article. If you want to learn Python quickly, consider the Udemy course Learn Python in 100 days of coding. If you prefer learning from books, see the article 5 Best Book for Learning Python.