Working with strings is a common task in Python. You often need to figure out whether a string contains another string.
The simplest ways to check if a Python string contains a substring are the ‘in’ operator, the find() method, and the index() method. More complex scenarios can be solved with regular expressions or an external library like Pandas.
This article shows you:
four simple methods
two more complex regex functions
a method using the Pandas library
You’ll learn each technique through examples of their syntax and usage. You’ll also get tips on which method is best for different requirements.
Let’s get started!
Basics of Python Strings
A string in Python is a sequence of characters used to represent text-based data. It can include letters, digits, symbols, and whitespace.
It’s one of Python’s built-in data types and can be created using either:
single quotes (' ')
double quotes (" ")
triple quotes (''' ''' or """ """)
Strings are indexed, which means you can access specific characters by referencing their index number. The starting index is 0, which means the first character of a string has an index of 0, the second has an index of 1, and so on.
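As a quick illustration of indexing, here is a minimal sketch. The sample string and variable names are only for demonstration:

```python
s = "Hello, world!"

first = s[0]     # first character, index 0
second = s[1]    # second character, index 1
last = s[-1]     # negative indexes count from the end
piece = s[7:12]  # slice from index 7 up to, but not including, 12

print(first, second, last, piece)  # Output: H e ! world
```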
Next, we look at ways to check for substrings.
4 Simplest Ways to Check for Substrings
Python provides many ways to check if a certain substring exists within a larger string. Some are more complex than others. For the most basic checks, the simplest methods are the in operator or one of three string methods:
in operator
find() method
index() method
count() method
1. In Operator
The ‘in’ operator in Python is a simple and intuitive way to check if one string exists within another string. This operator checks for membership and returns a boolean value:
True if the substring is found within the main string
False if it isn’t
Here is some sample code:
s = "Hello, world!"
sub = "world"
result = sub in s
print(result) # Output: True
The ‘in’ operator is case-sensitive, which means that it treats lowercase and uppercase characters as different. Searching the string above for “hello” returns False because the text contains “Hello” with a capital H.
If you want to perform a case-insensitive check, you can convert both the main string and the substring to the same case before performing the check:
s = "Hello, world!"
sub = "hello"
result = sub.lower() in s.lower()
print(result) # Output: True
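For text that goes beyond plain English letters, lower() may not be enough. The sketch below uses the built-in str.casefold() method, which is designed for caseless comparison. The German sample sentence is just an assumption chosen to show the difference:

```python
s = "Die Straße ist lang"
sub = "STRASSE"

# lower() keeps the "ß" character, so the two forms do not line up
lower_result = sub.lower() in s.lower()
# casefold() converts "ß" to "ss", so the check succeeds
casefold_result = sub.casefold() in s.casefold()

print(lower_result)     # Output: False
print(casefold_result)  # Output: True
```

For plain ASCII text, lower() and casefold() give the same result, so either works.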
The drawback of the in operator is that it doesn’t provide the position of the substring within the main string. For that, you’d need another method in this section. Read on…
2. Find() Method
The find() method returns the first index at which the substring appears, or -1 if the substring is not found.
You call the find() method on a string s, passing the substring sub as an argument. Here is an example:
s = "Hello, world!"
sub = "world"
index = s.find(sub)
if index != -1:
    print("Found at index:", index)  # Output: Found at index: 7
else:
    print("Not found")
You can optionally specify a start or end index to limit your search. The drawback of this method is that it stops at the first occurrence.
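Here is a minimal sketch of the optional start and end arguments. It also shows one possible way to work around the first-occurrence limitation by calling find() in a loop. The find_all() helper is a hypothetical name used for illustration, not a built-in function:

```python
def find_all(s, sub):
    """Return the start index of every non-overlapping occurrence of sub in s."""
    if not sub:
        return []  # an empty substring would otherwise loop forever
    positions = []
    start = 0
    while True:
        index = s.find(sub, start)  # search from 'start' onwards
        if index == -1:
            break
        positions.append(index)
        start = index + len(sub)    # continue after this match
    return positions


s = "Hello, world!"

limited = s.find("world", 0, 5)  # only looks at the slice s[0:5]
all_o = find_all(s, "o")

print(limited)  # Output: -1
print(all_o)    # Output: [4, 8]
```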
3. Index() Method
The index() method is quite similar to the find() method, except that it raises a ValueError when the substring is not found. This means that you should wrap the call in a try/except block.
To use the index() method, call it on a string s, and pass the substring sub as an argument.
s = "Hello, world!"
sub = "world"
try:
    index = s.index(sub)
    print("Found at index:", index)  # Output: Found at index: 7
except ValueError:
    print("Not found")
This method also stops at the first substring inside the text.
4. Count() Method
The .count() method counts how many times a substring occurs in the original string, without counting overlapping occurrences. It returns an integer representing this count. If the substring is not found in the main string, it returns 0.
Here is a simple example that looks for the letter “o” in the text “Hello, world!”:
s = "Hello, world!"
sub = "o"
print(s.count(sub))  # Output: 2
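Two details of count() are worth seeing in action: it accepts optional start and end positions, and it only counts occurrences that don't overlap. The sample strings below are only for illustration:

```python
s = "Hello, world!"

# Limit the count to the slice s[0:5], which is "Hello"
limited_count = s.count("o", 0, 5)

# Overlapping occurrences are not counted: "aaaa" holds "aa" at
# indexes 0, 1, and 2, but count() only reports the two that don't overlap
overlap_count = "aaaa".count("aa")

# A case-insensitive count, converting the text to lowercase first
insensitive_count = "Hello, hello, HELLO".lower().count("hello")

print(limited_count)      # Output: 1
print(overlap_count)      # Output: 2
print(insensitive_count)  # Output: 3
```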
2 Ways to Match Strings With Regular Expressions
Regular expressions (regex) are a little more complex than the methods in the previous section. However, they give you more options for searching and analyzing the text you are dealing with.
Two of the easiest functions in the re module are:
search()
findall()
1. re.search()
The search() function in the re module searches for a pattern in a given string and returns a match object if a match is found. Otherwise, it returns None.
By default, the function is case-sensitive. You can use the re.IGNORECASE flag to avoid case sensitivity.
Here’s a simple example that uses conditional statements based on whether the string is found:
import re
pattern = "python"
text = "I love Python programming"
match = re.search(pattern, text, re.IGNORECASE)
if match:
    print("Pattern found")
else:
    print("Pattern not found")
Note that this function finds the first match and then stops searching. If you want to find all matches, then the next function on this list is for you.
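The match object holds more than a yes/no answer. This sketch reads the matched text and its position. It also uses re.escape(), which is handy when the substring you're looking for contains characters that have a special meaning in regex. The sample strings are only for illustration:

```python
import re

text = "I love Python programming"
match = re.search("python", text, re.IGNORECASE)

if match:
    print(match.group())  # Output: Python
    print(match.start())  # Output: 7
    print(match.span())   # Output: (7, 13)

# "+" is a regex operator, so the raw pattern "1+1" means
# "one or more 1s followed by a 1" and does not match the literal text
unescaped = re.search("1+1", "1+1=2")
escaped = re.search(re.escape("1+1"), "1+1=2")

print(unescaped is None)    # Output: True
print(escaped is not None)  # Output: True
```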
2. re.findall()
This method finds all occurrences of a search pattern in a given string and returns a list containing all matches.
This example uses the pattern \d+, which matches one or more digits in a string. The text string “123, 456, 789” contains three sequences of digits.
import re
pattern = r"\d+"
text = "123, 456, 789"
numbers = re.findall(pattern, text)
print(numbers)
The sample code prints: ['123', '456', '789'].
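findall() also accepts flags, so you can collect every occurrence of a word regardless of case. If you need the positions as well, the related re.finditer() function (not covered elsewhere in this article) yields match objects instead of strings. A minimal sketch with a made-up sentence:

```python
import re

text = "Python is fun. I use python daily. PYTHON rocks."

# Every case-insensitive occurrence of the word
matches = re.findall("python", text, re.IGNORECASE)

# Start index of each occurrence, taken from the match objects
positions = [m.start() for m in re.finditer("python", text, re.IGNORECASE)]

print(matches)       # Output: ['Python', 'python', 'PYTHON']
print(len(matches))  # Output: 3
print(positions)     # Output: [0, 21, 35]
```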
How to Find Substrings With Pandas
Pandas is a popular open-source data analysis and manipulation library for Python. It provides data structures and functions needed to manipulate and analyze structured data.
One of the primary data structures provided by Pandas is the DataFrame. You can use this versatile data structure to check for one string inside another.
The advantage of Pandas is that it bundles functionality that you would otherwise have to write yourself. In particular, a lot of power is packed into the str.contains() method.
str.contains() Function in Pandas
The str.contains() method tests whether a specified pattern or regular expression is contained within each string of a DataFrame column. It returns a boolean result for every row.
Here is an example that imports the library and searches for a string within a list of strings:
import pandas as pd
# Creating a sample DataFrame
data = {'fruits': ['apple', 'banana', 'cherry', 'watermelon', 'orange']}
df = pd.DataFrame(data)
# Searching for substrings in the 'fruits' column
has_an = df['fruits'].str.contains(pat='an', regex=False)
# Filtering the DataFrame based on the search results
filtered_df = df[has_an]
print(filtered_df)
In this example, we search for the substring ‘an’ in the ‘fruits’ column and filter the DataFrame accordingly. The output would be:
   fruits
1  banana
4  orange
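Real-world columns often contain mixed case and missing values. The sketch below uses the case and na parameters of str.contains() to handle both. The sample data is made up for illustration:

```python
import pandas as pd

# Sample data with mixed case and one missing value
df = pd.DataFrame({'fruits': ['Apple', 'BANANA', None, 'orange']})

# case=False ignores upper/lower case; na=False treats missing values as "no match"
mask = df['fruits'].str.contains('an', case=False, na=False, regex=False)

# Keeps the rows at index 1 (BANANA) and 3 (orange)
filtered_df = df[mask]
print(filtered_df)
```

Without na=False, the missing value would produce a missing result in the mask rather than False, which can cause an error when you use the mask to filter the DataFrame.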
Tips for Choosing a Specific Method in Python
The choice of the method depends largely on the specific requirements of your task.
Here are four reasons to pick one of the methods you’ve learned from this article, plus one bonus method you’ll learn elsewhere on this blog:
Speed of processing
You need to know the location of the substring
You need to know the number of occurrences of the substring
You want to match complex patterns
Performing text analysis
1. Speed of Processing
The ‘in’ operator is the best choice if you simply want to know if a substring exists within a string.
It’s simple, intuitive, and fast for this purpose. However, it does not provide information about the location or the count of the substring.
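If speed matters for your use case, it's worth measuring on your own data rather than guessing. Here is a rough sketch using the standard timeit module. The sample string and the number of repetitions are arbitrary assumptions, and the timings will vary from machine to machine:

```python
import re
import timeit

s = "Hello, world! " * 100
sub = "world"
pattern = re.compile(re.escape(sub))  # precompiled so only the search is timed

# Time 100,000 runs of each approach
in_time = timeit.timeit(lambda: sub in s, number=100_000)
find_time = timeit.timeit(lambda: s.find(sub) != -1, number=100_000)
regex_time = timeit.timeit(lambda: pattern.search(s) is not None, number=100_000)

print(f"in operator: {in_time:.4f}s")
print(f"find():      {find_time:.4f}s")
print(f"re.search(): {regex_time:.4f}s")
```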
2. Location of Substring
Use the .find() or .index() methods if you need to know the position of the first occurrence of a substring within a string.
They both return the index of the first occurrence of the substring.
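The two methods differ in what happens when the substring is missing, and find() has one pitfall worth knowing about. A short sketch with sample values chosen for illustration:

```python
s = "Hello, world!"

# find() signals "not found" with -1
missing_index = s.find("Python")
print(missing_index)  # Output: -1

# index() signals "not found" by raising ValueError
try:
    s.index("Python")
    raised = False
except ValueError:
    raised = True
print(raised)  # Output: True

# Pitfall: a match at the very start returns 0, which is falsy,
# so always compare the result of find() with -1
start_index = s.find("Hello")
print(bool(start_index))  # Output: False
print(start_index != -1)  # Output: True
```

As a rule of thumb, find() suits cases where a missing substring is a normal outcome, while index() suits cases where a missing substring should be treated as an error.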
3. Count of Substring
Use the .count() method when you want to know how many times a substring occurs in a string.
4. Complex Patterns
If you need to match complex patterns or perform case-insensitive searches, regular expressions are the most flexible. The re.search() function can handle patterns beyond simple substrings and can easily perform case-insensitive searches.
However, regular expressions can be overkill for simple substring checks and can also be slower and more complex to use and maintain.
5. Performing Text Analysis
Check out our article on text analysis in Python, which shows you how to work with the textblob library.
You can use the .tags property to check for parts of speech or the .ngrams() method to break a document into sequences of consecutive words.
Final Thoughts
You’ve learned seven methods to check if a word or string object is contained in another. These included membership operators, built-in functions, regex, and Pandas functions.
Each method is correct for some situations but not for all. Use this article as a cheat sheet as you master the Python programming language and use the best solution for your string analysis task.
Remember, there’s no one-size-fits-all method. Each approach has its own perks and quirks, so feel free to pick the one that fits your style or the particular problem you’re trying to solve. At the end of the day, Python is all about making your life easier, so embrace its flexibility!