Python regular expression match, search and findall

Python beginners are sometimes confused by the match and search functions in the regular expression module (re), since they accept the same parameters and return the same result in most simple use cases. In this article, let’s discuss the difference between these two functions.

Let’s start with an example. Say we want to find the words ending with “ese” in languages. The pattern uses the lookahead (?=ese), so the matched text is the part of the word before “ese”. With the input below, match and search return the same match object.

import re
languages = "Japanese,English"
# raw strings (r"...") stop Python from treating \w as a string escape
m = re.match(r"\w+(?=ese)", languages)
# m is: <re.Match object; span=(0, 5), match='Japan'>
m = re.search(r"\w+(?=ese)", languages)
# m is: <re.Match object; span=(0, 5), match='Japan'>

But if the order of the languages changes, e.g. languages = “English,Japanese”, the two functions give different results:

languages = "English,Japanese"
m = re.match(r"\w+(?=ese)", languages)
# m is None: no match at the beginning of the string
m = re.search(r"\w+(?=ese)", languages)
# m is: <re.Match object; span=(8, 13), match='Japan'>

The reason is that match only tries to match at the beginning of the string, while search scans through the string and returns the first position where the pattern matches. Hence, if the text you are looking for may not start at the beginning of the string, use search.
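Because both functions return None when nothing matches, it is worth checking the result before calling group() on it. The following is a minimal sketch of that check; the helper name first_ese_prefix and its anchored parameter are hypothetical and used only for illustration.

```python
import re

def first_ese_prefix(text, anchored=False):
    """Return the text before the first 'ese' ending, or None if nothing matches.

    first_ese_prefix is a hypothetical helper, not part of the re module.
    """
    # re.match only tries position 0; re.search scans the whole string
    finder = re.match if anchored else re.search
    m = finder(r"\w+(?=ese)", text)
    # Both functions return None on failure, so guard before calling group()
    return m.group() if m else None

print(first_ese_prefix("English,Japanese"))                 # Japan
print(first_ese_prefix("English,Japanese", anchored=True))  # None
```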

If you want to restrict matching to the beginning of the string, you can also achieve this with search by putting “^” at the start of your pattern:

languages = "English,Japanese,Chinese"
m = re.search(r"^\w+(?=ese)", languages)
# m is None: "^" anchors the pattern to the start of the string
m = re.search(r"\w+(?=ese)", languages)
# m is: <re.Match object; span=(8, 13), match='Japan'>
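One caveat worth a quick sketch: search with “^” and match are not identical once the re.MULTILINE flag is involved. With that flag, “^” also matches right after each newline, while match still only tries the very beginning of the string. The multi-line input below is an assumption made up for this illustration.

```python
import re

# Assumed input for illustration: one language per line
text = "English\nJapanese"

# Without flags, "^" only matches at the very start of the string
no_flag = re.search(r"^\w+(?=ese)", text)
print(no_flag)              # None

# With re.MULTILINE, "^" also matches at the start of each line
multi = re.search(r"^\w+(?=ese)", text, re.MULTILINE)
print(multi.group())        # Japan

# re.match still only tries position 0, even with re.MULTILINE
match_multi = re.match(r"\w+(?=ese)", text, re.MULTILINE)
print(match_multi)          # None
```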

You may also notice that when the pattern occurs more than once, search returns only the first match. This may not be what you want when you need the full list of matches. To return all the non-overlapping occurrences as a list of strings, you can use the findall function:

languages = "English,Japanese,Chinese,Burmese"
m = re.findall(r"\w+(?=ese)", languages)
# m is: ['Japan', 'Chin', 'Burm']
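findall returns plain strings, so the position of each match is lost. If you also need the span of every occurrence, re.finditer yields one match object per occurrence. A minimal sketch, reusing the same input (the variable name results is just for illustration):

```python
import re

languages = "English,Japanese,Chinese,Burmese"

# finditer yields a match object for each non-overlapping occurrence
results = [(m.group(), m.span()) for m in re.finditer(r"\w+(?=ese)", languages)]

for text, span in results:
    print(text, span)
# Japan (8, 13)
# Chin (17, 21)
# Burm (25, 29)
```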

Original post from https://www.codeforests.com
