Lemmatization In Python | A Beginner’s Guide

by Gaelim Holland | Power BI

In this tutorial, we’ll discuss lemmatization in Python, a method for grouping together the different inflected forms of a word. Lemmatization reduces word inflection by returning the root or base form of a word, which is what the term lemma means.

Lemmatization Vs Stemming

Lemmatization is similar to stemming, which also reduces inflections in words. The key difference is that lemmatization returns dictionary words as results.

Stemming, on the other hand, only removes affixes from an inflected word, which may produce words that don’t exist.

For example, if we apply stemming to the word studies, we get studi as the output, since stemming simply strips the suffix es from the word.

If we apply lemmatization instead, we get the word study, since lemmatization returns the base form of the word.
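The contrast above can be sketched in a few lines of plain Python. This is only a toy illustration: the suffix list, the small dictionary, and the function names naive_stem and naive_lemmatize are assumptions made for this example, and real stemmers and lemmatizers use far richer rules and word lists.

```python
# Toy illustration only; not how any real stemmer or lemmatizer is implemented.
SUFFIXES = ("es", "s")  # hypothetical suffixes to strip
LEMMA_DICT = {"studies": "study", "octopi": "octopus"}  # hypothetical dictionary

def naive_stem(word):
    """Strip the first matching suffix, even if the result is not a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def naive_lemmatize(word):
    """Look the word up in a dictionary; fall back to the word itself."""
    return LEMMA_DICT.get(word, word)

print(naive_stem("studies"))       # studi
print(naive_lemmatize("studies"))  # study
```

The stemmer only manipulates characters, while the lemmatizer can only return forms that exist in its dictionary.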

Things To Consider In Utilizing Lemmatization

  • It uses dictionary-based words. A lemma is the root or base form of a word, so lemmatization aims to return that base form rather than just removing the inflections of a word.
  • It depends on the part of speech to find the base word. Without the correct part of speech, lemmatization might not perform well and you might not get the result that you’re looking for.
  • It is slower than stemming but more powerful. Because lemmatization looks words up in a dictionary and takes the part of speech into account instead of applying simple suffix-stripping rules, it is generally slower than stemming. However, it is more powerful because its results are dictionary words.
  • It has higher accuracy in finding the root word. Since lemmatization returns dictionary words for an inflected word, you have a better chance of getting accurate outputs.
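To see why the part of speech matters, here is a minimal sketch in plain Python. The lookup table and the function name toy_lemmatize are hypothetical and exist only for this illustration. The table is keyed on both the word and a part-of-speech tag, with noun ('n') assumed as the default.

```python
# Hypothetical lookup table keyed on (word, part of speech).
LEMMAS = {
    ("octopi", "n"): "octopus",
    ("better", "a"): "good",
    ("running", "v"): "run",
}

def toy_lemmatize(word, pos="n"):
    """Return the lemma for (word, pos); fall back to the word unchanged."""
    return LEMMAS.get((word, pos), word)

print(toy_lemmatize("better"))        # better (treated as a noun, no entry found)
print(toy_lemmatize("better", "a"))   # good
print(toy_lemmatize("running", "v"))  # run
```

The same word gives different results depending on the tag, which is why the wrong part of speech can leave a word unchanged.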

Preparation Stage For Lemmatization In Python

Before implementing lemmatization, let’s begin by importing the Word class from the textblob library.


After that, we’re going to create a word object. 


To create a word object, we defined a variable named w and assigned it a Word instance that holds octopi, a plural form of the word octopus. Note that the text passed to Word must be a string, so it needs to be enclosed in quotation marks (single or double).

Let’s evaluate the variable w to check that it holds the word object we just created.


Upon evaluating w, we get the word object octopi as a result.
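The preparation steps described above might look like the following sketch. It assumes textblob is installed (for example with pip install textblob). The try/except guard is an addition for this example, so the snippet still runs where the library is missing.

```python
# Sketch of the preparation steps, assuming textblob is installed.
try:
    from textblob import Word
except ImportError:
    Word = None  # textblob is not available in this environment

if Word is not None:
    w = Word('octopi')  # double quotes would work equally well
    print(repr(w))      # inspect the word object
else:
    w = None
    print("textblob is not installed")
```

In a notebook, typing w on its own line in a cell displays the object in the same way.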

Implementing Lemmatization In Python

Next, we’re going to implement lemmatization by using the .lemmatize method.


In this step, we took the w variable that holds the word object octopi and called its .lemmatize method. As a result, we got the word octopus, which is the root or base form of the word octopi.
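A minimal sketch of this step follows. It assumes that textblob is installed and that its WordNet data has been downloaded (for example with python -m textblob.download_corpora). The guards around the import and the call are additions for this example, so the snippet degrades gracefully when either is missing.

```python
# Sketch of the lemmatization step, assuming textblob and its WordNet data.
try:
    from textblob import Word
    from textblob.exceptions import MissingCorpusError
except ImportError:
    Word = None  # textblob is not available in this environment

lemma = None
if Word is not None:
    try:
        w = Word('octopi')
        lemma = w.lemmatize()  # no part of speech given
    except MissingCorpusError:
        print("WordNet data missing; run: python -m textblob.download_corpora")

print(lemma)  # the tutorial reports octopus here
```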

After that, let’s try applying lemmatization to the word better.


In this example, we changed our word object from octopi to better and lemmatized it with the .lemmatize method. The result is the same as the word we started with, because no part of speech was given and the word is treated as a noun by default.

When using the .lemmatize method, you can change how a word is lemmatized by passing in a part of speech. As an example, let’s pass 'a', which stands for adjective, to the .lemmatize method.

After adding the part of speech, we get the base word good as a result.

Let’s change our word object again, this time to running. Let’s also change the part of speech that we pass to the .lemmatize method to 'v', which stands for verb.

After making these changes and calling the .lemmatize method, we get run, the root of the word running, as a result. Not every lemmatizer makes it this easy to supply a part of speech as we just did with the .lemmatize method.
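The part-of-speech examples above can be sketched with a small helper. The name safe_lemmatize is hypothetical and not part of textblob. It wraps Word.lemmatize and returns None when textblob or its WordNet data is unavailable, so treat it as an illustration under those assumptions.

```python
def safe_lemmatize(text, pos=None):
    """Hypothetical helper: lemmatize text with textblob, or return None
    if textblob or its WordNet data is not available."""
    try:
        from textblob import Word
        from textblob.exceptions import MissingCorpusError
    except ImportError:
        return None
    try:
        word = Word(text)
        # 'a' = adjective, 'v' = verb; no tag means the default is used
        lemma = word.lemmatize(pos) if pos else word.lemmatize()
        return str(lemma)
    except MissingCorpusError:
        return None

for text, pos in [('better', None), ('better', 'a'), ('running', 'v')]:
    print(text, pos, safe_lemmatize(text, pos))
```

With the data installed, the tutorial reports better, good, and run for these three calls.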

Overall, the .lemmatize method is a useful tool for getting the base form of a word when performing certain types of text analysis in Python.


Conclusion

In brief, we’ve covered what lemmatization in Python is and how it works. We’ve also discussed the similarities and differences between lemmatization and stemming, and we’ve learned how to create a word object using the Word class and how to use the .lemmatize method.

Moreover, we’ve learned how to pass different parts of speech to the .lemmatize method. Using lemmatization in your day-to-day text analysis tasks can save you time and effort when searching for the base form of a word.

All the best,

Gaelim

Gaelim Holland
Gaelim Holland is an Innovative Data Analyst and Digital Channel Optimization Specialist with a thorough knowledge of Omni channel analytics and incorporating online and offline data in funnel analysis.
