Find frequently used words in a paragraph and visualize them

In this article, we are going to find the frequently used words in a paragraph and visualize them using the matplotlib library. The words are plotted along the x-axis and their occurrences along the y-axis. This makes it easy to see which words are used most often in a paragraph and how many times each one is used.

This requires the matplotlib library, which can be installed using pip:

pip install matplotlib

Let us get to the coding part. The following steps achieve the result.

1. Take a paragraph.

I have the following paragraph, which I am going to visualize.

"Are you getting my texts???" she texted to him. He glanced at it and chuckled under his breath. Of course he was 
getting them, but if he wasn't getting them, how would he ever be able to answer? He put the phone down and continued 
on his project. He was ignoring her texts and he planned to continue to do so.

2. Get the words as a list

Let us get the words from this paragraph as a list using the split() method. This method takes an optional separator on which to divide the string. Since I have not given one, the string is split on whitespace characters.

words = paragraph.split()

This will give us a list of words from the paragraph like this.

[Image: list of words]
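As a minimal, self-contained sketch of this step, the sample paragraph from step 1 can be stored in a string and split. The variable names paragraph and words follow the line above; the way the string is wrapped across lines is only for readability.

```python
# The sample paragraph from step 1, stored as one string.
paragraph = (
    '"Are you getting my texts???" she texted to him. He glanced at it and '
    "chuckled under his breath. Of course he was getting them, but if he "
    "wasn't getting them, how would he ever be able to answer? He put the "
    "phone down and continued on his project. He was ignoring her texts and "
    "he planned to continue to do so."
)

# With no argument, split() breaks the string on any run of whitespace.
words = paragraph.split()

print(len(words))  # 61
print(words[:9])   # punctuation is still attached to some words
```

Note that words such as "him." still carry their punctuation at this point, which is what the next step deals with.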

3. Remove punctuation from the words

As you can see, some of the words have punctuation marks attached to them. We have to remove these to get clean words.

The string module in Python gives us a string of punctuation characters, string.punctuation.

[Image: string punctuations]
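For reference, the constant can be inspected directly. This small sketch only prints the value from the standard library:

```python
import string

# string.punctuation is a constant holding the ASCII punctuation characters.
print(string.punctuation)
# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
```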

We can use a for loop to iterate over the words in the list one by one and remove the punctuation from each word like this.

[Image: Python code removing punctuation from words]
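One possible way to write this loop is sketched below. It is an illustration rather than the exact original listing: the name cleaned_words and the short sample list are assumptions, and each word is rebuilt from the characters that are not in string.punctuation.

```python
import string

# A few words from the split paragraph, with punctuation still attached.
words = ['texts???"', 'him.', 'them,', "wasn't", 'answer?']

cleaned_words = []  # hypothetical name for the cleaned list
for word in words:
    # Keep only the characters that are not punctuation.
    cleaned = "".join(ch for ch in word if ch not in string.punctuation)
    cleaned_words.append(cleaned)

print(cleaned_words)
# ['texts', 'him', 'them', 'wasnt', 'answer']
```

Because the apostrophe is also a punctuation character, "wasn't" becomes "wasnt".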

After removing the punctuation, our words will look like this.

['Are', 'you', 'getting', 'my', 'texts', 'she', 'texted', 'to', 'him', 'He', 'glanced', 'at', 'it', 'and', 'chuckled', 'under', 'his', 'breath', 'Of', 'course', 'he', 'was', 'getting', 'them', 'but', 'if', 'he', 'wasnt', 'getting', 'them', 'how', 'would', 'he', 'ever', 'be', 'able', 'to', 'answer', 'He', 'put', 'the', 'phone', 'down', 'and', 'continued', 'on', 'his', 'project', 'He', 'was', 'ignoring', 'her', 'texts', 'and', 'he', 'planned', 'to', 'continue', 'to', 'do', 'so']

4. Find the occurrences of each word

The Counter class from Python's collections module takes an iterable as input and counts its items, with each word as a key and its number of occurrences as the value. It returns a Counter object, which can be converted into a regular dictionary.
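A minimal sketch of this step, assuming collections.Counter is the class being used and taking a short sample list in place of the full word list (the name occurrences is illustrative):

```python
from collections import Counter

# A short sample of cleaned words from the paragraph.
words = ['he', 'was', 'getting', 'them', 'but', 'if',
         'he', 'wasnt', 'getting', 'them']

# Counter tallies how many times each item appears in the iterable.
occurrences = Counter(words)

# A Counter can be converted to a plain dictionary with dict().
print(dict(occurrences))
# {'he': 2, 'was': 1, 'getting': 2, 'them': 2, 'but': 1, 'if': 1, 'wasnt': 1}
```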

This will give us an output like this.

{'Are': 1, 'you': 1, 'getting': 3, 'my': 1, 'texts': 2, 'she': 1, 'texted': 1, 'to': 4, 'him': 1, 'He': 3, 'glanced': 1, 'at': 1, 'it': 1, 'and': 3, 'chuckled': 1, 'under': 1, 'his': 2, 'breath': 1, 'Of': 1, 'course': 1, 'he': 4, 'was': 2, 'them': 2, 'but': 1, 'if': 1, 'wasnt': 1, 'how': 1, 'would': 1, 'ever': 1, 'be': 1, 'able': 1, 'answer': 1, 'put': 1, 'the': 1, 'phone': 1, 'down': 1, 'continued': 1, 'on': 1, 'project': 1, 'ignoring': 1, 'her': 1, 'planned': 1, 'continue': 1, 'do': 1, 'so': 1}

5. Plot the graph

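Sort the occurrences dictionary by key, which is the word. This sorts the words alphabetically and returns a list of (word, count) tuples. Note that in the output below the words have also been converted to lowercase, so "He" and "he" are counted together as "he".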
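A sketch of how this could be done is shown below. The lowercasing step is an assumption inferred from the sorted output, and the names cleaned_words and occurrences are illustrative; lists matches the name used in the zip call later on.

```python
from collections import Counter

# A short sample of cleaned words, with mixed case.
cleaned_words = ['He', 'put', 'the', 'phone', 'down', 'and', 'he', 'planned']

# Assumption: lowercase each word so 'He' and 'he' are counted together.
occurrences = Counter(word.lower() for word in cleaned_words)

# items() yields (word, count) pairs; sorted() orders them by the word.
lists = sorted(occurrences.items())

print(lists)
# [('and', 1), ('down', 1), ('he', 2), ('phone', 1), ('planned', 1), ('put', 1), ('the', 1)]
```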

This will give us the list of tuples like this.

[('able', 1), ('and', 3), ('answer', 1), ('are', 1), ('at', 1), ('be', 1), ('breath', 1), ('but', 1), ('chuckled', 1), ('continue', 1), ('continued', 1), ('course', 1), ('do', 1), ('down', 1), ('ever', 1), ('getting', 3), ('glanced', 1), ('he', 7), ('her', 1), ('him', 1), ('his', 2), ('how', 1), ('if', 1), ('ignoring', 1), ('it', 1), ('my', 1), ('of', 1), ('on', 1), ('phone', 1), ('planned', 1), ('project', 1), ('put', 1), ('she', 1), ('so', 1), ('texted', 1), ('texts', 2), ('the', 1), ('them', 2), ('to', 4), ('under', 1), ('was', 2), ('wasnt', 1), ('would', 1), ('you', 1)]

After this, we can use the zip function to split the list of pairs into two tuples, one holding the words and one holding their counts, and use them to plot the graph.

x, y = zip(*lists)
('able', 'and', 'answer', 'are', 'at', 'be', 'breath', 'but', 'chuckled', 'continue', 'continued', 'course', 'do', 'down', 'ever', 'getting', 'glanced', 'he', 'her', 'him', 'his', 'how', 'if', 'ignoring', 'it', 'my', 'of', 'on', 'phone', 'planned', 'project', 'put', 'she', 'so', 'texted', 'texts', 'the', 'them', 'to', 'under', 'was', 'wasnt', 'would', 'you') (1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 7, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 4, 1, 2, 1, 1, 1)

I am using a bar graph, so I call the bar method from plt (matplotlib.pyplot imported as plt). The bar method requires the x-axis and y-axis values. Provide them to the method like this.

plt.bar(x, y)

Use the show() method to display the graph. Our graph will look like this.

Note that the labels on the x-axis overlap. This is because they are written horizontally. We can make them vertical by calling the xticks function, which takes a parameter called rotation. Specifying a value of 90 writes the labels along the x-axis vertically.

plt.xticks(rotation=90)

We have to call this method before the show() method. This will alter our graph like this.

This is the final output and now our graph looks much cleaner. The word “he” is used 7 times in our paragraph.

The complete code for this problem is as follows.
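A minimal sketch that puts the steps above together might look like this. It assumes collections.Counter for the counting, a join over non-punctuation characters for the cleaning, and lowercasing before counting (inferred from the sorted output); the names cleaned_words and occurrences are illustrative, and the details of the original listing may differ.

```python
import string
from collections import Counter

import matplotlib.pyplot as plt

# Step 1: the sample paragraph.
paragraph = (
    '"Are you getting my texts???" she texted to him. He glanced at it and '
    "chuckled under his breath. Of course he was getting them, but if he "
    "wasn't getting them, how would he ever be able to answer? He put the "
    "phone down and continued on his project. He was ignoring her texts and "
    "he planned to continue to do so."
)

# Step 2: split on whitespace to get a list of words.
words = paragraph.split()

# Step 3: remove punctuation from each word.
# Assumption: words are also lowercased so 'He' and 'he' are merged.
cleaned_words = []
for word in words:
    cleaned = "".join(ch for ch in word if ch not in string.punctuation)
    cleaned_words.append(cleaned.lower())

# Step 4: count how often each word occurs.
occurrences = Counter(cleaned_words)

# Step 5: sort alphabetically, separate words from counts, and plot.
lists = sorted(occurrences.items())
x, y = zip(*lists)

plt.bar(x, y)
plt.xticks(rotation=90)  # must be called before show()
plt.show()
```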

I hope this article was helpful. We can customize our graph with the various methods available in the matplotlib library. If you have any doubts, mention them in the comments.

Happy coding!