Find the frequency of words in a string using Python in one line

If you are working with text documents, paragraphs, or files, for example to do sentiment analysis, you will often need to know the frequency of each word. Python offers both a traditional approach and a one-liner.

Let us look at the traditional approach and the one-liner with an example.

Traditional approach

The traditional approach takes the following steps.

  1. Use the split() method to get a list of words.
  2. Use set() to keep only the unique words.
  3. For each unique word, get its frequency in the list of words using the count() method.
  4. Use zip() to pair each unique word with its frequency as a tuple.

Let us look at the example below.

paragraph = "If you are working with any text documents or paragraphs or files and if we want to do sentiment " \
            "analysis, we will always come across a situation where we should know the frequency of the words. There " \
            "is both traditional approach and a one-liner in python. "

words = paragraph.split()
unique_words = list(set(words))

frequencies = []

for word in unique_words:
    frequencies.append(words.count(word))

# Pair each unique word (not each word of the original list) with its count.
tuple_list = list(zip(unique_words, frequencies))
print(tuple_list)

The output of the above code is shown below. Because a set is unordered, the order of the tuples can differ from run to run; one possible ordering is:

[('If', 1), ('you', 1), ('are', 1), ('working', 1), ('with', 1), ('any', 1), ('text', 1), ('documents', 1), ('or', 2), ('paragraphs', 1), ('files', 1), ('and', 2), ('if', 1), ('we', 3), ('want', 1), ('to', 1), ('do', 1), ('sentiment', 1), ('analysis,', 1), ('will', 1), ('always', 1), ('come', 1), ('across', 1), ('a', 2), ('situation', 1), ('where', 1), ('should', 1), ('know', 1), ('the', 2), ('frequency', 1), ('of', 1), ('words.', 1), ('There', 1), ('is', 1), ('both', 1), ('traditional', 1), ('approach', 1), ('one-liner', 1), ('in', 1), ('python.', 1)]

One-liner approach

In the one-liner approach we use the Counter class from the collections module. Counter counts how many times each element occurs, and its most_common() method returns a list of (element, frequency) tuples.
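To see what Counter and most_common() do on their own, here is a minimal sketch on a tiny list. The list sample_words is made up purely for illustration and is not part of the article's example.

```python
from collections import Counter

# A small, made-up list of words used only for illustration.
sample_words = ["red", "blue", "red", "green", "blue", "red"]

# Counter maps each distinct element to the number of times it occurs.
counts = Counter(sample_words)
print(counts["red"])          # 3

# most_common() returns (element, count) tuples, highest count first.
print(counts.most_common())   # [('red', 3), ('blue', 2), ('green', 1)]
```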

Let us rewrite the traditional program using this concept.

from collections import Counter

paragraph = "If you are working with any text documents or paragraphs or files and if we want to do sentiment " \
            "analysis, we will always come across a situation where we should know the frequency of the words. There " \
            "is both traditional approach and a one-liner in python. "

result = Counter(paragraph.split()).most_common()
print(result)

The output of the above code is:

[('we', 3), ('or', 2), ('and', 2), ('a', 2), ('the', 2), ('If', 1), ('you', 1), ('are', 1), ('working', 1), ('with', 1), ('any', 1), ('text', 1), ('documents', 1), ('paragraphs', 1), ('files', 1), ('if', 1), ('want', 1), ('to', 1), ('do', 1), ('sentiment', 1), ('analysis,', 1), ('will', 1), ('always', 1), ('come', 1), ('across', 1), ('situation', 1), ('where', 1), ('should', 1), ('know', 1), ('frequency', 1), ('of', 1), ('words.', 1), ('There', 1), ('is', 1), ('both', 1), ('traditional', 1), ('approach', 1), ('one-liner', 1), ('in', 1), ('python.', 1)]

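This approach also sorts the list by frequency, from the most common word to the least common. In Python 3.7 and later, words with equal counts keep the order in which they were first encountered.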
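If only the most frequent words are needed, most_common() also accepts an optional number of entries to return. The sketch below reuses the same paragraph; the variable name top_three is just an illustrative choice.

```python
from collections import Counter

paragraph = "If you are working with any text documents or paragraphs or files and if we want to do sentiment " \
            "analysis, we will always come across a situation where we should know the frequency of the words. There " \
            "is both traditional approach and a one-liner in python. "

# Passing n to most_common() keeps only the n highest-count entries.
top_three = Counter(paragraph.split()).most_common(3)
print(top_three)  # [('we', 3), ('or', 2), ('and', 2)]
```

Note that split() only separates on whitespace, so 'If' and 'if' are still counted as different words, and punctuation stays attached as in 'words.'.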

Conclusion

I hope this article was helpful. Try to write your code in a more Pythonic way.

Happy coding!