Generate a word cloud using Python


In this article, we will generate word clouds in Python using the wordcloud and matplotlib packages.

Installation

Both packages can be installed with pip.

pip install wordcloud
pip install matplotlib

What is a word cloud?

A word cloud is an image made of words that together resemble a cloudy shape.
The size of a word shows how important it is, e.g. how often it appears in a text (its frequency).
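
To make the idea of frequency concrete, here is a minimal sketch that counts words using only the standard library. The sample sentence and the simple regular expression are assumptions for illustration; the wordcloud package does its own tokenizing and, by default, also drops common stop words such as "the".

```python
import re
from collections import Counter

# Sample text, made up for this illustration.
text = "the cat sat on the mat and the cat slept"

# Lowercase the text and split it into words.
words = re.findall(r"[a-z']+", text.lower())

# Count how often each word occurs.
frequencies = Counter(words)

# The most frequent words would be drawn largest in a word cloud.
top = frequencies.most_common(2)
print(top)  # [('the', 3), ('cat', 2)]
```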

People typically use word clouds to easily produce a summary of large documents (reports, speeches), to create art on a topic (gifts, displays) or to visualize data (tables, surveys).

[Image: word cloud]

Code

The following code snippet generates a simple word cloud. By default, the output is a rectangular image. We can use the mask argument to produce any shape we want, which we will see in a later example.

This snippet generates an image like the following.

[Image: simple word cloud (image-76-1024x768.png)]

The WordCloud class takes various arguments. Some of them are:

  1. width
  2. height
  3. background_color
  4. min_font_size etc.

In this example, I pass the height, width and background color values.

The output of this code is:

[Image: word cloud with custom size and background (image-77-1024x614.png)]

Mask image

We can also define the shape of our word cloud. We first need an image, which we then use as a mask so that the words fill its shape. Let us take a picture of an elephant like this one.

[Image: elephant]

If we want our word cloud to be shaped like an elephant, the following code does the job.

This would give an output like this.

[Image: elephant word cloud]

I can change the background color to white and the word color to black to make it look more like an elephant.

cloud = WordCloud(background_color='white', mask=shape,
                  color_func=lambda *args, **kwargs: "black", width=2000, height=2000).generate(text)

Adding the "background_color" and "color_func" arguments gives us an elephant word cloud with a white background and black words. Note that when a mask is given, width and height are ignored and the shape of the mask is used instead.

We can also add an outline to our image. This can be done by passing two additional arguments, contour_color and contour_width, to the WordCloud class.

cloud = WordCloud(background_color='white', mask=shape, contour_color='black', contour_width=5,
                  color_func=lambda *args, **kwargs: "black", width=2000, height=2000).generate(text)

[Image: elephant word cloud with outline]

Thanks for reading. I hope this article was helpful.

Happy coding!