WhatsApp date-message visualization using Python

In this article, we will see how to visualize how many messages were sent on each date of a WhatsApp chat, using Python and Matplotlib.

Requirements and installation

  1. Download the dataset (the exported chat file, chats.txt) from here.
  2. Matplotlib for plotting the graph.

Installation

We can install Matplotlib from PyPI (pypi.org) by running the following command in the terminal.

pip install matplotlib

Once the installation is complete, we are good to go.

How can we do it?

Follow these steps in order.

  1. First, open the chats.txt file.
  2. Read the lines of the file using the readlines() method.
  3. Use a regular expression to extract just the date from each line and add it to a list.
  4. Find the frequency of each date.
  5. Plot a graph with the dates along the x-axis and the frequencies along the y-axis.

Open the chats.txt file

The readlines() method returns a list of the lines in chats.txt, which we store in the content variable. We can then iterate over this list to read each line.
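Opening the file and reading it with readlines() can be sketched as below. The function name read_chat_lines is hypothetical, and the UTF-8 encoding is an assumption (exported chats may contain non-ASCII characters such as emoji). So that the snippet runs on its own, it writes a small made-up chat to a temporary chats.txt first.

```python
import os
import tempfile


def read_chat_lines(path):
    """Return every line of the exported chat file as a list of strings."""
    # encoding="utf-8" is an assumption about how the chat was exported
    with open(path, encoding="utf-8") as chat_file:
        content = chat_file.readlines()  # each line keeps its trailing "\n"
    return content


# Demo: write a tiny hypothetical chat export to a temporary directory
sample = "25/6/15, 9:14 pm - Alice: Hi\n25/6/15, 9:15 pm - Bob: Hello\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chats.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    content = read_chat_lines(path)

# Iterate over the list, one chat line at a time
for line in content:
    print(line.rstrip("\n"))
```

Using a with block closes the file automatically, even if reading fails part way through.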

Regular expression to get dates from each line

We iterate over each line, extract its date, and add it to a list called all_dates. The regular expression

[0-9]+/[0-9]+/[0-9]+

extracts the date from each line. It matches one or more digits (0 to 9) for the day, one or more digits for the month, and one or more digits for the year, with the three parts separated by '/'. In this chat, dates are written as day/month/two-digit year, for example 25/6/15.

The value stored in the all_dates list is,

['25/6/15', '25/6/15', '18/12/16', '21/12/16', '21/12/16', '21/12/16', '21/12/16', '21/12/16', '21/12/16', '21/12/16', '21/12/16', '21/12/16', '22/12/16', '22/12/16', '22/12/16', '22/12/16', '22/12/16', '22/12/16', '22/12/16', '22/12/16', '22/12/16', '24/12/16', '25/12/16', '25/12/16', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '1/1/17', '2/1/17', '2/1/17', '2/1/17', '2/1/17', '2/1/17', '8/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '10/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '11/1/17', '12/1/17', '12/1/17', '12/1/17', '12/1/17', '13/1/17', '13/1/17', '13/1/17', '13/1/17', '13/1/17', '13/1/17', '14/1/17', '14/1/17', '14/1/17', '14/1/17', '15/1/17', '15/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', '16/1/17', 
'16/1/17', '16/1/17', '16/1/17', '17/1/17', '17/1/17', '17/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '18/1/17', '19/1/17', '19/1/17', '19/1/17', '23/1/17', '23/1/17', '23/1/17', '23/1/17', '23/1/17', '23/1/17', '24/1/17', '24/1/17', '25/1/17', '25/1/17', '25/1/17', '26/1/17', '26/1/17', '26/1/17', '28/1/17', '28/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '29/1/17', '30/1/17', '30/1/17', '30/1/17', '30/1/17', '30/1/17', '30/1/17', '30/1/17', '30/1/17', '30/1/17', '30/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '31/1/17', '1/2/17', '1/2/17', '1/2/17', '2/2/17', '2/2/17', '2/2/17', '2/2/17', '2/2/17', '2/2/17', '2/2/17', '2/2/17', '2/2/17', '2/2/17', '2/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '3/2/17', '4/2/17', '4/2/17', '4/2/17', '4/2/17', '4/2/17', '4/2/17', '4/2/17', '4/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '5/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', 
'6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '6/2/17', '7/2/17', '9/2/17', '9/2/17', '9/2/17', '9/2/17', '9/2/17']
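The extraction step described above can be sketched with Python's re module as follows. This sketch uses re.search, so it keeps only the first date found on each line and skips lines that contain no date (for example, continuation lines of a multi-line message); the original code may have handled this differently. The sample lines and the helper name extract_dates are hypothetical, and the real export layout depends on the phone's settings.

```python
import re

# Same pattern as in the text: day/month/year, each part one or more digits
DATE_PATTERN = re.compile(r"[0-9]+/[0-9]+/[0-9]+")


def extract_dates(lines):
    """Collect the first date found on each line, skipping lines without one."""
    all_dates = []
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:  # lines with no date at all are ignored
            all_dates.append(match.group())
    return all_dates


# Hypothetical chat lines
content = [
    "25/6/15, 9:14 pm - Alice: Hi\n",
    "25/6/15, 9:15 pm - Bob: Hello\n",
    "this line continues the previous message\n",
    "18/12/16, 8:02 am - Alice: Long time!\n",
]
all_dates = extract_dates(content)
print(all_dates)  # ['25/6/15', '25/6/15', '18/12/16']
```

Compiling the pattern once with re.compile avoids re-parsing it for every line, although Python's internal cache makes the difference small.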

Next, let us find the unique dates in this list.
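A set is the usual Python way to drop duplicates. Because a set has no meaningful order, this sketch also sorts the unique dates chronologically with datetime.strptime; that sorting step is an addition not described in the text and assumes day/month/two-digit-year dates. The short all_dates list is a sample taken from the output above.

```python
from datetime import datetime

all_dates = ['25/6/15', '25/6/15', '18/12/16', '21/12/16', '21/12/16', '1/1/17']

# A set keeps one copy of each date, but its iteration order is arbitrary
unique_dates = set(all_dates)


def to_datetime(date_text):
    # Assumption: dates are day/month/two-digit year, e.g. '25/6/15'
    return datetime.strptime(date_text, "%d/%m/%y")


unique_sorted = sorted(unique_dates, key=to_datetime)
print(unique_sorted)  # ['25/6/15', '18/12/16', '21/12/16', '1/1/17']
```

Sorting on the parsed date rather than the raw string matters: as plain strings, '1/1/17' would sort before '25/6/15'.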

After this, we find how often each date occurs. We build a list of tuples, where each tuple holds a date and its frequency.

This is the value stored in the tuple_date_freq_list.

[('22/12/16', 9), ('21/12/16', 9), ('1/1/17', 23), ('17/1/17', 3), ('7/2/17', 1), ('2/2/17', 11), ('8/1/17', 1), ('31/1/17', 26), ('25/12/16', 2), ('14/1/17', 4), ('6/2/17', 28), ('13/1/17', 6), ('25/6/15', 2), ('10/1/17', 18), ('19/1/17', 3), ('24/12/16', 1), ('15/1/17', 2), ('23/1/17', 6), ('25/1/17', 3), ('28/1/17', 2), ('2/1/17', 5), ('18/12/16', 1), ('26/1/17', 3), ('1/2/17', 3), ('18/1/17', 41), ('30/1/17', 10), ('4/2/17', 8), ('12/1/17', 4), ('29/1/17', 12), ('16/1/17', 25), ('5/2/17', 26), ('9/2/17', 5), ('11/1/17', 73), ('24/1/17', 2), ('3/2/17', 14)]
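The counting step above can be sketched in two ways. The first counts each unique date in the full list with list.count(), which is one straightforward reading of the text but not necessarily the original code. The second uses collections.Counter from the standard library, which gives the same pairs in a single pass over the list. The sample all_dates list is taken from the output shown earlier.

```python
from collections import Counter

all_dates = ['25/6/15', '25/6/15', '18/12/16', '21/12/16', '21/12/16', '21/12/16']

# Count each unique date in the full list (one scan of all_dates per date)
tuple_date_freq_list = [(date, all_dates.count(date)) for date in set(all_dates)]

# Equivalent result using collections.Counter (one scan in total)
counted = list(Counter(all_dates).items())

print(sorted(tuple_date_freq_list))  # [('18/12/16', 1), ('21/12/16', 3), ('25/6/15', 2)]
print(sorted(counted))               # [('18/12/16', 1), ('21/12/16', 3), ('25/6/15', 2)]
```

For a small chat either approach is fine; with many distinct dates, Counter avoids rescanning the whole list once per date.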

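We can then use the built-in zip() function, together with the * unpacking operator, to convert this list of tuples into two tuples: one containing only the dates and the other containing their frequencies. The order below differs from the list above, most likely because iterating over a set of strings can give a different order on each run of the program.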

('21/12/16', '13/1/17', '3/2/17', '6/2/17', '14/1/17', '19/1/17', '25/6/15', '7/2/17', '2/2/17', '10/1/17', '26/1/17', '29/1/17', '15/1/17', '23/1/17', '12/1/17', '16/1/17', '18/12/16', '25/12/16', '18/1/17', '9/2/17', '1/1/17', '28/1/17', '30/1/17', '1/2/17', '2/1/17', '5/2/17', '4/2/17', '11/1/17', '22/12/16', '24/12/16', '8/1/17', '31/1/17', '17/1/17', '24/1/17', '25/1/17')

(9, 6, 14, 28, 4, 3, 2, 1, 11, 18, 3, 12, 2, 6, 4, 25, 1, 2, 41, 5, 23, 2, 10, 3, 5, 26, 8, 73, 9, 1, 1, 26, 3, 2, 3)
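A minimal sketch of the zip() step on a few pairs taken from tuple_date_freq_list. zip(*pairs) passes each tuple as a separate argument, so zip regroups the first elements (the dates) into one tuple and the second elements (the frequencies) into another.

```python
tuple_date_freq_list = [('22/12/16', 9), ('21/12/16', 9), ('1/1/17', 23)]

# zip(*...) unpacks the pairs and regroups them position by position
dates, frequencies = zip(*tuple_date_freq_list)

print(dates)        # ('22/12/16', '21/12/16', '1/1/17')
print(frequencies)  # (9, 9, 23)
```

Note that this unpacking fails with a ValueError if the list is empty, since zip(*[]) produces no tuples to assign.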

Finally, we can plot these values with Matplotlib, using the dates as the x values and the frequencies as the y values.
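The plotting step can be sketched with matplotlib.pyplot as shown here. Several details are assumptions rather than the original code: a line plot with markers (ax.bar(dates, frequencies) would work just as well), sorting the pairs chronologically so the x-axis is in date order, the Agg backend so the script also runs without a display, and the hypothetical output file name messages_per_date.png. The pairs are a subset of tuple_date_freq_list above.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # render without a display; remove to use plt.show() interactively
import matplotlib.pyplot as plt

# A few (date, frequency) pairs taken from tuple_date_freq_list above
tuple_date_freq_list = [('22/12/16', 9), ('1/1/17', 23), ('21/12/16', 9), ('25/6/15', 2)]

# Assumption: day/month/two-digit-year dates; sorting keeps the x-axis chronological
pairs = sorted(tuple_date_freq_list, key=lambda pair: datetime.strptime(pair[0], "%d/%m/%y"))
dates, frequencies = zip(*pairs)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(dates, frequencies, marker="o")  # string dates become evenly spaced categories
ax.set_xlabel("Date")
ax.set_ylabel("Number of messages")
ax.set_title("Messages per date")
ax.tick_params(axis="x", labelrotation=90)  # keep long date labels readable
fig.tight_layout()
fig.savefig("messages_per_date.png")  # hypothetical output file name
```

Because the dates are passed as strings, Matplotlib spaces them evenly regardless of the time gap between them; passing datetime objects instead would place each point at its true position in time.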

Complete code

The complete code looks like this.
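A minimal end-to-end sketch that combines the steps above. It is a reconstruction under stated assumptions rather than the original listing: the export is read from chats.txt as UTF-8, dates are day/month/two-digit year, counting uses collections.Counter, and dates are sorted chronologically before plotting. The function names count_messages_per_date and main are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

import matplotlib.pyplot as plt

DATE_PATTERN = re.compile(r"[0-9]+/[0-9]+/[0-9]+")
DATE_FORMAT = "%d/%m/%y"  # assumption: day/month/two-digit-year exports


def count_messages_per_date(lines):
    """Return (date, frequency) pairs sorted chronologically."""
    all_dates = []
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:  # skip lines without a date, e.g. continuation lines
            all_dates.append(match.group())
    counts = Counter(all_dates)
    return sorted(counts.items(), key=lambda pair: datetime.strptime(pair[0], DATE_FORMAT))


def main(path="chats.txt"):
    # Steps 1 and 2: open the export and read it line by line
    with open(path, encoding="utf-8") as chat_file:
        content = chat_file.readlines()

    # Steps 3 and 4: extract the dates and count them
    tuple_date_freq_list = count_messages_per_date(content)
    if not tuple_date_freq_list:
        print("No dates found in", path)
        return
    dates, frequencies = zip(*tuple_date_freq_list)

    # Step 5: dates on the x-axis, message counts on the y-axis
    plt.plot(dates, frequencies, marker="o")
    plt.xlabel("Date")
    plt.ylabel("Number of messages")
    plt.xticks(rotation=90)
    plt.tight_layout()
    plt.show()
```

Calling main("chats.txt") reads the export and opens a window with the graph. Keeping the counting in its own function means it can be tested or reused without Matplotlib being involved.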

The final output of this code is,

As we can see, the dates are plotted along the x-axis and the number of messages sent on each day is plotted along the y-axis.

Conclusion

We hope this article was helpful. Keep reading for more interesting and exciting articles.

Happy coding!