Comparing directories using Python

In this article, we shall see how to compare two directories using Python's built-in module called “filecmp”. The module can compare both files and directories. Here we focus on its dircmp class, which compares the files and subdirectories in the two given directories.

To compare the contents of two files, the module offers the functions cmp (for a pair of files) and cmpfiles (for a list of file names looked up in two directories). For the sake of this article, we will look only at the dircmp class.
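For completeness, a file-to-file comparison with filecmp.cmp could look like the minimal sketch below. The helper name files_match is made up for this example. Passing shallow=False asks for the contents to be compared rather than accepting files whose os.stat() signatures match.

```python
import filecmp


def files_match(path_a, path_b):
    """Return True if the two files have identical contents."""
    # shallow=False compares the file contents instead of trusting
    # matching os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)
```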

The dircmp class provides attributes for tasks such as:

  1. Finding the same files in the two directories.
  2. Finding the common subdirectories in the two directories.
  3. Finding both common files and subdirectories.
  4. Getting all the files and subdirectories in the left directory.
  5. Getting all the files and subdirectories in the right directory.
  6. Getting the files and subdirectories unique to the left directory.
  7. Getting the files and subdirectories unique to the right directory.

Consider the following directory structure. I am going to explain each of the above tasks one by one using this structure.

[Image: directory structure]

These are the two main directories that I am going to compare.

  1. SampleDirectoryOne
  2. SampleDirectoryTwo

Each directory has an identical file called sample.txt. Each also has one unique file: sample1.txt in SampleDirectoryOne and sample2.txt in SampleDirectoryTwo. Each has its own subdirectory (subdirectoryone and subdirectorytwo, respectively) containing a single file. Both directories also have a common subdirectory called subdir/.
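To follow along, the structure can be recreated with a short script. This is only a sketch: the helper name build_sample_tree, the file contents, and the names of the files inside subdirectoryone and subdirectorytwo are made up for illustration, and subdir/ is left empty because its contents are not specified.

```python
from pathlib import Path

# Relative path -> file contents. The inner_*.txt names and all contents
# are invented for illustration.
SAMPLE_FILES = {
    "SampleDirectoryOne/sample.txt": "same content\n",
    "SampleDirectoryOne/sample1.txt": "only in the left directory\n",
    "SampleDirectoryOne/subdirectoryone/inner_one.txt": "left\n",
    "SampleDirectoryTwo/sample.txt": "same content\n",
    "SampleDirectoryTwo/sample2.txt": "only in the right directory\n",
    "SampleDirectoryTwo/subdirectorytwo/inner_two.txt": "right\n",
}
# Directories that hold no files in this sketch.
EMPTY_DIRS = ["SampleDirectoryOne/subdir", "SampleDirectoryTwo/subdir"]


def build_sample_tree(root):
    """Create both sample directories under root and return their paths."""
    root = Path(root)
    for relative, text in SAMPLE_FILES.items():
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
    for relative in EMPTY_DIRS:
        (root / relative).mkdir(parents=True, exist_ok=True)
    return root / "SampleDirectoryOne", root / "SampleDirectoryTwo"
```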

Let us get started with the examples.

1. Finding the same files in the two directories

The dircmp class has an attribute called common_files, which returns the names of the files present in both directories as a list. First, we pass both directories to dircmp, which returns a comparison object. The common_files attribute of that object then gives the list of common files. Note that common_files matches files by name only; the same_files and diff_files attributes report which of those common files compare as identical and which differ.

[Image: find the same files in two directories]
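A minimal sketch of this step, assuming both directories sit in the current working directory. The helper name list_common_files is hypothetical.

```python
from filecmp import dircmp


def list_common_files(left_dir, right_dir):
    """Return the names of files that exist in both directories."""
    comparison = dircmp(left_dir, right_dir)  # nothing is read from disk yet
    return comparison.common_files  # computed lazily on first access


# Example call, run from the folder that contains both directories:
# print(list_common_files("SampleDirectoryOne", "SampleDirectoryTwo"))
```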

The output of the above code is,

['sample.txt']

Only this file exists in both the directories.

2. Finding the common subdirectories in the two directories

The next example is finding the common subdirectories in both the directories. The dircmp class has an attribute for this too. The common_dirs attribute can be used to retrieve the list of common subdirectories in both the directories.

[Image: finding common subdirectories using python]
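A minimal sketch of reading common_dirs, under the same assumption that both directories are reachable from the current working directory. The helper name list_common_dirs is made up for illustration.

```python
from filecmp import dircmp


def list_common_dirs(left_dir, right_dir):
    """Return the names of subdirectories that exist in both directories."""
    comparison = dircmp(left_dir, right_dir)
    return comparison.common_dirs


# Example call, run from the folder that contains both directories:
# print(list_common_dirs("SampleDirectoryOne", "SampleDirectoryTwo"))
```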

The output of the above code is,

['subdir']

Since the subdirectory ‘subdir/’ exists in both root directories, the code returns it as the output.

3. Finding both common files and subdirectories

The common attribute from the dircmp class can give us both the common files and subdirectories from the directories that we are comparing.

[Image: finding the common subdirectories and files]
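A minimal sketch of reading the common attribute. The helper name list_common_names is hypothetical. The attribute holds every name found on both sides, so for an ordinary tree it amounts to common_files plus common_dirs.

```python
from filecmp import dircmp


def list_common_names(left_dir, right_dir):
    """Return names (files and subdirectories) present in both directories."""
    comparison = dircmp(left_dir, right_dir)
    return comparison.common


# Example call, run from the folder that contains both directories:
# print(list_common_names("SampleDirectoryOne", "SampleDirectoryTwo"))
```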

The output of the above code is,

['sample.txt', 'subdir']

The file sample.txt exists in both the directories and so does the subdirectory subdir.

4. Getting all the files and subdirectories in the left directory

The next example is to get all the files and directories in the left directory. The left_list attribute of the dircmp class returns a list of every file and directory in the left directory, whether or not it also exists in the right one. Names matched by dircmp's hide and ignore lists are filtered out.

[Image: files and directories in dir1 alone]
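A minimal sketch of reading left_list. The helper name list_left_entries is made up for illustration; the right directory is still required because dircmp always takes two paths.

```python
from filecmp import dircmp


def list_left_entries(left_dir, right_dir):
    """Return every file and subdirectory name found in the left directory."""
    comparison = dircmp(left_dir, right_dir)
    return comparison.left_list


# Example call, run from the folder that contains both directories:
# print(list_left_entries("SampleDirectoryOne", "SampleDirectoryTwo"))
```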

The output of the above code is

['sample.txt', 'sample1.txt', 'subdir', 'subdirectoryone']

sample.txt and sample1.txt are the files from the left directory, and subdir and subdirectoryone are the directories from the left directory (dir1).

5. Getting all the files and subdirectories in the right directory

This example is similar to the previous one. Instead of retrieving the files and directories from the left directory, here we retrieve them from the right directory. The right_list attribute helps us achieve this.

[Image: files and directories in dir2 alone]
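A minimal sketch of reading right_list, mirroring the left-hand case. The helper name list_right_entries is hypothetical.

```python
from filecmp import dircmp


def list_right_entries(left_dir, right_dir):
    """Return every file and subdirectory name found in the right directory."""
    comparison = dircmp(left_dir, right_dir)
    return comparison.right_list


# Example call, run from the folder that contains both directories:
# print(list_right_entries("SampleDirectoryOne", "SampleDirectoryTwo"))
```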

The output of the above code is

['sample.txt', 'sample2.txt', 'subdir', 'subdirectorytwo']

These are the files and directories in the right directory (dir2).

6. Getting the files and subdirectories unique to the left directory

We can also get the files and subdirectories that exist only in the left directory, using the left_only attribute. Any name that also exists in the other directory is omitted.

[Image: files and directories unique only in the left directory will be retrieved]
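A minimal sketch of this step using dircmp's left_only attribute. The helper name list_left_only is made up for illustration.

```python
from filecmp import dircmp


def list_left_only(left_dir, right_dir):
    """Return names that exist in the left directory but not in the right."""
    comparison = dircmp(left_dir, right_dir)
    return comparison.left_only


# Example call, run from the folder that contains both directories:
# print(list_left_only("SampleDirectoryOne", "SampleDirectoryTwo"))
```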

The output of the above code is,

['sample1.txt', 'subdirectoryone']

Only the file sample1.txt and the directory subdirectoryone are unique to the left directory.

7. Getting the files and subdirectories unique to the right directory

We can also get the files and subdirectories that exist only in the right directory, using the right_only attribute. Any name that also exists in the other directory is omitted.

[Image: unique files and subdirectories in the right directory alone]
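A minimal sketch of this step using dircmp's right_only attribute. The helper name list_right_only is hypothetical.

```python
from filecmp import dircmp


def list_right_only(left_dir, right_dir):
    """Return names that exist in the right directory but not in the left."""
    comparison = dircmp(left_dir, right_dir)
    return comparison.right_only


# Example call, run from the folder that contains both directories:
# print(list_right_only("SampleDirectoryOne", "SampleDirectoryTwo"))
```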

The output of the above code is,

['sample2.txt', 'subdirectorytwo']

Only the file sample2.txt and the directory subdirectorytwo are unique to the right directory.

Conclusion

Hope this article is helpful. If you have any queries leave them in the comments below. I will try to answer them as quickly as possible.

Happy coding!