Find the difference between two strings in Python

Python ships with a large standard library. One of its built-in modules, difflib, can be used to find the differences between two strings.

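The difflib module provides a function called ndiff. It is designed to compare two lists of lines, but because a string is itself a sequence of one-character strings, passing two strings makes it compare them character by character. It returns a generator object, which we can iterate over to find the differences.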

Let us look at it with an example.

import difflib

differences = difflib.ndiff('abc', 'abd')

for difference in differences:
    print(difference)

The output of the above code is:

  a
  b
- c
+ d

The code compares the first string with the second one and tells us how to convert the first string into the second. Because the inputs are plain strings, the comparison is done character by character, and each line of output starts with a prefix that describes one character. There are three cases in this comparison.

  1. The character appears in both strings. This is indicated by a leading space.
  2. The character appears only in the first string and has to be removed. This is indicated by a minus (-) symbol.
  3. The character appears only in the second string and has to be added. This is indicated by a plus (+) symbol.
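
The three cases can also be inspected by collecting the generator into a list. As a small sketch (the variable names are only illustrative), the same entries can be passed to difflib.restore, which rebuilds either input string from the diff:

```python
import difflib

# Materialize the generator so the entries can be reused more than once.
entries = list(difflib.ndiff('abc', 'abd'))
print(entries)  # ['  a', '  b', '- c', '+ d']

# restore() rebuilds one side of the comparison: 1 = first input, 2 = second.
first = ''.join(difflib.restore(entries, 1))
second = ''.join(difflib.restore(entries, 2))
print(first, second)  # abc abd
```

Each entry is a two-character prefix followed by the character itself, which is why the character can be read from the end of the entry.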

Hopefully the behavior of ndiff is now clear. Let us go ahead and make our code do more. It would be useful if the code also reported a position for each change, that is, where to add and where to remove.

import difflib

cases = [('afrykanerskojęzyczny', 'afrykanerskojęzycznym'),
         ('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'),
         ('afrykanerskojęzycznym', 'afrykanerskojęzyczny'),
         ('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'),
         ('nieafrynerskojęzyczni', 'afrykanerskojzyczni'),
         ('abcdefg', 'xac')]

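for a, b in cases:
    print('{} => {}'.format(a, b))
    # i is the index of the entry in the diff output, s is the entry itself
    for i, s in enumerate(difflib.ndiff(a, b)):
        if s[0] == ' ':
            # character is present in both strings, nothing to report
            continue
        elif s[0] == '-':
            # character exists only in the first string
            print('Delete "{}" from position {}'.format(s[-1], i))
        elif s[0] == '+':
            # character exists only in the second string
            print('Add "{}" to position {}'.format(s[-1], i))
    print()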

The code above runs several pairs of strings through ndiff. Let us look at the output to get a clear idea of what is happening.

afrykanerskojęzyczny => afrykanerskojęzycznym
Add "m" to position 20

afrykanerskojęzyczni => nieafrykanerskojęzyczni
Add "n" to position 0
Add "i" to position 1
Add "e" to position 2

afrykanerskojęzycznym => afrykanerskojęzyczny
Delete "m" from position 20

nieafrykanerskojęzyczni => afrykanerskojęzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2

nieafrynerskojęzyczni => afrykanerskojzyczni
Delete "n" from position 0
Delete "i" from position 1
Delete "e" from position 2
Add "k" to position 7
Add "a" to position 8
Delete "ę" from position 16

abcdefg => xac
Add "x" to position 0
Delete "b" from position 2
Delete "d" from position 4
Delete "e" from position 5
Delete "f" from position 6
Delete "g" from position 7

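As you can see, we now have step-by-step instructions for making the first string the same as the second. Keep in mind that each position counts entries in the diff output, so it can differ from the character's index in either string. In the last case, for example, "g" is reported at position 7 although it sits at index 6 of 'abcdefg'.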
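
To report indices into the strings themselves rather than into the diff output, one option is to keep two counters while walking the diff. The following is only a sketch under that assumption; edit_steps is a hypothetical helper name, not something difflib provides:

```python
import difflib

def edit_steps(a, b):
    """Return (operation, character, index) tuples for turning a into b.

    Hypothetical helper: deletions are indexed into a, additions into b.
    """
    steps = []
    i = 0  # current index in the first string
    j = 0  # current index in the second string
    for entry in difflib.ndiff(a, b):
        tag, ch = entry[0], entry[2:]
        if tag == ' ':
            # character is shared, so advance in both strings
            i += 1
            j += 1
        elif tag == '-':
            steps.append(('delete', ch, i))
            i += 1
        elif tag == '+':
            steps.append(('add', ch, j))
            j += 1
        # any other entry (such as a '?' hint line) is ignored
    return steps

print(edit_steps('abc', 'abd'))  # [('delete', 'c', 2), ('add', 'd', 2)]
```

With this approach every reported index can be checked directly against the string it refers to, which is not possible with the positions taken from enumerate.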

Conclusion

I hope this article was helpful. There is always a new module to learn about.

Happy coding!