Upgrade Python packages using web scraping

In this article, we shall see how to upgrade the Python libraries in your environment with a short script. The script looks up the latest version of each installed library on pypi.org.

How does this work?

I came up with the idea of scraping pypi.org for the latest version of each Python package we have installed and upgrading the package when it is out of date. The script consists of the following steps.

  1. Get the list of packages installed on our local machine.
  2. Iterate over them and get each package name.
  3. Visit the pypi.org website and look up the latest available version of the package.
  4. Using Beautiful Soup, scrape the package name and version number.
  5. Compare the two versions (the local one and the scraped one).
  6. Ask the user whether they want to update the package.
  7. If yes, update the package by running the upgrade command using the subprocess module in Python.
  8. If no, continue with the next package.

This requires the usage of some built-in and third party libraries.

Required Built-in libraries

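The following two libraries need no separate pip install. subprocess is part of the standard library, while pkg_resources is not strictly built in: it ships with setuptools, which is already present in many Python environments.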

  1. pkg_resources
  2. subprocess

1. pkg_resources

pkg_resources is a module used to find and manage Python package/version dependencies and access bundled files and resources, including those inside of zipped .egg files. Currently, pkg_resources is only available through installing the entire setuptools distribution, but it does not depend on any other part of setuptools; in effect, it comprises the entire runtime support library for Python Eggs, and is independently useful.

Usage here: We will be using this library to get the list of installed packages in our environment along with their versions.

2. subprocess

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

Usage here: The call function from this module runs a terminal command directly from Python code and returns its exit code. We will be using it to run the upgrade command from the script itself.
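As a minimal sketch of how call behaves, the snippet below runs a harmless command (printing the interpreter version) and inspects the exit code. The command is only a stand-in chosen for illustration; the upgrade command itself appears later in the article.

```python
import subprocess
import sys

# Pass the command as a list of arguments; no shell is involved,
# so no quoting or escaping is needed.
# sys.executable is the path of the running Python interpreter.
exit_code = subprocess.call([sys.executable, "--version"])

# call() waits for the command to finish and returns its exit code.
# By convention, 0 means success and anything else means failure.
print("exit code:", exit_code)
```

A non-zero exit code is the signal that a command such as a pip upgrade did not succeed.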

Required third party libraries

The following third party libraries are also required.

  1. requests
  2. Beautiful Soup

1. requests

The requests module allows you to send HTTP requests using Python. Each request returns a Response object with all the response data (content, encoding, status, etc.).

Usage here: We will be using this library to get the HTML of the package's page on the pypi.org website.

Installation

pip install requests

2. Beautiful Soup

Beautiful Soup is a Python package for parsing HTML and XML documents. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping. Current releases support Python 3 only; the last release to support Python 2.7 was 4.9.3.

Usage here: We will be using this library to scrape the version number and package name from the HTML.

Installation

pip install beautifulsoup4

The code

Let us get started with the code.

1. Import the necessary packages.

Let us go ahead and import the required packages first.

import requests
import pkg_resources
from bs4 import BeautifulSoup
from pkg_resources import parse_version
from subprocess import call

These are the packages that we require in our code. The next step is to get the list of installed packages in our environment.

2. Get the installed packages

The following code will give us the installed packages with the package name and the version number.

for pkg in pkg_resources.working_set:
    original_version = pkg.version
    sys_package_name = pkg.project_name
    print(sys_package_name, original_version) 
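The setuptools documentation now marks pkg_resources as deprecated, so the loop above may print a deprecation warning or fail on environments without setuptools. As an alternative sketch (not the approach this article uses), the standard library's importlib.metadata can produce the same name and version pairs. The function name installed_versions is made up for this example.

```python
from importlib import metadata


def installed_versions():
    """Return a {package name: version} dict for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            versions[name] = dist.version
    return versions


for name, version in sorted(installed_versions().items()):
    print(name, version)
```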

3. Get the HTML data

The next step is to get the HTML data of the package’s webpage from pypi.org. We can simply add the package name to the base URL to go to the individual package’s webpage.

base_url = "https://pypi.org/project/"
html = requests.get(base_url + sys_package_name)
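Installed package names can contain capitals, underscores, or dots, while PyPI works with a normalized form of the name (lowercase, with runs of "-", "_" and "." collapsed to a single "-", as described in PEP 503). A small helper along these lines could build the project URL from the normalized name; normalize and project_url are hypothetical names used only in this sketch.

```python
import re

BASE_URL = "https://pypi.org/project/"


def normalize(name):
    """Normalize a project name following the rule given in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_url(name):
    """Build the pypi.org project page URL for a package name."""
    return BASE_URL + normalize(name) + "/"


print(project_url("Flask_SQLAlchemy"))
# https://pypi.org/project/flask-sqlalchemy/
```

It is also worth checking html.status_code before parsing, since a package installed from somewhere other than pypi.org will not have a project page.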

4. Parse the data using Beautiful Soup

The next step is to parse the HTML and pull out just the package name and version.

soup = BeautifulSoup(html.text, 'html.parser')
pypi_package_name, pypi_version = soup.find('h1', class_='package-header__name').text.strip().split(' ')
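The one-liner above assumes the header text is exactly a name and a version separated by a single space. A slightly more defensive sketch of that split is shown below. The helper name split_header and the sample header text (including the version number) are invented for illustration.

```python
def split_header(header_text):
    """Split header text such as 'requests 2.31.0' into (name, version)."""
    # split() with no argument splits on any run of whitespace,
    # so newlines and repeated spaces around the values are ignored.
    parts = header_text.split()
    if len(parts) != 2:
        raise ValueError(f"unexpected header text: {header_text!r}")
    return parts[0], parts[1]


# Made-up sample of what the <h1> text might look like.
sample = "\n        requests   2.31.0\n      "
print(split_header(sample))
```

Raising an error on unexpected text makes it obvious when the page layout differs from what the scraper expects, instead of failing later with a confusing unpacking error.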

5. Run the upgrade command using subprocess call

The next and final step is to run the upgrade command using the call function from subprocess. This will upgrade our package.

call(["pip", "install", "--upgrade", sys_package_name])
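Before running the upgrade command, the outline at the top of the article compares the local and scraped versions (step 5). The sketch below illustrates why that comparison should not be done on raw strings. numeric_key is a deliberately naive stand-in that only handles dotted integers such as 1.10.0; the parse_version function imported earlier copes with real-world version strings and can be passed as key instead. Both helper names are hypothetical.

```python
def numeric_key(version):
    """Naive key: '1.10.0' -> (1, 10, 0). Only handles dotted integers."""
    return tuple(int(part) for part in version.split("."))


def is_newer(latest, installed, key=numeric_key):
    """Return True when `latest` is a higher version than `installed`."""
    return key(latest) > key(installed)


# Plain string comparison goes character by character, so '1' < '9' decides it.
print("1.10.0" > "1.9.0")           # False
print(is_newer("1.10.0", "1.9.0"))  # True
```

With that check in place, the user only needs to be asked about packages for which is_newer returns True.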

That is it. This is all we need to upgrade our Python packages.

The Complete code

The complete script iterates over our packages one by one, checks whether a newer version is available and, if so, asks whether to update.
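Assembled from the snippets in the previous steps, a complete script might look like the sketch below. It is a reconstruction under stated assumptions, not necessarily the original code: the function names, the 10-second timeout, and the y/N prompt are all choices made for this example, and it assumes pypi.org still serves the page layout described above. The third-party imports sit inside the functions that use them so that the small helpers can be tried without those packages installed.

```python
import subprocess
import sys

BASE_URL = "https://pypi.org/project/"


def installed_packages():
    """Return (name, version) pairs for the current environment."""
    import pkg_resources  # ships with setuptools

    # Build the full list first, so upgrades do not disturb the iteration.
    return [(pkg.project_name, pkg.version) for pkg in pkg_resources.working_set]


def latest_version(package_name):
    """Scrape the latest version from the project page, or return None."""
    import requests
    from bs4 import BeautifulSoup

    try:
        response = requests.get(BASE_URL + package_name, timeout=10)
    except requests.RequestException:
        return None  # network problem: skip this package
    if response.status_code != 200:
        return None  # e.g. the package is not published on pypi.org
    soup = BeautifulSoup(response.text, "html.parser")
    header = soup.find("h1", class_="package-header__name")
    if header is None:
        return None  # page layout differs from what this sketch assumes
    return header.text.split()[-1]  # header text is "<name> <version>"


def wants_upgrade(answer):
    """Interpret the user's reply; only an explicit yes counts."""
    return answer.strip().lower() in ("y", "yes")


def upgrade_outdated():
    """Ask about each outdated package and upgrade it if the user agrees."""
    from pkg_resources import parse_version

    for name, installed in installed_packages():
        latest = latest_version(name)
        if latest is None or parse_version(latest) <= parse_version(installed):
            continue  # unknown or already up to date
        reply = input(f"{name}: {installed} -> {latest}. Upgrade? [y/N] ")
        if wants_upgrade(reply):
            # Use the running interpreter's pip so the right environment is upgraded.
            subprocess.call([sys.executable, "-m", "pip", "install", "--upgrade", name])
```

Calling upgrade_outdated() starts the interactive loop. Nothing runs at import time, so the script makes no network requests until that call.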

Conclusion

Hope this article is useful. Please leave your questions in the comment box below.

Happy coding!