
Consider a Python requirements.txt file containing a list of unversioned dependencies (Python packages). After you install them (e.g. pip install -r requirements.txt), you can call pip freeze to get a versioned list of all installed Python packages.

This will be a snapshot of the Python package versions (and their dependencies) available at that time. What I need is to generate the same list, but for a date in the past (let's say 2018-06-12).

Technically, I guess I only need to find which version of each package in requirements.txt was the latest release on that date.
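As a first step, the package names can be pulled out of requirements.txt. A minimal standard-library sketch, assuming a simple file with one requirement per line (no line continuations); the helper `package_names` is hypothetical, and pip's own parser handles many more cases:

```python
import re

# Matches the leading project name of a requirement line, e.g. "requests" in
# "requests[security]>=2.0 ; python_version >= '3'".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def package_names(requirements_text):
    """Return the project names listed in a simple requirements.txt body.

    Assumption: one requirement per line. Blank lines, comments, pip options
    such as "-r other.txt" or "-e ." and direct URL references are skipped.
    """
    names = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names
```

For example, `package_names(open("requirements.txt").read())` gives the list of names whose historical versions need to be looked up.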

Ideally, there would be a command like pip install -r requirements.txt --before 2018-06-21, after which I'd just call pip freeze, but I didn't see anything like that in pip install --help. I did see a way to specify a different --index-url, and I could imagine that if there were an archived index from that date, I could point pip to it and it should work?

There is also a --constraint option, which:

Constrain versions using the given constraints file

But I'm guessing I would already need to know the date-constrained versions in that case?

Chris

2 Answers


From your question, if I get it right, you wanted to install dependencies with the following command:

pip install -r requirements.txt --before 2018-06-21

which would require patching pip itself to add a --before option for supplying the target date.

The code below is the next best thing. It's a rough sketch at the moment, but it does almost what you need: instead of generating a requirements.txt, it prints to the console the latest version of each package released up to the supplied date, in this format:

$ pipenv run python <script_name>.py django click --before 2018-06-21
pip install django==2.0.6 click==6.7

It's not exactly what you had in mind, but it's very close. Feel free to adapt it to your needs, for example by adding an -r option and printing each dependency on its own line; with output redirection, it would then look something like this:

$ pipenv run python <script_name>.py django click --before 2018-06-21 >> requirements.txt
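A minimal sketch of the one-dependency-per-line output mentioned above, assuming the lookup has already produced a {package: version} mapping; the function name `requirement_lines` is hypothetical, and the pins are the ones from the sample output (django 2.0.6, click 6.7):

```python
def requirement_lines(pins):
    """Render a {package: version} mapping as requirements.txt lines.

    One "name==version" entry per line, sorted by name so the output is
    stable between runs.
    """
    return "".join(f"{name}=={version}\n" for name, version in sorted(pins.items()))


# Example pins taken from the sample output (django 2.0.6, click 6.7).
pins = {"django": "2.0.6", "click": "6.7"}
print(requirement_lines(pins), end="")
```

The same lines also work as a constraints file (pip install -r requirements.txt -c constraints.txt), which pins those versions without editing the original requirements.txt.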

Code (or just use the link to the gist):

import sys
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import click

PYPI_URL = "https://pypi.org/project/{project_name}/#history"

def get_releases(request):

    soup = BeautifulSoup(request, 'html.parser')
    releases = list()

    for release in soup.find_all('div', class_='release'):
        release_version = release.find('p', class_='release__version').text.strip()
        if not is_numeric(release_version):
            continue
        release_date = try_parsing_date(release.find('time').text.strip())
        releases.append({'version': release_version, 'date': release_date})

    sorted_packages = sorted(releases, key=lambda s: list(map(int, s['version'].split('.'))))

    return sorted_packages


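def is_numeric(s):
    # Accept only plain dotted-integer versions such as "2.0.6". This skips
    # pre-releases like "2.1a1" and guarantees that the int() conversion used
    # for sorting in get_releases() cannot fail.
    parts = s.split('.')
    return all(part.isdigit() for part in parts)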


def try_parsing_date(text):
    for fmt in ('%d.%m.%Y', '%d/%m/%Y', '%b %d, %Y', '%Y-%m-%d'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    click.echo('Invalid date format. Use one of: <31.12.2018>, <31/12/2018>, <Dec 31, 2018> or <2018-12-31>', err=True)
    sys.exit(1)  # non-zero exit status signals the error


@click.command(context_settings=dict(help_option_names=['-h', '--help']))
@click.option('-b', '--before', help='Get latest package before specified date')
@click.argument('packages', nargs=-1, type=click.UNPROCESSED)
def cli(before, packages):
    target_date = try_parsing_date(before) if before else datetime.today()

    required_packages = list()
    not_found = list()

    for package in packages:
        project_url = PYPI_URL.format(project_name=package)
        r = requests.get(project_url, timeout=10)
        if r.status_code != 200:  # compare values; `is not` tests object identity
            not_found.append(package)
            continue

        releases = get_releases(r.text)
        # Keep the highest version released on or before the target date.
        # `releases` is sorted by ascending version number, so the last match wins.
        last_release = None
        for release in releases:
            if release['date'] <= target_date:
                last_release = release

        if last_release is None:
            # No numeric release existed on or before the target date.
            not_found.append(package)
            continue

        required_packages.append({'package': package,
                                  'release_date': last_release['date'],
                                  'release_version': last_release['version']})


    print('pip install ' + ' '.join('{}=={}'.format(p['package'], str(p['release_version'])) for p in required_packages))
    if not_found:
        print('\nCould not find a matching release for the following packages: {}'.format(' '.join(not_found)))

if __name__ == '__main__':
    cli()

Required dependencies (Python3):

beautifulsoup4==4.7.1
Click==7.0
requests==2.21.0
Tim Jorjev
  • Thanks. I haven't tried the script yet, but I hope it will be helpful for someone with the same question. – Chris Jan 29 '19 at 16:19
  • A neater solution should be possible with the PyPI JSON API, e.g. https://pypi.org/pypi/numpy/json – joeln Oct 03 '19 at 06:26

Alright, one possible answer (although not a great one) is to manually go through each dependency in requirements.txt, look the package up on https://pypi.org and visit its release history (e.g. https://pypi.org/project/requests/#history). From there it's easy to see which version was released on which date (e.g. https://pypi.org/project/requests/2.19.0/ for requests, if 2018-06-12 itself counts as included) and then use that as the pinned version (requests==2.19.0).

A slightly better answer might be to fetch that information from PyPI programmatically (maybe via curl), extract all the version info (including the dates), sort it, and pick the right one.
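A minimal sketch of that programmatic route, using PyPI's JSON API (https://pypi.org/pypi/<name>/json) and the standard library instead of curl. It assumes plain dotted-integer versions and treats a release's date as its earliest file upload time (UTC); the function names are hypothetical:

```python
import json
from datetime import date, datetime
from urllib.request import urlopen

# PyPI JSON API endpoint for a single project.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def version_key(version):
    """Sort key for plain dotted-integer versions such as "2.19.0".

    Returns None for anything else (pre-releases, dev releases, ...),
    which the caller skips.
    """
    parts = version.split(".")
    if not all(part.isdigit() for part in parts):
        return None
    return tuple(int(part) for part in parts)


def latest_on_or_before(project_json, cutoff):
    """Return the highest plain version first uploaded on or before `cutoff` (a date).

    `project_json` is the decoded JSON document for one project; its
    "releases" key maps each version string to a list of uploaded files,
    each carrying an "upload_time" such as "2018-06-12T14:56:31" (UTC).
    """
    best = None
    for version, files in project_json.get("releases", {}).items():
        key = version_key(version)
        if key is None or not files:
            continue  # non-numeric version, or a release with no files
        # A release's date is taken as its earliest file upload.
        released = min(datetime.fromisoformat(f["upload_time"]) for f in files).date()
        if released <= cutoff and (best is None or key > best[0]):
            best = (key, version)
    return best[1] if best else None


def fetch_project_json(name):
    """Download the PyPI JSON metadata for one project (needs network access)."""
    with urlopen(PYPI_JSON_URL.format(name=name), timeout=10) as response:
        return json.load(response)
```

For example, `latest_on_or_before(fetch_project_json("requests"), date(2018, 6, 12))` performs the same lookup as the manual one described above. Skipping non-numeric versions is a simplification; `packaging.version.Version` would handle pre-releases, epochs and other PEP 440 forms properly. Also, the PyPI documentation notes that the `releases` key of this endpoint may be removed in the future, so check the current API docs before relying on it.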

Chris