Although a popular post last year claimed that Python package management is a solved problem, there are still a number of challenges for dependency management in Python. For example, how can you determine the latest version of a package that is available to install via pip?
You might think that the answer is as simple as using
pip install -U $PACKAGE_NAME, and if you are installing directly into your deployment environment you are correct. However, if you are using Docker, the answer is more complicated. Docker caches
RUN lines as long as the actual text of the line is unchanged, so if you add
RUN pip install -U my_favorite_package to your Dockerfile on January 1 and the cache is never busted, you will still have the same version of that package on December 31 no matter how many updates have been published in the meantime. Moreover, this makes it difficult to tell which version of the package was actually included in your container unless you run the container interactively and use the
pip show command.
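If Python is available inside the image, the same check can also be scripted rather than done by hand. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        # The package is not installed in this environment.
        return None


if __name__ == "__main__":
    # my_favorite_package is the placeholder name used in this post.
    print(installed_version("my_favorite_package"))
```

Run inside the container, this reports the version that actually ended up in the image, which is the same information that pip show gives you interactively.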
How can you ensure that the latest version is installed in your Docker image? The approach that I will describe here is to find out the latest version and then put it into your Docker image directly. The second part is simple: instead of specifying the version in the Dockerfile that lives in your repository, store a placeholder such as
RUN pip install my_favorite_package==version_placeholder and then replace that (with a tool such as
sed) at build time.
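The post suggests sed for the substitution; the same step can be sketched in Python if that is easier to wire into your build script. This is only an illustration: the helper name `fill_version`, the base image, and the version number 1.2.3 are all made up.

```python
def fill_version(dockerfile_text: str, version: str,
                 placeholder: str = "version_placeholder") -> str:
    """Replace the version placeholder in a Dockerfile template."""
    if placeholder not in dockerfile_text:
        # Fail loudly so a stale or already-filled template is noticed.
        raise ValueError(f"placeholder {placeholder!r} not found")
    return dockerfile_text.replace(placeholder, version)


# Hypothetical template, as it might be stored in the repository.
TEMPLATE = (
    "FROM python:3\n"
    "RUN pip install my_favorite_package==version_placeholder\n"
)

if __name__ == "__main__":
    print(fill_version(TEMPLATE, "1.2.3"))
```

Because the text of the RUN line changes whenever the version changes, Docker's cache for that line is invalidated exactly when a new version is pinned.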
The first part is a bit more complicated: how do you find the latest version? One way, as described in this post, is to run
pip install with a non-existent version and then parse the error message to get the latest version. That is fairly straightforward and was possibly the best way at the time that the post was written. If you are on a more recent version of
pip, though, you can use a command specifically for this purpose:
For example, if you wanted to find the latest version of the
yolk package, you could run:
To filter this to just the version number (the part that you need to put into your Dockerfile), you can do the following:
After the search, the next part of the one-liner finds the part of the output indicating the latest version (
grep -i "latest"). Then it strips out every character other than digits, periods, and commas, leaving only what looks like a version number (
sed 's/[^0-9.,]*//g'). It then converts spaces to newlines to make it easy to grab the part we need (
tr " " "\n") and chooses the final line (
tail -n 1). If for some reason you wanted a version that was not marked with the “latest” tag, you could add a command to sort the version numbers themselves (
sort --version-sort) before the final tail step.
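The version-sorting idea can be sketched in Python as well. The example below applies it to the error-message approach mentioned earlier, where pip lists the available versions in a "(from versions: ...)" clause. It is a rough sketch under stated assumptions: the sample message and its version numbers are invented, the function names are hypothetical, and only purely numeric dotted versions (such as 0.10.1) are compared, so pre-releases are skipped.

```python
import re
from typing import List, Optional, Tuple

# Invented sample of the kind of message pip prints for an unsatisfiable pin.
SAMPLE_ERROR = (
    "ERROR: Could not find a version that satisfies the requirement "
    "my_favorite_package==999 (from versions: 0.9, 0.10.1, 0.10, 1.0rc1)"
)


def parse_versions(error_message: str) -> List[str]:
    """Extract the comma-separated list from a '(from versions: ...)' clause."""
    match = re.search(r"\(from versions: ([^)]*)\)", error_message)
    if match is None:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def version_key(version: str) -> Tuple[int, ...]:
    """Compare versions numerically, so that 0.10 sorts after 0.9."""
    return tuple(int(part) for part in version.split("."))


def highest_version(versions: List[str]) -> Optional[str]:
    """Return the highest purely numeric version, or None if there is none."""
    numeric = [v for v in versions if re.fullmatch(r"\d+(\.\d+)*", v)]
    return max(numeric, key=version_key) if numeric else None


if __name__ == "__main__":
    print(highest_version(parse_versions(SAMPLE_ERROR)))
```

Comparing tuples of integers is what makes this a version sort rather than a plain string sort, which would rank 0.9 above 0.10.1.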
This command certainly is not pretty, but it is the best way I have found to reliably find the latest version of a Python package. As you can see, the ecosystem has a long way to go before I would consider dependency management a “solved problem.”