On Environment/Package Management in Python
Python's package management is a mess. I'm involved in a few open source projects and I often help users address their environment & installation issues. A large number of these issues essentially come down to incorrectly or accidentally mixing multiple different python environments together. This post lists a few common pitfalls and misconceptions around that.
Unfortunately, people often have multiple python binaries and multiple installations of python packages, e.g.:
- The OS's package manager can install python and some python packages.
- Users can use `pip install` to install new packages to different locations.
- `pip install` etc., under a virtualenv, can install to a location under the virtualenv.
- Anaconda users install python packages to Anaconda's own location.
All of the above methods of installing a library are very common. As a result, many python developers' machines have multiple environments. A ton of problems can arise from this.
Be careful of multiple installations of the same package¶
For the reasons above, you could have multiple installations of the same package in your system. It often causes very confusing issues if you think you're using one installation, but are actually using a different one. Examples of such issues include:
- You install a package of desired version but still see complaints about wrong package version, or run into bugs that exist in the wrong version
- You build & install a package with your custom changes but they are not effective
- You attempt to fix a bug by changing the source code, but you're in fact running another installation of the package so the bug never appears to be fixed
When such issues appear, remember to verify which library you're actually using and where it lives. When in doubt, try the following methods:
- `import lib; print(lib.__version__)` to know the version of the library you're using. However, not all packages have the `__version__` attribute; some expose the version under a different name.
- `import lib; print(lib.__file__)` to know the location of the library you're using. This method should work for most packages.
- `strace -fe file python -c 'import lib; do_something_with_lib()'` to see every file used by the command. This tells you everything needed to figure out whether you have the issue of multiple installations.
I have the following command line alias to help me check libraries:
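A minimal Python sketch of the same idea (this is not the original alias, and the helper name `describe_module` is hypothetical) imports a module by name and reports the file and version actually in use:

```python
import importlib


def describe_module(name):
    """Import `name` and report which file and version are actually in use."""
    module = importlib.import_module(name)
    return {
        "name": name,
        # Built-in modules have no __file__; namespace packages may set it to None.
        "file": getattr(module, "__file__", None),
        # Not every package defines __version__, so fall back to None.
        "version": getattr(module, "__version__", None),
    }


# "json" is only a stand-in; pass the library you are debugging instead.
print(describe_module("json"))
```

Run it with the exact interpreter you use for your real program (e.g. `/some/python script.py`), since a different interpreter may resolve the same name to a different installation.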
Don't trust `pip list` or `conda list` to check package version¶
The version you see in these two commands may not match what you're actually using, because there could be multiple versions of the same library in the system, installed by `pip`, `conda`, or other methods. Neither `pip` nor `conda` is able to know all of them.
To tell precisely the version of a library you're using, follow the suggestions above.
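As a rough illustration of why a single listing can mislead, the sketch below uses the standard `importlib.metadata` module to look for distributions that appear more than once on the current interpreter's `sys.path`. The function name `find_duplicate_installs` is hypothetical, and the check only sees what this one interpreter can see, so it may still miss copies that belong to other environments.

```python
import importlib.metadata
from collections import defaultdict


def find_duplicate_installs():
    """Map distribution names to their locations when found more than once on sys.path."""
    locations = defaultdict(set)
    for dist in importlib.metadata.distributions():
        try:
            name = dist.metadata["Name"]
        except Exception:  # unreadable or broken metadata directory
            continue
        if not name:
            continue
        # Normalize so "Foo_Bar" and "foo-bar" count as the same project.
        key = name.lower().replace("_", "-").replace(".", "-")
        # locate_file("") resolves to the directory holding the metadata.
        locations[key].add(str(dist.locate_file("")))
    return {k: sorted(v) for k, v in locations.items() if len(v) > 1}


for name, paths in sorted(find_duplicate_installs().items()):
    print(name, paths)
```

An empty output only means this interpreter sees no duplicates; it says nothing about what another `python` on the same machine would import.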
Avoid `setup.py install` to install packages¶
Usually, a package installed in this way is not managed by any system:
no command can tell you it is installed; no command can uninstall it for you.
`pip uninstall` for such packages may complain that it "cannot determine which files belong to it", so you often need to manually remove files to really uninstall it.
The result is that, when you need to install a different version of it some day in the future using other methods (e.g. `conda`), it either fails, or succeeds but gives you a system with multiple installations.
Prefer `python -m pip` over `pip`¶
There could be multiple `python` binaries in your system (e.g., from system, venv, anaconda). `pip` is just a python script: based on how its shebang line is written, some versions of `pip` pick the `python` executable from your `$PATH`, but some versions of `pip` have a hard-coded absolute path to the `python` executable they will use. As a result, when you run `pip install` directly, it's not immediately clear which python it will use, let alone where the library will be installed.
In an environment with more than one `python`, use `python -m pip` or `/some/python -m pip` instead of the `pip` command directly.
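To make the difference concrete, here is a small sketch (the helper names `pip_command` and `pip_shebang` are hypothetical) that builds a pip invocation tied to the running interpreter and, for comparison, reads the shebang line of whatever `pip` script happens to be first on `PATH`:

```python
import shutil
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


def pip_shebang():
    """Return the shebang line of the `pip` script found on PATH, or None if unavailable."""
    path = shutil.which("pip")
    if path is None:
        return None
    try:
        with open(path, "rb") as f:
            first_line = f.readline().decode("utf-8", errors="replace").strip()
    except OSError:
        return None
    # Binary launchers (e.g. pip.exe) do not start with a shebang.
    return first_line if first_line.startswith("#!") else None


# "somepackage" is a placeholder name, not a real requirement.
print("interpreter-bound pip:", pip_command("install", "somepackage"))
print("shebang of pip on PATH:", pip_shebang())
```

If the shebang names a different interpreter than `sys.executable`, the bare `pip` command and `python -m pip` would install into different places.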
Run `pip uninstall` multiple times¶
If you want to uninstall something, uninstall it multiple times until it converges. `pip` can install one package multiple times in different locations (e.g., one inside a virtualenv or conda environment plus one elsewhere).
Use `python -c 'import lib'` to confirm uninstallation¶
Not everything can be uninstalled with a simple `pip uninstall` or conda. Examples are:
- Libraries you installed to a different prefix with a different
- Libraries installed by the distro or libraries that are installed with
- Libraries in your
- `import lib` may be provided by multiple alternative packages. For example, the `tensorflow` package is not the only package that provides `import tensorflow`. It's easy to forget that you've installed more than one of them.
As a result, always use `import lib` to confirm after you uninstall something. If you're surprised by a successful import, use the methods in this article to tell where the library is coming from.
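One way to script this check, sketched here with the standard `importlib.util.find_spec` (the function name `import_origin` is hypothetical), is to ask where an import would load from without actually importing the library:

```python
import importlib
import importlib.util


def import_origin(name):
    """Return where `import name` would load from, or None if it is not importable."""
    importlib.invalidate_caches()  # notice files removed since the last lookup
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    # Namespace packages may have no single origin; report their search locations instead.
    return spec.origin or list(spec.submodule_search_locations or [])


# "json" is only a stand-in; use the top-level name of the library you just uninstalled.
print(import_origin("json"))
```

Run it in a fresh process after uninstalling: a result of `None` means the import would fail, while anything else shows the location of a leftover copy.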
Be careful when declaring dependencies on large packages¶
Large, complicated dependencies such as OpenCV, PyTorch, and TensorFlow can often be installed in many different ways, and only some of them are valid for a given environment. Such dependencies should NOT be declared in `requirements.txt` to be automatically installed. To avoid invalid installations or multiple installations, the choice of how to install these dependencies should be left to users.
Unfortunately, 10k+ projects declare `opencv-python` as a dependency. As a result, their users will automatically install and use the desktop version `opencv-python`, instead of:
- the contrib version `opencv-contrib-python`, with more features
- the headless version `opencv-python-headless`, with fewer features and fewer compatibility issues
- the Linux distro's own package, with fewer compatibility issues
`opencv-python` has given suggestions on how to select the right package. "Automatic" selection is simply wrong.
Similarly, a project that declares a dependency on PyTorch may automatically install a build with a mismatched CUDA version.
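One possible alternative to declaring such a dependency, sketched below under the assumption that the project only needs `import cv2` to work, is to check for the module at runtime and tell users to pick an installation themselves. The helper names `require_module` and `check_opencv` and the hint text are hypothetical.

```python
import importlib.util


def require_module(module_name, install_hint):
    """Raise a helpful error if `module_name` is missing, instead of auto-installing it."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required but not installed. {install_hint}"
        )


def check_opencv():
    # The hint text is illustrative; users choose the variant that fits their environment.
    require_module(
        "cv2",
        "Install one of opencv-python, opencv-contrib-python, "
        "opencv-python-headless, or your distro's OpenCV package.",
    )
```

Calling `check_opencv()` at program start keeps the failure early and explicit while leaving the choice of package with the user.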
Be careful when using a library in the root of its source¶
You can sometimes have a python library installed already, but you also have its raw source code somewhere in your system. This is another potential case of multiple installation.
If you execute `import libA` in the source directory, python may find a local directory `libA` which contains the source code, and use this source code, rather than the `libA` that's actually installed in a different location.
In addition to the common confusions that can arise from multiple installations, this situation often causes errors, because the source code is often not a valid installation by itself. In many libraries, the raw source code is different from what actually gets installed. The most common example is that compiled extensions do not exist in the source code. As a result, using a python library in its source code directory often leads to errors. The issue is so common that some libraries (e.g., numpy and tensorflow) try to detect it and educate users about it.
Situations where it is OK to use a source directory include:
- Simple libraries where the source code is equal to what gets installed
- Libraries that can be, and have been, installed locally inside the source directory, usually with `pip install --editable`.
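A quick way to notice this kind of shadowing, sketched here with hypothetical helper names (`is_under`, `imported_from_cwd`), is to check whether an import would resolve to a file under the current working directory:

```python
import importlib.util
import os


def is_under(path, directory):
    """True if `path` is located inside `directory` (symlinks resolved first)."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:  # e.g. paths on different drives
        return False


def imported_from_cwd(name):
    """True if `import name` would resolve to a file under the current directory."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.origin or not os.path.isabs(spec.origin):
        return False  # missing, built-in, frozen, or namespace package
    return is_under(spec.origin, os.getcwd())


# "json" is only a stand-in for the library whose source tree you are sitting in.
print(imported_from_cwd("json"))
```

A `True` result is expected for an editable install, and is a warning sign otherwise.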
Never use sudo to install python packages¶
Do not use `sudo pip install` or `sudo python setup.py`, unless it's a virtual system (e.g. docker) that you don't intend to keep long.
- It is yet another installation. For example, you can have one version installed with root and one without, causing more trouble.
- When you do installation in the future in the right way (without root), this old package cannot be automatically upgraded.
- It affects all users, causing the "multiple installation" problem for them as well.
`pip install --user` can install libraries without root permission (installed to `$HOME/.local` on linux). This option is sometimes the default in recent versions of `pip`. Or you can use venv if stronger isolation is needed. venv is now officially part of Python 3.
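To see which of these situations the current interpreter is in, a short sketch using only the standard `sys` and `site` modules can help (the function name `in_virtualenv` is hypothetical):

```python
import site
import sys


def in_virtualenv():
    """True when running inside a venv (sys.prefix differs from the base installation)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("interpreter:", sys.executable)
print("inside a venv:", in_virtualenv())
# Where `pip install --user` would place packages for this interpreter.
print("user site-packages:", site.getusersitepackages())
print("user site enabled:", site.ENABLE_USER_SITE)
```

Inside a venv the user site is normally disabled, so packages installed with `--user` elsewhere are not visible there.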
You don't need root permission for most installations¶
You only need root permission when the library directly interacts with hardware, e.g., you need root permission to install the nvidia driver.
You do not need root permission to, e.g., install a different version of Python, GCC, or CUDA (though a newer CUDA sometimes requires a newer driver). But doing these without root permission certainly requires some extra knowledge.
Avoid mixing binaries built from different sources¶
Python itself is a binary that depends on some other binary libraries. Each python package may also contain binaries or depend on other binary libraries. Mixing binaries built from different sources (e.g. your system package manager vs. anaconda) together (i.e. into a single process) has potential binary compatibility issues.
Such issues can happen when you want to use `libA` and `libB` together, but they are built against different versions of another library or built with different C++ compilers. (C compilers, however, should produce binary-compatible code across compiler versions.)
Ideally you might expect some mechanism to avoid such conflicts. There is indeed a complicated set of symbol visibility & compiler ABI rules, but most libraries do not follow them correctly. The result of such incompatibility issues is often a segfault or other mysterious errors.
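When debugging such a crash, it can help to see which shared libraries actually ended up in the process. The sketch below is Linux-only and assumes `/proc/self/maps` is readable; the function name `loaded_shared_libraries` is hypothetical. Libraries coming from several unrelated roots (for example a system directory and an anaconda directory) are a hint that binaries from different sources are being mixed.

```python
import os
from collections import Counter


def loaded_shared_libraries(maps_path="/proc/self/maps"):
    """Return the shared-object files mapped into this process (Linux only)."""
    libs = set()
    try:
        with open(maps_path) as f:
            for line in f:
                fields = line.split(None, 5)
                # The sixth field, when present, is the path backing the mapping.
                if len(fields) == 6:
                    path = fields[5].strip()
                    if ".so" in os.path.basename(path):
                        libs.add(path)
    except OSError:
        pass  # /proc is unavailable on this platform
    return sorted(libs)


# Import the packages you want to inspect first, then group libraries by directory.
by_directory = Counter(os.path.dirname(p) for p in loaded_shared_libraries())
for directory, count in sorted(by_directory.items()):
    print(count, directory)
```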
In reality, here is how packages are built:
Your OS's package manager (`apt`/`yum`/`pacman`, etc.) installs many binaries and libs. They are built with the exact system packages they depend on, using the exact compiler installed by the package manager. They are all built in a nice uniform environment that will not have any compatibility issues: all these packages can be mixed together.
When you `pip install` a package, there are two possibilities:
- Source distribution: this command compiles source code, using whatever compiler & dependency libraries it finds. So its compatibility will depend on which compiler & libraries it finds. Typically this is controlled by standard environment variables such as `$LIBRARY_PATH`, but it varies among packages.
- Binary wheel distribution: this command downloads a pre-built binary. This means that you need to confirm the binary is built in an environment that's compatible with the other packages you're using.
Lots of binary packages on pypi contain the word "manylinux": it means the package is built such that it's supposed to be compatible with most linux environments. Typically, using a manylinux package should not lead to compatibility issues, although there are exceptions (e.g., some packages incorrectly mark themselves as manylinux). Also, a manylinux package may have suboptimal performance due to the compatibility requirements: such packages are often built with old compiler versions and old instruction sets.
For other packages without the "manylinux" signature, you can only wish for good luck. They usually work fine but could stop working any day. There are a number of github issues in different projects about "import libA causes import libB to crash". Typically these involve giant projects, such as OpenCV, TensorFlow, PyTorch.
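The "manylinux" marker lives in the platform tag at the end of a wheel's filename. As a small sketch, assuming the standard wheel naming scheme (`name-version[-build]-python_tag-abi_tag-platform_tag.whl`), the platform tags can be split out like this; the function names and the example filenames are hypothetical:

```python
def wheel_platform_tags(filename):
    """Split the platform tags out of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # The platform tag is the last dash-separated component before ".whl".
    platform_tag = filename[: -len(".whl")].split("-")[-1]
    # A compressed tag set joins several platforms with dots.
    return platform_tag.split(".")


def is_manylinux(filename):
    """True if any platform tag of the wheel starts with 'manylinux'."""
    return any(t.startswith("manylinux") for t in wheel_platform_tags(filename))


# Hypothetical filenames, used only to exercise the parser.
print(is_manylinux("libA-1.0-cp39-cp39-manylinux2014_x86_64.whl"))  # True
print(is_manylinux("libB-1.0-cp39-cp39-linux_x86_64.whl"))  # False
```

The tag only records what the package claims; as noted above, a wheel can carry a manylinux tag and still be built incorrectly.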
When you `conda install` a package that contains binaries, it's always pre-built. The official packages are built in anaconda's standard environment, and all the runtime dependencies in that standard environment are also packaged and distributed by anaconda. Anaconda provides an (almost) full runtime environment, including essential libs such as `libgcc`. This means that the conda world is just like your OS's package manager: if you use conda to install all libraries (and their dependencies), they are always compatible with each other.
That sounds nice, until you want to build a package by yourself. Anaconda provides a full runtime environment, but usually not the build-time environment. Normally you'll still be building the package using your system's compiler & libraries (or those defined by your envvars).
As long as you use `python` from conda, you'll almost always run inside conda's runtime environment, using `libjpeg`, etc. from `anaconda/lib`. It's then possible that the package you build is not compatible with conda's runtime environment.
I've frequently seen such failures, e.g.:
- Build a package using system's gcc. Then it cannot run inside conda's runtime since the runtime is built with an old version of gcc.
- `conda install cudatoolkit=10.1 pytorch` gives you a working pytorch in a cuda 10.1 runtime. It works fine until you build a custom cuda extension: the extension will use `nvcc` from your system, which may not be 10.1.
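Before building an extension, it may be worth comparing the toolchain that built your `python` with the tools a build would pick up from `PATH`. The sketch below uses only the standard library; the helper name `tool_output` is hypothetical, and it assumes `gcc` and `nvcc` accept `--version`.

```python
import platform
import shutil
import subprocess
import sysconfig


def tool_output(command):
    """Run `command` and return its output text, or None if the tool cannot be run."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None
    return (result.stdout or result.stderr).strip() or None


print("python was built with:", platform.python_compiler())
print("CC recorded at build time:", sysconfig.get_config_var("CC"))
print("gcc on PATH:", tool_output(["gcc", "--version"]))
print("nvcc on PATH:", tool_output(["nvcc", "--version"]))
```

A large gap between the compiler that built `python` and the one on `PATH`, or an `nvcc` release that differs from the CUDA runtime you installed, points at the kind of mismatch described in the list above.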
That's why I personally avoid conda and use system's python whenever possible.