Dependency Management#
Warning
Experimental Feature - PyScaffold support for virtual environment management is experimental and might change in the future.
Foundations#
The greatest advantage in packaging Python code (when compared to other forms
of distributing programs and libraries) is that packages allow us to stand on
the shoulders of giants: you don’t need to implement everything by yourself,
you can just declare dependencies on third-party packages and setuptools
,
pip
, PyPI and their friends will do the heavy lifting for you.
Of course, with great power comes great responsibility. Package authors must be careful when declaring the versions of the packages they depend on, so the people consuming the final work can do reliable installations, without facing dependency hell. In the opensource community, two main strategies have emerged in the last few years:
the first one is called abstract and consists of having permissive, minimal and generic dependencies, with versions specified by ranges, so anyone can install the package without many conflicts, sharing and reusing as much as possible dependencies that are already installed or are also required by other packages
the second, called concrete, consists of having strict dependencies, with pinned versions, so all the users will have repeatable installations
Both approaches have advantages and disadvantages, and usually are used together in different phases of a project. As a rule of thumb, libraries tend to emphasize abstract dependencies (but can still have concrete dependencies for the development environment), while applications tend to rely on concrete dependencies (but can still have abstract dependencies specially if they are intended to be distributed via PyPI, e.g. command line tools and auxiliary WSGI apps/middleware to be mounted inside other domain-centric apps). For more information about this topic check Donald Stufft post.
Since PyScaffold aims the development of Python projects that can be easily
packaged and distributed using the standard PyPI and pip
flow, we adopt the
specification of abstract dependencies using setuptools
’ install_requires
. This
basically means that if PyScaffold generated projects specify dependencies
inside the setup.cfg
file (using general version ranges), everything will
work as expected.
Test Dependencies#
While specifying the final dependencies for packages is pretty much
straightforward (you just have to use install_requires
inside
setup.cfg
), dependencies for running the tests can be a little bit trick.
Historically, setuptools
provides a tests_require
field that follows
the same convention as install_requires
, however this field is not strictly
enforced, and setuptools
doesn’t really do much to enforce the packages
listed will be installed before the test suite runs.
PyScaffold’s recommendation is to create a testing
field (actually you can
name it whatever you want, but let’s be explicit!) inside the
[options.extras_require]
section of setup.cfg
. This way multiple test
runners can have a centralised configuration and authors can avoid double
bookkeeping.
If you use tox
(recommended), you can list testing
under the the extras
configuration field option
(PyScaffold template for tox.ini
already takes care of this configuration for you).
If running pytest
directly, you will have to install those dependencies
manually, or do a editable install of your package with
pip install -e .[testing]
.
Tip
If you prefer to use just tox
and keep everything inside
tox.ini
, please go ahead and move your test dependencies.
Every should work just fine :)
Note
PyScaffold strongly advocates the use of test runners to guarantee
your project is correctly packaged/works in isolated environments.
New projects will ship with a default tox.ini
file that is a good
starting point, with a few useful tasks. Run tox -av
to list all the
available tasks.
Basic Virtualenv#
As previously mentioned, PyScaffold will get you covered when specifying the
abstract or test dependencies of your package. We provide sensible
configurations for setuptools
and tox
out-of-the-box.
In most of the cases this is enough, since developers in the
Python community are used to rely on tools like virtualenv
and have a
workflow that take advantage of such configurations. As an example, you
could do:
$ pip install pyscaffold
$ putup myproj
$ cd myproj
$ virtualenv .venv
# OR python -m venv .venv
$ source .venv/bin/activate
$ pip install -U pip setuptools setuptools_scm tox
# ... edit setup.cfg to add dependencies ...
$ pip install -e .
$ tox
However, someone could argue that this process is pretty manual and laborious to maintain specially when the developer changes the abstract dependencies.
PyScaffold can alleviate this pain a little bit with the
venv
extension:
$ putup myproj --venv --venv-install PACKAGE
# Is equivalent of running:
#
# putup myproj
# cd myproj
# virtualenv .venv OR python -m venv .venv
# pip install PACKAGE
But it is still desirable to keep track of the version of each item in the dependency graph, so the developer can have environment reproducibility when trying to use another machine or discuss bugs with colleagues.
In the following sections, we describe how to use a few popular command line tools, supported by PyScaffold, to tackle these issues.
Tip
When called with the --venv
option, PyScaffold will try first to use
virtualenv
(there are some advantages on using it, such as being faster),
and if it is not installed, will fallback to Python stdlib’s venv
.
Please notice however that even venv
might not be available by default
in your system: some OS/distributions split Python’s stdlib in several
packages and require the user to explicitly install them (e.g. Ubuntu will
require you to do apt install python3-venv
). If you run into problems,
try installing virtualenv
and run the command again.
Integration with Pipenv#
We can think in Pipenv as a virtual environment manager. It creates
per-project virtualenvs and generates a Pipfile.lock
file that contains a
precise description of the dependency tree and enables re-creating the exact
same environment elsewhere.
Pipenv supports two different sets of dependencies: the default one, and the dev set. The default set is meant to store runtime dependencies while the dev set is meant to store dependencies that are used only during development.
This separation can be directly mapped to PyScaffold strategy: basically the
default set should mimic the install_requires
option in setup.cfg
,
while the dev set should contain things like tox
, sphinx
,
pre-commit
, ptpython
or any other tool the developer uses while
developing.
Tip
Test dependencies are internally managed by the test runner, so we don’t have to tell Pipenv about them.
The easiest way of doing so is to add a -e .
dependency (in resemblance
with the non-automated workflow) in the default set, and all the other ones in
the dev set. After using Pipenv, you should add both Pipfile
and
Pipfile.lock
to your git repository to achieve reproducibility (maintaining
a single Pipfile.lock
shared by all the developers in the same project can
save you some hours of sleep).
In a nutshell, PyScaffold+Pipenv workflow looks like:
$ pip install pyscaffold pipenv
$ putup myproj
$ cd myproj
# ... edit setup.cfg to add dependencies ...
$ pipenv install
$ pipenv install -e . # proxy setup.cfg install_requires
$ pipenv install --dev tox sphinx # etc
$ pipenv run tox # use `pipenv run` to access tools inside env
$ pipenv lock # to generate Pipfile.lock
$ git add Pipfile Pipfile.lock
After adding dependencies in setup.cfg
, you can run pipenv update
to
add them to your virtual environment.
Warning
Experimental Feature - Pipenv is still a young project that is moving very fast. Changes in the way developers can use it are expected in the near future, and therefore PyScaffold support might change as well.
Integration with pip-tools#
Contrary to Pipenv, pip-tools
does not replace entirely the aforementioned
“manual” workflow. Instead, it provides lower level command line tools that
can be integrated to it, in order to achieve better reproducibility.
The idea here is that you have two types files describing your dependencies:
*requirements.in
and *requirements.txt
. The .in
files are the ones
used to list abstract dependencies, while the .txt
files are
generated by running pip-compile
.
Again the easiest way of having the requirements.in
file to mimic
setup.cfg
’ install_requires
is to add something like -e .
to it.
Warning
For the time being adding -e file:.
is a working
solution that is tested by pip-tools
team (-e .
will generate absolute
file paths in the compiled file, which will make it impossible to share).
However this situation might change in the near future.
You can find more details about this topic and monitor any changes in
https://github.com/jazzband/pip-tools/issues/204.
When using -e file:.
in your requirements.in
file,
the compiled requirements.txt
needs to be installed via
pip-sync
instead of pip install -r requirements.txt
You can also create multiple environments and have multiple “profiles”, by using
different files, e.g. dev-requirements.in
or ci-requirements.in
,
but keeping it simple and using requirements.in
to represent all the tools
you need to run common tasks in a development environment is a good practice,
since you can omit the arguments when calling pip-compile
and pip-sync
.
After all, if you need to have a separated test environment you can use tox,
and the minimal dependencies of your packages are already listed in
setup.cfg
.
Note
The existence of a requirements.txt
file in the root of your repository
does not imply all the packages listed there will be considered direct
dependencies of your package. This was valid for older versions of
PyScaffold (≤ 3), but is no longer the case. If the file exists, it is
completely ignored by PyScaffold and setuptools.
A simple a PyScaffold + pip-tools
workflow looks like:
$ putup myproj --venv --venv-install pip-tools setuptools_scm && cd myproj
$ source .venv/bin/activate
# ... edit setup.cfg to add dependencies ...
$ echo '-e file:.' > requirements.in
$ echo -e 'tox\nsphinx\nptpython' >> requirements.in # etc
$ pip-compile
$ pip-sync
$ tox
# ... do some debugging/live experimentation running Python in the terminal
$ ptpython
$ git add *requirements.{in,txt}
After adding dependencies in setup.cfg
(or to requirements.in
),
you can run pip-compile && pip-sync
to add them to your virtual environment.
If you want to add a dependency to the dev environment only, you can also:
$ echo "mydep>=1.2,<=2" >> requirements.in && pip-compile && pip-sync
Warning
Experimental Feature - the methods described here for integrating pip-tools
and PyScaffold in a single workflow are tested to a certain degree and not
considered stable.
The usage of relative paths in the compiled requirements.txt
file is a
feature that have being several years in the making and still is under
discussion. As everything in Python’s packaging ecosystem right now,
the implementation, APIs and specs might change in the future so it is up to
the user to keep an eye on the official docs and use the logic explained
here to achieve the expected results with the most up-to-date API
pip-tools
have to offer.
The issue https://github.com/jazzband/pip-tools/issues/204 is worth following.
If you find that the procedure here no longer works, please open an issue on https://github.com/pyscaffold/pyscaffold/issues.
Integration with conda#
Conda is an open-source package manager very popular in the Python ecosystem that can be used as an alternative to pip. It is especially helpful when distributing packages that rely on compiled libraries (e.g. when you need to use some C code to achieve performance improvements) and uses Anaconda as its standard repository (the PyPI equivalent in the conda world).
The main advantage of conda compared to virtualenv
/venv
based tools is that it unifies
several different tools and has a deeper isolation than the pip package manager.
For instance conda allows you to create isolated environments by specifying also the
Python version and even system libraries like glibc
. In the pip ecosystem, one
needs a tool like pyenv to choose the Python version and the installation of
system libraries besides the current ones is not possible at all.
Note
Unfortunately, since conda environments are more complex and feature-rich
than the ones produced by virtualenv
/venv
based tools,
package installations usually take longer.
If all your dependencies are pure Python packages and you don’t need to use
any compiled libraries, virtualenv
/venv
might provide a faster dev
experience.
To use conda with a project setup generated by PyScaffold just:
Create a file
environment.yml
, e.g. like this example for data science projects. Note thatname: my_conda_env
defines the name of the environment. Also note that besides the conda dependencies you can still add pip-installable packages by adding- pip
as dependency and a section defining additional packages as well as the project setup itself:- pip: - -e . - other-pip-based-package
This will install your project as well as
other-pip-based-package
within the conda environment. Be careful though that some pip-based packages might not work perfectly within a conda environment but this concerns only certain packages that tamper with the environment itself like tox for instance. As a rule of thumb, always define a requirement as conda package if available and only resort to pip packages if not available as conda package.Create an environment based on this file with:
conda env create -f environment.yml
Tip
Mamba is a new and much faster drop-in replacement for conda. For large environments, conda often requires several minutes or hours to solve dependencies while mamba normally completes within seconds.
To create an environment with mamba, you can run the following command:
mamba env create -f environment.yml
Activate the environment with:
conda activate my_conda_env
You can read more about conda in the excellent guide written by WhiteBox.
Also checkout the PyScaffold’s dsproject extension that already comes with a proper environment.yml
.
Creating a conda package#
The process of creating conda packages consists basically in creating some extra files that describe general recipe to build your project in different operating systems. These recipe files can in theory coexist within the same repository as generated by PyScaffold.
While this approach is completely fine and works well, a package
uploaded by a regular user to Anaconda will not be available if someone simply try to
install it via conda install <pkg name>
.
This happens because Anaconda and conda are organised in terms of channels and regular
users cannot upload packages to the default channel.
Instead, separated personal channels need to be used for the upload and explicitly
selected with the -c <channel name>
option in conda install
.
It is important however to consider that mixing many channels together might create clashes in dependencies (although conda tries very hard to avoid clashes by using channel preference ordering and a clever resolution algorithm).
A general practice that emerged in the conda ecosystem is to organise packages in large communities that share a single and open repository in Anaconda, that rely on specific procedures and heavy continuous integration for publishing cohesive packages. These procedures, however, might involve creating a second repository (separated from the main code base) to just host the recipe files. For that reason, PyScaffold does not currently generate conda recipe files when creating new projects.
Instead, if you are an open-source developer and are interested in distributing packages via conda, our recommendation is to try publishing your package on conda-forge (unless you want to target a specific community such as bioconda). conda-forge is one of the largest channels in Anaconda and works as the central hub for the Python developers in the conda ecosystem.
Once you have your package published to PyPI using the project generated by PyScaffold, you can create a conda-forge feedstock 1 using a special tool called grayskull and following the documented instructions. Please make sure to check PyScaffold community tips in discussion #422. Also, there are useful tips in issue #633.
If you still need to use a personal custom channel in Anaconda, please checkout conda-build tutorials for further information.
Tip
It is not strictly necessary to publish your package to Anaconda for your
users to be able to install it if they are using conda –
pip install
can still be used from a conda environment.
However, if you have dependencies that are also published in Anaconda and
are not pure Python projects (e.g. numpy
or matplotlib
), or that
rely on virtual environments, it is generally advisable to do so.
- 1
feedstock is the term used by conda-forge for the companion repository with recipe files