Overview
Teaching: 15 min
Exercises: 15 min
Questions
- How can we communicate different versions of software dependencies?
Our codes often depend on other codes that in turn depend on other codes …
These tools try to solve the following problems:
Isolated environments are also useful because they help you make sure that you know your dependencies!
Exercise/discussion
Compare these four requirements.txt solutions:

A: Code depends on a number of packages but there is no requirements.txt file or equivalent.

B:
scipy
numpy
sympy
click
git+https://github.com/someuser/someproject.git@master
git+https://github.com/anotheruser/anotherproject.git@master

C:
scipy==1.3.1
numpy==1.16.4
sympy==1.4
click==7.0
git+https://github.com/someuser/someproject.git@d7b2c7e
git+https://github.com/anotheruser/anotherproject.git@sometag

D:
scipy==1.3.1
numpy==1.16.4
sympy==1.4
click==7.0
someproject==1.2.3
anotherproject==2.3.4
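One way to see the difference between B, C and D is to check mechanically which entries pin an exact version. The shell function below is a minimal sketch of such a check. check_pins is a hypothetical helper name, and the rules it applies are a crude heuristic assumed for illustration, not rules enforced by pip: "name==version" counts as pinned, a Git URL ending in @master or @main counts as a moving branch, and any other @ref is assumed to be a tag or commit.

```shell
# check_pins FILE: hypothetical helper, not part of pip or conda.
# Prints every entry of a requirements file that does not pin an exact version.
# Heuristic (an assumption): "name==version" is pinned; a git URL is unpinned
# if it has no @ref or if the ref is the branch name master or main.
check_pins() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) continue ;;                  # skip blank lines and comments
        esac
        case "$line" in
            git+*@master|git+*@main)
                echo "UNPINNED (branch): $line" ;;
            git+*@*)
                ;;                                # tag or commit: treat as pinned
            git+*)
                echo "UNPINNED (no ref): $line" ;;
            *==*)
                ;;                                # exact version
            *)
                echo "UNPINNED: $line" ;;
        esac
    done < "$1"
}
```

Run on file B, check_pins flags every line; on files C and D it prints nothing. The heuristic is deliberately simple: for example, a URL such as git+ssh://git@github.com/... contains an @ in the user part and would wrongly be treated as pinned.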
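Install a package with pip:
$ pip install somepackage
Install a specific version:
$ pip install somepackage==1.2.3
Save the installed packages and their versions to requirements.txt:
$ pip freeze > requirements.txt
Install all packages listed in requirements.txt:
$ pip install -r requirements.txt
Install a package directly from a Git repository at a given tag (a branch name or commit hash also works):
$ pip install git+https://github.com/anotheruser/anotherproject.git@sometag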
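Because pip freeze writes one name==version line per installed package, two such files can be compared line by line, for example to see how a freshly frozen environment has drifted from the committed requirements.txt. The sketch below assumes exactly that simple format; compare_freezes is a hypothetical helper, and lines without "==" (comments, Git URLs and so on) are simply skipped.

```shell
# compare_freezes OLD NEW: hypothetical helper, not part of pip.
# Assumes both files hold one "name==version" line per package, as written by
# `pip freeze`; lines without "==" are ignored. OLD must not be an empty file,
# otherwise the NR == FNR test below cannot tell the two files apart.
compare_freezes() {
    awk -F'==' '
        !/==/ { next }                             # skip comments, URLs, etc.
        NR == FNR { old[$1] = $2; next }           # first file: remember versions
        {
            if (!($1 in old))       print "added:   " $1 "==" $2
            else if (old[$1] != $2) print "changed: " $1 " " old[$1] " -> " $2
            seen[$1] = 1
        }
        END {
            for (name in old)
                if (!(name in seen)) print "removed: " name "==" old[name]
        }' "$1" "$2"
}
```

For example, after pip freeze > current.txt, running compare_freezes requirements.txt current.txt lists added, changed and removed packages. The order of the "removed" lines is unspecified, so pipe the output through sort if you need a stable order.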
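Install a package with conda:
$ conda install somepackage
Install a specific version (conda uses a single "="):
$ conda install somepackage=1.2.3
Create a new environment:
$ conda create --name myenvironment
Create a new environment from requirements.txt:
$ conda create --name myenvironment --file requirements.txt
Create an environment in a given directory instead of the default location:
$ conda create --prefix /some/path/to/env
Activate an environment:
$ conda activate myenvironment
Deactivate the current environment:
$ conda deactivate
List all environments:
$ conda info -e
Export the list of installed packages to requirements.txt:
$ conda list --export > requirements.txt
Export the full environment, including channels, to environment.yml:
$ conda env export > environment.yml
Remove unused packages and caches:
$ conda clean --all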
Conda packages can be built from a recipe and shared on anaconda.org via your own private or public channel, or via conda-forge.
A step-by-step guide on how to contribute packages can be found in the conda-forge documentation.
To get an idea of what’s needed, let’s have a look at the boost feedstock (a set of C++ libraries). We see that:
- the recipe is specified in a meta.yaml file in the recipe/ directory, along with (optional) build.sh and bld.bat files for building non-Python code on macOS/Linux and Windows, respectively.

Exercise: Creating and exporting conda environments
On Windows, we recommend doing this exercise in the Anaconda Prompt.
Begin by importing (by clicking “Use this template”) and then cloning the word-count project. Then recreate the software environment provided by the requirements.txt file in the repository. The snakemake-minimal package is only available in the bioconda channel, which needs to be specified:
$ conda create --name wordcount --file requirements.txt --channel bioconda --channel conda-forge
This will download all the packages listed in the requirements.txt file (with matching versions) along with all their dependencies. The new environment also needs to be activated:
$ conda activate wordcount
We now have (roughly) the same environment as specified by the developers of the word-count project. But let’s say we want to extend this environment, and share it with colleagues:
- Inspect your available environments with conda info -e.
- Install the pandas package using conda install.
- Add pandas, with the version you installed, to the requirements.txt file.
- Export the full environment using conda env export > full_env.yml, and compare the .yml file format to the .txt file format.
- Create a new environment.yml file with the packages from the requirements.txt file. You only need the channels and dependencies sections, and dependencies can be listed as package=1.2.3 or simply package.
- If you want to make sure your new environment.yml is correct, you can use it to create a new test environment using conda env create -n <name> -f <file.yml>. You can then delete it with conda env remove -n <envname>, or simply remove the directory of the environment (which you can find using conda info -e).
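The environment.yml step of the exercise can also be scripted. The function below is a minimal sketch, assuming a requirements.txt with one package per line in one of the forms name, name==version, name=version or name=version=build (the last is the form written by conda list --export). requirements_to_env_yml is a hypothetical helper, not a conda command: it drops build strings and takes the channel names as extra arguments.

```shell
# requirements_to_env_yml REQFILE [CHANNEL...]: hypothetical helper.
# Writes a minimal environment.yml (only "channels" and "dependencies") to stdout.
# Assumes one conda package per line: name, name==version, name=version or
# name=version=build; build strings are dropped, "==" becomes "=".
requirements_to_env_yml() {
    local reqfile="$1" channel
    shift                                  # remaining arguments are channel names
    echo "channels:"
    for channel in "$@"; do
        echo "  - $channel"
    done
    echo "dependencies:"
    # skip comments and blank lines, normalize "==" to "=", keep at most name=version
    grep -v -e '^#' -e '^[[:space:]]*$' "$reqfile" \
        | sed 's/==/=/' \
        | awk -F'=' '{ if (NF >= 2) print "  - " $1 "=" $2; else print "  - " $1 }'
}
```

For the word-count environment this could be called as requirements_to_env_yml requirements.txt bioconda conda-forge > environment.yml. It is still worth checking the result by creating a test environment from it, as in the last step of the exercise.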
# creating a new env
$ virtualenv myenvironment
$ virtualenv --python=python3 myenvironment
$ virtualenv /path/to/myenvironment
# activating env, installing package and deactivating
$ source myenvironment/bin/activate
$ pip install somepackage
$ deactivate
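Activating an environment mainly prepends the environment's bin directory to PATH; the activate script created by virtualenv also sets the VIRTUAL_ENV variable to the environment's directory, and deactivate unsets it. A small guard like the sketch below can go at the top of a setup script so that pip install never runs outside an isolated environment by accident. require_venv is a hypothetical helper, and checking VIRTUAL_ENV only covers virtualenv-style environments, not conda ones.

```shell
# require_venv: hypothetical helper.
# Succeeds only if a virtualenv is active, i.e. VIRTUAL_ENV is set and non-empty.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtual environment is active; run: source myenvironment/bin/activate" >&2
        return 1
    fi
    echo "using virtual environment at $VIRTUAL_ENV"
}
```

Typical use: require_venv && pip install -r requirements.txt.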
A good overview of use cases, strategies and tools for reproducible environments can be found at Reproducible Environments.
There are many tools available:
There are no standard methods or tools to handle dependencies in C/C++, but useful tools include:
Key Points
Capturing software dependencies is a must for reproducibility.
Files like requirements.txt, environment.yml, Pipfile, …, should be part of the source repository.
Be skeptical when you see dependency lists without versions.