7 Managing Software
Teaching: 60 min || Exercises: 20 min
7.1 The conda Package Manager
Often you may want to use software packages that are not installed by default on your computer or on the compute cluster you have access to. There are several ways to manage your own software installations, but in this module we will use Conda, which gives you access to a large number of scientific packages.
There are two main software distributions that you can download and install, called Anaconda and Miniconda.
Miniconda is a lighter version that includes only conda, Python and the small number of packages they depend on, while Anaconda is a much larger bundle that includes many other packages (see the Documentation Page for more information).
One of the strengths of using Conda to manage your software is that you can have different versions of your software installed alongside each other, organised in environments. Organising software packages into environments is extremely useful, as it gives you a reproducible set of software versions that you can use and reuse in your projects.
Figure: Conda environments.
Installing Conda (Optional)
To start with, let’s install Conda on your machine. In this course we will install the Miniconda bundle, as it’s lighter and faster to install:
- Make sure you’re logged in to the HPC and in your home directory (cd ~).
- Download the Miniconda installer by running:
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- Run the installation script you just downloaded:
  bash Miniconda3-latest-Linux-x86_64.sh
- Follow the installation instructions, accepting the default options (answer ‘yes’ to any questions).
- Close and reopen your terminal (or run source ~/.bashrc) so that the conda command becomes available.
- Run the following commands:
  conda config --add channels defaults
  conda config --add channels bioconda
  conda config --add channels conda-forge
  conda config --set channel_priority strict
This adds the bioconda and conda-forge channels (sources of software useful for bioinformatics and data science applications) alongside the defaults channel, and sets strict channel priority, so that a package is always taken from the highest-priority channel that provides it.
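Because conda config --add puts a channel at the top of the channel list (moving it there if it is already listed), the order of the commands above matters: the channel added last gets the highest priority. The shell sketch below only simulates that rule with a Bash array and a hypothetical add_channel function. It does not touch your real configuration, which you can print with conda config --show channels.

```bash
#!/usr/bin/env bash
# Sketch: simulate how "conda config --add channels X" builds the channel list.
# add_channel is a hypothetical helper; it does not call conda.
channels=()

add_channel() {
  local ch=$1 c
  local -a kept=()
  # drop an existing entry so the channel is never listed twice
  for c in "${channels[@]}"; do
    [ "$c" != "$ch" ] && kept+=("$c")
  done
  # --add inserts at the top, i.e. highest priority
  channels=("$ch" "${kept[@]}")
}

# Same order as the configuration commands above
add_channel defaults
add_channel bioconda
add_channel conda-forge

echo "Channels, highest priority first: ${channels[*]}"
```

Running it prints conda-forge bioconda defaults: conda-forge is searched first and defaults last.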
Anaconda and Miniconda are also available for Windows and macOS. See the Conda installation documentation for instructions.
Installing Software Using Conda
The command used to install and manage software is called conda. Although we will only cover the basics in this course, conda has excellent documentation and a useful cheatsheet.
The first thing to do is to create a software environment for our project. Although this is optional (you could instead install everything in the default “base” environment), it is good practice because it keeps the software versions stable within each project.
To create an environment we use:
conda create --name ENV
where “ENV” is the name we want to give to that environment. Once the environment is created, we can install packages using:
conda install --name ENV PROGRAM
where “PROGRAM” is the name of the software we want to install.
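The two steps can also be combined, because conda create accepts package names after the environment name and installs them while creating the environment. The sketch below wraps this in a hypothetical create_env function; by default it only prints the command (a dry run), and setting DRY_RUN=0 makes it call conda for real. The names myproject and samtools are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: create an environment and install packages with a single command.
# create_env is a hypothetical helper, not part of conda.
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it.
DRY_RUN=${DRY_RUN:-1}

create_env() {
  local env=$1
  shift
  # --yes skips the confirmation prompt; the remaining arguments are packages
  local cmd=(conda create --yes --name "$env" "$@")
  if [ "$DRY_RUN" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Placeholder environment and package names
create_env myproject samtools
```

Listing all the packages when the environment is created also lets conda resolve their dependencies together, instead of one install at a time.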
One way to organise your software environments is to create an environment for each kind of analysis that you might be doing regularly. For example, you could have an environment named bioinformatics with software that you use for analysing sequence data (e.g. FastQC, bwa) and another called processing with software you use for downstream analysis of your results (e.g. Python’s pandas). Alternatively, you can create a separate environment for each tool you’d like to install and use.
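One way to keep such per-analysis environments reproducible is to describe each of them in an environment file and build it with conda env create --file. The sketch below writes a hypothetical bioinformatics.yml for the bioinformatics environment from the example above (FastQC and bwa are published on bioconda as fastqc and bwa) and, by default, only prints the conda command instead of running it. Versions are left unpinned here; you could append one to each entry (name=VERSION) to fix it.

```bash
#!/usr/bin/env bash
# Sketch: describe one analysis environment in a file so it can be rebuilt
# the same way later or on another machine. bioinformatics.yml is a
# hypothetical file name; DRY_RUN=1 (the default) only prints the command.
DRY_RUN=${DRY_RUN:-1}

cat > bioinformatics.yml <<'EOF'
name: bioinformatics
channels:
  - conda-forge
  - bioconda
dependencies:
  - fastqc
  - bwa
EOF

cmd=(conda env create --file bioinformatics.yml)
if [ "$DRY_RUN" = 1 ]; then
  echo "${cmd[*]}"
else
  "${cmd[@]}"
fi
```

For an environment that already exists, conda env export --name ENV > ENV.yml writes a similar file from its installed packages.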
To search for the software packages that are available through conda:
- go to anaconda.org.
- in the search box search for a program of your choice. For example: “bowtie2”.
- the results should be listed as CHANNEL/PROGRAM, where CHANNEL is the source channel from which the software is available. Scientific and bioinformatics software is usually available through the conda-forge and bioconda channels.
If you need to install a program from a channel other than your configured defaults, you can specify it in the install command with the --channel (or -c) option, for example: conda install --channel CHANNEL --name ENV PROGRAM.
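The CHANNEL/PROGRAM label shown on anaconda.org maps directly onto the --channel and program arguments of conda install. The sketch below uses a hypothetical install_cmd function to split such a label and print the matching command; myproject is a placeholder environment name. You can also search from the command line, for example with conda search --channel bioconda bowtie2.

```bash
#!/usr/bin/env bash
# Sketch: turn a CHANNEL/PROGRAM label from anaconda.org into the matching
# conda install command. install_cmd is a hypothetical helper; it only
# prints the command and does not run conda.
install_cmd() {
  local label=$1 env=$2
  local channel=${label%%/*}   # text before the first "/"
  local program=${label#*/}    # text after the first "/"
  echo "conda install --channel $channel --name $env $program"
}

install_cmd bioconda/bowtie2 myproject   # myproject is a placeholder name
```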
Let’s see this with an example, where we create a new environment called “scipy” and install the Python scientific packages NumPy and Matplotlib:
conda create --name scipy
conda install --name scipy --channel conda-forge numpy matplotlib
To see all the environments you have available, you can use:
conda info --envs
# conda environments:
#
base                  *  /home/participant36/miniconda3
scipy                    /home/participant36/miniconda3/envs/scipy
In our case it lists the base (default) environment and the newly created scipy environment. The asterisk (*) tells us which environment we’re using at the moment.
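Since the asterisk is the only marker of the active environment in this listing, a script can pick it out. The sketch below is a hypothetical active_env function that reads the listing on standard input with awk. It assumes the “name, asterisk, path” layout shown above and named environments (an environment created with a path instead of a name is listed without a name).

```bash
#!/usr/bin/env bash
# Sketch: print the name of the environment marked with "*" in the output of
# "conda info --envs". Assumes the "name * path" layout shown above.
active_env() {
  # skip comment lines; on the active line the second field is "*"
  awk '!/^#/ && $2 == "*" { print $1 }'
}

# Example: the listing from above, fed in through a here-document
active_env <<'EOF'
# conda environments:
#
base                  *  /home/participant36/miniconda3
scipy                    /home/participant36/miniconda3/envs/scipy
EOF
```

With conda installed, conda info --envs | active_env should print the same name that conda activate stores in the CONDA_DEFAULT_ENV variable.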
Loading Conda Environments
Once your packages are installed in an environment, you can load that environment by using conda activate ENV, where “ENV” is the name of your environment. For example, we can activate our previously created environment with:
conda activate scipy
If you check which python executable is being used now, you will notice it’s the one from this new environment:
which python
~/miniconda3/envs/scipy/bin/python
You can also check that the new environment is in use with:
conda env list
# conda environments:
#
base                     /home/participant36/miniconda3
scipy                 *  /home/participant36/miniconda3/envs/scipy
Notice that the asterisk (*) now shows that we’re using the scipy environment.
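conda activate also sets the environment variables CONDA_PREFIX (the environment’s directory) and CONDA_DEFAULT_ENV (its name), so the which python check can be scripted. The sketch below defines a hypothetical check_python function that compares the python found on your PATH with CONDA_PREFIX/bin/python; it assumes the Linux/macOS layout used on the HPC, where the interpreter lives in the environment’s bin directory.

```bash
#!/usr/bin/env bash
# Sketch: check that "python" resolves to the interpreter of the active conda
# environment. conda activate sets CONDA_PREFIX (environment directory) and
# CONDA_DEFAULT_ENV (environment name). check_python is a hypothetical helper.
check_python() {
  local py
  py=$(command -v python) || { echo "no python found on PATH"; return 1; }
  if [ -n "${CONDA_PREFIX:-}" ] && [ "$py" = "$CONDA_PREFIX/bin/python" ]; then
    echo "OK: python comes from ${CONDA_DEFAULT_ENV:-unknown} ($py)"
  else
    echo "python is $py, which is not in the active environment"
    return 1
  fi
}

# Usage, e.g. after "conda activate scipy":
#   check_python
```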
7.2 Replacing conda with Mamba
conda is an amazing tool that has completely changed the way we install tools, removing most of the hassle of making sure that dependencies are installed alongside them. However, conda can be quite slow and sometimes has difficulty resolving dependencies. Mamba was developed to speed up this process: it is a reimplementation of the conda package manager in C++ and lets you use exactly the same commands as conda (simply replace conda with mamba). Unlike the tools we installed above in their own environments, Mamba needs to be installed in the conda base environment. Let’s try installing it now:
conda install mamba -n base -c conda-forge
Once it’s installed, let’s create a new environment called bakta and then install the Bakta annotation tool, which we’re going to use in the Assembly and annotation module:
mamba create -n bakta
mamba activate bakta
mamba install -c bioconda bakta
Let’s check that we successfully installed bakta:
bakta -h
If you see the bakta help output, congratulations: you’ve successfully installed a tool with Mamba.
You can see that the commands we use with Mamba are exactly the same as with conda, but the output looks very different. Don’t worry: Mamba is doing the same job as conda, just faster.
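Because Mamba accepts the same commands as conda for everyday tasks such as create and install, a script can use whichever of the two is installed. The sketch below is a hypothetical pkg_manager function that prefers mamba and falls back to conda; the commented usage line recreates the bakta environment in a single step, naming the conda-forge and bioconda channels explicitly.

```bash
#!/usr/bin/env bash
# Sketch: use mamba when it is installed, otherwise fall back to conda.
# pkg_manager is a hypothetical helper; it prints the command name to use.
pkg_manager() {
  if command -v mamba >/dev/null 2>&1; then
    echo mamba
  elif command -v conda >/dev/null 2>&1; then
    echo conda
  else
    echo "neither mamba nor conda was found on PATH" >&2
    return 1
  fi
}

# Usage (not run here): create the bakta environment in one step
#   PM=$(pkg_manager) && "$PM" create --yes --name bakta \
#     --channel conda-forge --channel bioconda bakta
```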
Further resources
- Search for Conda packages at anaconda.org
- Learn more about Conda from the Conda User Guide
- Conda Cheatsheet (PDF)
Credit
Information on this page has been adapted and modified from the following source(s): https://github.com/cambiotraining/hpc-intro-sanger/blob/main/04-software.md
