7.1 Managing Software
Teaching: 60 min || Exercises: 20 min
Overview
7.1 The conda Package Manager
Often you may want to use software packages that are not installed by default on your computer or on the compute cluster you have access to. There are several ways you could manage your own software installation, but in this module we will be using Conda, which gives you access to a large number of scientific packages.
There are two main software distributions that you can download and install, called Anaconda and Miniconda. Miniconda is a lighter version, which includes only conda, base Python and a small number of supporting packages, while Anaconda is a much larger bundle of software that includes many other packages (see the Documentation Page for more information).
One of the strengths of using Conda to manage your software is that you can have different versions of your software installed alongside each other, organised in environments. Organising software packages into environments is extremely useful, as it allows you to have a reproducible set of software versions that you can use and reuse in your projects.
Installing Conda (Optional)
To start with, let’s install Conda on your machine. In this course we will install the Miniconda bundle, as it’s lighter and faster to install:
- Make sure you’re logged in to the HPC and in the home directory (cd ~).
- Download the Miniconda installer by running:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
- Run the installation script just downloaded:
bash Miniconda3-latest-Linux-x86_64.sh
- Follow the installation instructions accepting default options (answering ‘yes’ to any questions)
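- Run the following commands:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
This adds two channels (sources of software) useful for bioinformatics and data science applications, bioconda and conda-forge, on top of the defaults channel, and sets the channel priority to strict. Because each --add places the channel at the top of the list, conda-forge ends up with the highest priority, followed by bioconda and then defaults.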
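To confirm that the channel configuration was applied, you can ask conda to print the relevant settings. This is a minimal sketch: the helper name check_channels is only for illustration, and the guard simply avoids an error if conda is not yet on your PATH (for example, before opening a new terminal after the installation).

```shell
# Print the configured channels (highest priority first) and the channel
# priority mode. check_channels is a hypothetical helper name.
check_channels() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found: open a new terminal or re-check the installation"
        return 0
    fi
    conda config --show channels
    conda config --show channel_priority
}

check_channels
```

With the commands above, conda-forge should appear first in the list, followed by bioconda and defaults.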
Installing Software Using Conda
The command used to install and manage software is called conda. Although we will only cover the basics in this course, it has excellent documentation and a useful cheatsheet.
The first thing to do is to create a software environment for our project. Although this is optional (you could instead install everything in the “base” default environment), it is good practice, as it means the software versions remain stable within each project.
To create an environment we use:
conda create --name ENV
Where “ENV” is the name we want to give to that environment. Once the environment is created, we can install packages using:
conda install --name ENV PROGRAM
Where “PROGRAM” is the name of the software we want to install.
To search for the software packages that are available through conda:
- go to anaconda.org.
- in the search box search for a program of your choice. For example: “bowtie2”.
- the results should be listed as CHANNEL/PROGRAM, where CHANNEL will be the source channel from where the software is available. Usually scientific/bioinformatics software is available through the conda-forge and bioconda channels.
If you need to install a program from a channel other than the defaults, you can specify it in the install command using the -c (or --channel) option. For example: conda install --channel CHANNEL --name ENV PROGRAM.
Let’s see this with an example, where we create a new environment called “scipy” and install two Python scientific packages, NumPy and Matplotlib:
conda create --name scipy
conda install --name scipy --channel conda-forge numpy matplotlib
To see all the environments you have available, you can use:
conda info --envs
# conda environments:
#
base * /home/participant36/miniconda3
scipy /home/participant36/miniconda3/envs/scipy
In our case it lists the base (default) environment and the newly created scipy environment. The asterisk (“*”) tells us which environment we’re using at the moment.
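To check what was actually installed into an environment, you can list its packages without activating it. The sketch below assumes the scipy environment from the example exists; the helper name list_env_packages is only for illustration.

```shell
# List the packages installed in a named environment without activating it.
# list_env_packages is a hypothetical helper name; "scipy" is the example
# environment created in this section.
list_env_packages() {
    env_name="$1"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH"
        return 0
    fi
    # conda list --name ENV prints each package with its version and channel
    conda list --name "$env_name" || echo "environment '$env_name' not found"
}

list_env_packages scipy
```

The listing should include numpy and matplotlib, together with the dependencies that conda installed alongside them.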
Loading Conda Environments
Once your packages are installed in an environment, you can load that environment by using conda activate ENV, where “ENV” is the name of your environment. For example, we can activate our previously created environment with:
conda activate scipy
If you check which python executable is being used now, you will notice it’s the one from this new environment:
which python
~/miniconda3/envs/scipy/bin/python
You can also check that the new environment is in use with:
conda env list
# conda environments:
#
base /home/participant36/miniconda3
scipy * /home/participant36/miniconda3/envs/scipy
And notice that the asterisk (“*”) is now showing we’re using the scipy environment.
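Since environments are meant to give you a reproducible set of software versions, it can be useful to save an environment's package list to a file. The following is a minimal sketch, assuming the scipy environment exists; the helper name export_env and the file name scipy-environment.yml are arbitrary choices for this example.

```shell
# Save the package list of an environment to a YAML file so that it can be
# recreated elsewhere. export_env and the output file name are hypothetical
# choices for illustration.
export_env() {
    env_name="$1"
    out_file="$2"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH"
        return 0
    fi
    if conda env export --name "$env_name" > "$out_file"; then
        echo "wrote $out_file"
    else
        echo "could not export environment '$env_name'"
    fi
}

export_env scipy scipy-environment.yml

# To rebuild the environment from that file on another machine:
#   conda env create --file scipy-environment.yml
# To leave the active environment and return to the previous one:
#   conda deactivate
```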
7.2 Replacing conda with Mamba
conda is an amazing tool which has completely changed the way we install tools, removing the worst of the hassle around ensuring that dependencies are installed alongside the tools. However, conda can be quite slow and sometimes has difficulties resolving the dependencies. Fortunately, Mamba was developed to speed up the process. Mamba is a reimplementation of the conda package manager in C++ and allows you to use exactly the same commands as conda (simply replace conda with mamba). Unlike the tools we’ve been installing above in their own environments, Mamba needs to be installed in the conda base environment. Let’s try installing it now:
conda install mamba -n base -c conda-forge
Once it’s installed, let’s try creating a new environment called bakta and then install the Bakta annotation tool we’re going to use in the Assembly and annotation module.
mamba create -n bakta
mamba activate bakta
mamba install -c bioconda bakta
Let’s check that we successfully installed bakta:
bakta -h
If you get the bakta help output, congratulations, you’ve successfully installed a tool with Mamba.
You can see the commands we use with Mamba are exactly the same as conda, but the output looks very different. Don’t worry, it’s doing exactly the same thing as conda, only faster!
Further resources
- Search for Conda packages at anaconda.org
- Learn more about Conda from the Conda User Guide
- Conda Cheatsheet (PDF)
Credit
Information on this page has been adapted and modified from the following source(s): https://github.com/cambiotraining/hpc-intro-sanger/blob/main/04-software.md