Setup

Workshop Attendees

If you are attending one of our workshops, we will provide a training environment with all of the required software and data.
If you want to set up your own computer to run the analyses demonstrated in this course, follow the instructions below.

Note that we provide instructions for all three major operating systems (Windows, Mac OS and Linux). However, where possible we advise you to use a Linux system, as our training environment is built on Linux.

Installing conda

We will perform a fresh installation of conda using the Miniconda installer.

Note

If you already have Miniconda or Anaconda installed and just want to upgrade, do not make a fresh installation. Instead, update your existing version of conda with:

conda update conda

After updating conda, skip ahead to step 8 of the instructions below (adding channels) and continue from there to install mamba into the base environment from the conda-forge channel.

On Windows, follow the Miniconda installation instructions for Windows to install Miniconda, and then the mamba installation instructions to install mamba.

On Mac OS, open a terminal and follow these instructions:

  1. Navigate to your home directory:
cd ~
  2. Download the Miniconda3 installer for Mac by running:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
  • Mac OS does not include wget by default. If the command is not found, run curl -O followed by the same URL instead.
M1 processor users

If your Mac has an M1 (Apple Silicon) processor, download the arm64 installer instead, and use Miniconda3-latest-MacOSX-arm64.sh in place of Miniconda3-latest-MacOSX-x86_64.sh in steps 3 and 7:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
  3. Run the installation script you just downloaded:
bash Miniconda3-latest-MacOSX-x86_64.sh
  4. Follow the installation instructions, accepting the default options (answering ‘yes’ to any questions).
  • If you are unsure about any setting, accept the defaults. You can change them later.
  5. To make the changes take effect, close and then re-open your terminal window.

  6. Test your installation.

  • In your terminal window, run the command conda list:
conda list
  • If conda has been installed correctly, a list of installed packages appears.
  7. Once the installation has succeeded, remove the installation script, as it is no longer needed:
rm Miniconda3-latest-MacOSX-x86_64.sh
  8. Run the following commands to add channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

This adds three channels (sources of software) useful for bioinformatics and data science applications. Because each --add places the channel at the top of the list, conda-forge ends up with the highest priority, followed by bioconda and then defaults; with strict priority, conda only takes a package from the highest-priority channel that provides it.

  9. Install mamba into the base environment from the conda-forge channel with the command below:
conda install mamba -n base -c conda-forge
  10. Run this to initialize mamba:
mamba init

On Linux, open a terminal and follow these instructions:

  1. Navigate to your home directory:
cd ~
  2. Download the Miniconda3 installer for Linux by running:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  3. Run the installation script you just downloaded:
bash Miniconda3-latest-Linux-x86_64.sh
  4. Follow the installation instructions, accepting the default options (answering ‘yes’ to any questions).
  • If you are unsure about any setting, accept the defaults. You can change them later.
  5. To make the changes take effect, close and then re-open your terminal window.

  6. Test your installation.

  • In your terminal window, run the command conda list:
conda list
  • If conda has been installed correctly, a list of installed packages appears.
  7. Once the installation has succeeded, remove the installation script, as it is no longer needed:
rm Miniconda3-latest-Linux-x86_64.sh
  8. Run the following commands to add channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

This adds three channels (sources of software) useful for bioinformatics and data science applications. Because each --add places the channel at the top of the list, conda-forge ends up with the highest priority, followed by bioconda and then defaults; with strict priority, conda only takes a package from the highest-priority channel that provides it.

  9. Install mamba into the base environment from the conda-forge channel with the command below:
conda install mamba -n base -c conda-forge
  10. Run this to initialize mamba:
mamba init
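The Mac OS and Linux steps above differ mainly in the name of the installer file. As a small convenience, the sketch below picks that name from the output of uname -s and uname -m. The function name miniconda_installer is hypothetical, and the Linux ARM (aarch64) entry is an addition not covered in the steps above; if your platform is not listed, check the Miniconda download page.

```shell
#!/usr/bin/env bash
# Sketch: print the Miniconda installer file name for an OS/architecture pair.
# With no arguments it uses the current machine (uname -s and uname -m).
miniconda_installer() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Darwin/x86_64) echo "Miniconda3-latest-MacOSX-x86_64.sh" ;;  # Intel Mac
    Darwin/arm64)  echo "Miniconda3-latest-MacOSX-arm64.sh" ;;   # M1 (Apple Silicon) Mac
    Linux/x86_64)  echo "Miniconda3-latest-Linux-x86_64.sh" ;;
    Linux/aarch64) echo "Miniconda3-latest-Linux-aarch64.sh" ;;  # ARM Linux (assumption: not covered above)
    *)
      echo "unsupported platform: $os/$arch" >&2
      return 1
      ;;
  esac
}
```

For example, installer=$(miniconda_installer) && wget "https://repo.anaconda.com/miniconda/$installer" && bash "$installer" downloads and runs the installer that matches your machine.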

Creating conda environments for the workshop


Creating the qc environment and installing required packages

Open a terminal, make sure you are in the conda base environment, and run this command to install all required packages and their dependencies:

mamba create -n qc fastq-scan=1.0.0 fastqc=0.11.9 fastp=0.23.2 kraken2=2.1.2 bracken=2.7 multiqc=1.13a

This creates an environment called qc with the specified package versions and their dependencies.

NB. The tools fastq-scan and bracken run Python scripts that require the Python libraries pandas, json and glob. json and glob are part of the Python standard library, so only pandas needs to be installed. Use the command below to install it in the qc environment:

conda install pandas=1.4.3 -n qc -c conda-forge

We will activate and use this environment in chapter 4 (Sequencing Quality Control).

Creating the mapping environment and installing required packages

Run this command to install all required packages and their dependencies:

mamba create -n mapping bwa=0.7.17 samtools=1.15 bcftools=1.14 pysam=0.16.0.1 biopython=1.78

This creates an environment called mapping with the specified package versions and their dependencies.

We will activate and use this environment in chapter 5 (Short Read Mapping).

Installing required packages for Assembly and Annotation

NB. For the Assembly and Annotation module, we will create three separate environments, because the tools' conda recipes conflict and it is difficult to get them all working in a single environment.

We will therefore create each environment separately, using the commands below:

mamba create -n shovill -c bioconda shovill=1.1.0

mamba create -n quast -c bioconda quast=5.2.0

mamba create -n bakta -c bioconda bakta=1.6.1

We will activate and use these environments in chapter 6 (Assembly and Annotation).
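Because the three commands differ only in the environment name and package, they can also be run in a loop. The sketch below is illustrative only: the function create_envs and the DRY_RUN switch are hypothetical helpers (not mamba features), and it adds mamba's -y flag so the loop does not stop at each confirmation prompt.

```shell
#!/usr/bin/env bash
# Sketch: create one environment per tool, as in the three commands above.
# Each entry is "environment-name package=version". DRY_RUN=1 (a convention of
# this sketch, not a mamba option) prints the commands instead of running them.
assembly_envs=(
  "shovill shovill=1.1.0"
  "quast quast=5.2.0"
  "bakta bakta=1.6.1"
)

create_envs() {
  local entry name spec
  for entry in "$@"; do
    read -r name spec <<< "$entry"   # split the entry into name and spec
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "mamba create -y -n $name -c bioconda $spec"
    else
      # -y answers "yes" to the confirmation prompt so the loop does not stop
      mamba create -y -n "$name" -c bioconda "$spec" || return 1
    fi
  done
}
```

Run DRY_RUN=1 create_envs "${assembly_envs[@]}" first to print the commands, then create_envs "${assembly_envs[@]}" to create the environments.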

Creating the phylogenetics environment and installing required packages

Run this command to install all required packages and their dependencies:

mamba create -n phylogenetics -c bioconda iqtree=2.2.0.3 snp-sites=2.5.1

This creates an environment called phylogenetics with the specified package versions and their dependencies.

We will activate and use this environment in chapter 10 (Introduction to Phylogenetics).

Creating the genotyping environments and installing required packages

NB. For genotyping and AMR prediction, we will create five separate environments, because some tools require specific versions of Python and other related packages, so they cannot all be installed in a single environment.

We will therefore create each environment separately, with the following names:

  1. mlst
  2. seroba
  3. spoligotyping
  4. tbprofiler
  5. ariba

Run the following commands to create each environment and install its packages and their dependencies:

mlst:

mamba create -n mlst mlst=2.22.1

seroba:

mamba create -n seroba seroba=1.0.2

spoligotyping:

mamba create -n spoligotyping spotyping=2.1

tbprofiler:

mamba create -n tbprofiler tb-profiler=4.1.1

ariba:

mamba create -n ariba ariba=2.14.6

These commands create the environments mlst, seroba, spoligotyping, tbprofiler and ariba with the specified package versions and their dependencies.

We will activate and use these environments in chapter 11 (Bacterial Genotyping and Drug Resistance Prediction).
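Once all environments have been created, you can check that none is missing by comparing the names against the output of conda env list, whose first column is the environment name. A minimal sketch, assuming that output format; missing_envs is a hypothetical helper that reads the listing on standard input and prints the names it cannot find:

```shell
#!/usr/bin/env bash
# Sketch: print which of the named environments are missing from the output of
# "conda env list", read on standard input. The first column of each line is
# the environment name; comment lines start with "#".
missing_envs() {
  local names env
  names=$(awk '!/^#/ && NF { print $1 }')
  for env in "$@"; do
    # -x: whole-line match, -F: treat the name literally
    printf '%s\n' "$names" | grep -qxF -- "$env" || echo "$env"
  done
}
```

For example, conda env list | missing_envs qc mapping shovill quast bakta phylogenetics mlst seroba spoligotyping tbprofiler ariba prints nothing when every workshop environment exists.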

Specifying the version of a tool to install

As you may have noticed, every tool above is installed with a version number in the format tool=version_number. This lets us install the exact versions of the tools used in the training.

For your personal use, if you wish to use the latest versions of these tools, just omit the =version_number part; conda will then usually install the latest version that is compatible with the other packages in the environment.
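To go from the pinned commands above to unpinned ones without retyping them, the =version_number suffix can be stripped in the shell. A sketch with a hypothetical unpin helper; it only handles the simple tool=version form used on this page:

```shell
#!/usr/bin/env bash
# Sketch: strip the "=version_number" suffix from package specs, leaving only
# the package names so that conda picks the versions itself.
# Only the plain "tool=version" form is handled (not ">=", "<" and so on).
unpin() {
  local spec
  for spec in "$@"; do
    echo "${spec%%=*}"   # remove everything from the first "=" onwards
  done
}
```

For example, mamba create -n qc $(unpin fastq-scan=1.0.0 fastqc=0.11.9 fastp=0.23.2 kraken2=2.1.2 bracken=2.7 multiqc=1.13a) creates the qc environment with the latest compatible versions. The $(...) is left unquoted on purpose so that each name becomes a separate argument.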

Additional package for the mapping environment

NB. The step that creates pseudogenomes in chapter 5 (Short Read Mapping) runs Python scripts that require some Python libraries. Use the command below to install pandas in the mapping environment:

conda install pandas -n mapping -c conda-forge


Downloading databases

minikraken2 database

Download the Kraken2 database “minikraken2_v1_8GB” into your database directory:

wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/old/minikraken2_v1_8GB_201904.tgz

Uncompress the database

tar xvfz minikraken2_v1_8GB_201904.tgz 

If the extracted directory is not named minikraken2_v1_8GB, as in the workshop code, rename it:

mv <unzipped_database_name> minikraken2_v1_8GB

You can now remove the downloaded archive, as it is no longer needed:

rm minikraken2_v1_8GB_201904.tgz
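The extract-and-rename steps above can be combined so that the final directory always has the name the workshop code expects, whatever the archive's top-level folder is called. A minimal sketch, assuming the archive contains a single top-level directory; extract_as is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: extract a .tar.gz database and make sure the result is a directory
# named as the workshop code expects, whatever the archive's top folder is.
# Assumes the archive holds a single top-level directory.
extract_as() {
  local archive="$1" target="$2" top
  if [ -e "$target" ]; then
    echo "$target already exists; not overwriting" >&2
    return 1
  fi
  # The first path component of the first archive entry is the top-level folder.
  top=$(tar -tzf "$archive" | head -n 1 | sed 's|^\./||' | cut -d/ -f1)
  [ -n "$top" ] || return 1
  tar -xzf "$archive" || return 1
  if [ "$top" != "$target" ]; then
    mv "$top" "$target"
  fi
}
```

For example: extract_as minikraken2_v1_8GB_201904.tgz minikraken2_v1_8GB && rm minikraken2_v1_8GB_201904.tgz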
bakta database

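Download the Bakta database into your database directory. There are two ways to do this.

One-step option

If you have the bakta environment activated, you can download and set up the database in a single step:

bakta_db download --output <output-path>

If you use this option, you don’t need to perform the manual steps below (including the AMRFinderPlus step), as the AMRFinderPlus database is included automatically.

Manual option

Download the database archive “db.tar.gz” from either of these locations: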

wget https://bakta-db.s3.computational.bio.uni-giessen.de/db.tar.gz

or

wget https://zenodo.org/record/7025248/files/db.tar.gz

Uncompress the database

tar -xzf db.tar.gz

Rename the database to match the workshop code:

mv db bakta_db

Delete the archive after extracting it:

rm db.tar.gz

Download the AMRFinderPlus database into the Bakta database directory:

amrfinder_update --force_update --database bakta_db/amrfinderplus-db/

Updating an existing bakta database

To update an existing Bakta database, run:

bakta_db update --db <existing-db-path> [--tmp-dir <tmp-directory>]

seroba database

If you use git, navigate to your database directory and clone the SeroBA repository:

git clone https://github.com/sanger-pathogens/seroba.git

Copy the database from the seroba/ directory into your database directory, which should be your current directory:

cp -r seroba/database .

Delete the cloned repository to clean up your system (-f avoids a prompt for each of git's write-protected object files):

rm -rf seroba

Still in your database directory, rename the database to match the workshop code:

mv database seroba_db
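Before starting the practical chapters, it can help to confirm that every database sits under the name used in the workshop code. The sketch below, with a hypothetical check_databases function, assumes you followed the manual Bakta steps (so that the database is in bakta_db with the AMRFinderPlus database inside it) and that you run it from your database directory:

```shell
#!/usr/bin/env bash
# Sketch: from your database directory, check that each database is present
# under the name used in the workshop code. Returns non-zero if any is missing.
check_databases() {
  local dir missing=0
  for dir in minikraken2_v1_8GB bakta_db bakta_db/amrfinderplus-db seroba_db; do
    if [ -d "$dir" ]; then
      echo "ok: $dir"
    else
      echo "MISSING: $dir"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]   # the function's exit status
}
```

Running check_databases && echo "all databases found" prints one line per database and confirms when nothing is missing.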

Nextflow

Singularity

On Windows, you can use Singularity from the Windows Subsystem for Linux (WSL).
Once you have set up WSL, you can follow the instructions for Linux.

Singularity is not available for Mac OS.

On Linux, these instructions are for Ubuntu or Debian-based distributions (see footnote 1).

sudo apt update && sudo apt upgrade && sudo apt install runc
# Release codename of your distribution, e.g. jammy
CODENAME=$(lsb_release -cs)
wget -O singularity.deb https://github.com/sylabs/singularity/releases/download/v3.10.2/singularity-ce_3.10.2-${CODENAME}_amd64.deb
# Unlike dpkg -i, apt also installs any missing dependencies
sudo apt install ./singularity.deb
rm singularity.deb
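On some minimal systems (container images, for example) lsb_release is not installed. The same codename is usually available from /etc/os-release as VERSION_CODENAME (and, on Ubuntu-based distributions, UBUNTU_CODENAME). The sketch below builds the download URL from that file instead; singularity_deb_url is a hypothetical helper, and pre-built packages are only published for some Ubuntu releases, so check the SingularityCE release page for the codenames it lists.

```shell
#!/usr/bin/env bash
# Sketch: build the SingularityCE .deb download URL for a given version,
# reading the release codename from an os-release file (default /etc/os-release).
# Prefers UBUNTU_CODENAME so Ubuntu-based derivatives map to their Ubuntu base.
singularity_deb_url() {
  local version="$1" os_release="${2:-/etc/os-release}" codename
  # Source the file in a subshell so its variables do not leak into this shell.
  codename=$(
    unset UBUNTU_CODENAME VERSION_CODENAME
    . "$os_release" && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}"
  )
  [ -n "$codename" ] || return 1
  echo "https://github.com/sylabs/singularity/releases/download/v${version}/singularity-ce_${version}-${codename}_amd64.deb"
}
```

For example: wget -O singularity.deb "$(singularity_deb_url 3.10.2)"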

Visual Studio Code

Windows:
  • Go to the Visual Studio Code download page and download the installer for Windows. Double-click the downloaded file to install the software, accepting all the default options.
  • After completing the installation, open the Windows Start menu, search for “Visual Studio Code” and launch the application.
  • Go to “File > Preferences > Settings”, then select “Text Editor > Files” in the menu on the left. Scroll down to the setting named “EOL” and choose “\n” (this ensures that the files you edit on Windows are compatible with the Linux operating system).

Mac OS:
  • Go to the Visual Studio Code download page and download the installer for Mac.
  • Go to your Downloads folder and double-click the file you just downloaded to extract the application. Drag and drop the “Visual Studio Code” file into your “Applications” folder.
  • You can now open the installed application to check that it was installed successfully (the first time you launch it, you will get a warning that it is an application downloaded from the internet; click “Open” to continue).
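If a script was saved with Windows line endings before you changed the EOL setting, Linux tools may fail on it; bash, for instance, reports errors that mention $'\r'. A minimal sketch for converting such a file in place, using only tr so it behaves the same on Linux and Mac OS (to_lf is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Sketch: convert a file from Windows (CRLF) to Linux (LF) line endings in place.
# Note: tr removes every carriage return in the file, not only those at line ends.
to_lf() {
  local file="$1" tmp status
  tmp=$(mktemp) || return 1
  # Write the converted text to a temporary file, then copy it back over the
  # original with cat so the file keeps its permissions (e.g. the executable bit).
  tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example: to_lf myscript.sh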

R and RStudio

On Windows and Mac OS, download and install R and RStudio using the default options.

On Linux:

  • Go to the CRAN Linux download folder and follow the instructions for your distribution.
  • Download the RStudio installer for your distribution and install it using your package manager.

Workshop Data

Footnotes

  1. See the Singularity documentation page for other distributions.