Tutorial: Greatly speed up conda by using mamba
3.7 years ago

Conda has been my go-to software manager, but it was only recently brought to my attention that a well-maintained project aims to greatly speed up and improve conda's functions. Mamba is an almost complete rewrite of conda in C++ that allows for parallel downloading of data and has faster (and better) dependency resolving. When working with larger environments, I’ve noticed that tasks that used to take tens of minutes now take only minutes (if not less).

Below I’ll provide a quick tutorial on working with mamba, which in theory mirrors what one would do with conda. You should have at least miniconda3 installed.

Installing Mamba

Your conda base environment is the environment where the conda software lives, and any software in base also gets added to your path. Generally you don’t want to install software into the base environment; however, in this particular case you need to install mamba there, since it will replace conda for most tasks.

conda install -n base -c conda-forge mamba
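
To confirm the install worked, you can check that mamba is now on your path. This is only a minimal sketch: require_tool is a hypothetical helper name I made up for illustration, not something shipped with conda or mamba.

```shell
# Hypothetical helper: succeed only if the named tool is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Check for mamba, then print its version if it is present.
if require_tool mamba; then
    mamba --version
fi
```

If mamba is not found, opening a new shell (so the base environment is re-read) is usually the first thing to try.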

I usually set the conda-forge and bioconda channels to allow for downloading up-to-date biology software.

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
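
Because conda config --add prepends to the channel list, the channel added last (conda-forge here) should end up listed first, with the highest priority. Here is a minimal sketch for checking the resulting order. The CONDA variable and the show_channels name are my own assumptions for illustration, not part of conda.

```shell
# CONDA is a hypothetical override (handy for dry runs, e.g. CONDA=echo);
# it defaults to the real conda executable.
CONDA="${CONDA:-conda}"

# Print the configured channels, highest priority first.
show_channels() {
    "$CONDA" config --show channels
}

# Only call it when the tool is actually installed.
if command -v "$CONDA" >/dev/null 2>&1; then
    show_channels
fi
```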

If you ever want to update mamba itself, you can now update it similarly to how conda is updated.

mamba update -n base mamba

Managing Environments

Finding software

If you want to search for an available package, the command is slightly different in mamba.

mamba repoquery search samtools

You also have the ability to check which packages the one you are searching for depends on.

mamba repoquery depends -a samtools

Creating an environment

Creating an environment is similar to conda. As an example I’ll create an environment that includes common software for transcript quantification and differential gene expression.

mamba create -n rnaquant salmon r-tximport bioconductor-deseq2

The environment can now be activated and used with conda activate rnaquant.

Adding/Updating software

Once an environment is created, it’s easy to add new software. I will add FastQC to my RNA-seq quantification environment, because as a good bioinformatician I want to quality control my fastq files.

mamba install -n rnaquant fastqc

It’s also easy to update the software in an environment.

mamba update -n rnaquant --all

Undoing changes to an environment

Environments keep track of all software changes. You can look through these changes and revert back to a previous state using revisions.

First, search through the revision history of an environment.

mamba list -n rnaquant --revisions

Let’s say that updating the environment above put us at revision 2, but now we want to undo those updates and get back to the state we were in just after we installed fastqc (revision 1). You can install revisions like you can software.

mamba install -n rnaquant --revision 1
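
The two steps above (inspect the history, then roll back) can be wrapped in a small function. This is a rough sketch under a few assumptions: rollback_env and the MAMBA override variable are hypothetical names of mine, and the function only reuses the two commands shown above.

```shell
# MAMBA is a hypothetical override (e.g. MAMBA=echo for a dry run);
# it defaults to the real mamba executable.
MAMBA="${MAMBA:-mamba}"

# Hypothetical helper: show the revision history, then roll back.
# Usage: rollback_env <env-name> <revision-number>
rollback_env() {
    env_name="$1"
    revision="$2"
    # Reject anything that is not a non-negative integer.
    case "$revision" in
        ''|*[!0-9]*)
            echo "revision must be a non-negative integer, got: '$revision'" >&2
            return 1
            ;;
    esac
    # Show the history first so the chosen revision can be double checked.
    "$MAMBA" list -n "$env_name" --revisions || return 1
    "$MAMBA" install -n "$env_name" --revision "$revision"
}

# Example call (not run here): rollback_env rnaquant 1
```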

Removing an environment

You’ve done all your work for a project and now you don’t need it any more. You can safely remove the environment.

mamba env remove -n rnaquant

Remember that if you want to save an environment for use later, you can export a recipe to a yaml file.

mamba env export -n rnaquant > rnaquant_env.yaml

You can then use that yaml file to install the environment again. Note that currently, to take advantage of mamba when using a yaml file, an empty environment must first be created and then updated using the yaml file [source].

mamba create -n rnaquant
mamba env update -n rnaquant -f rnaquant_env.yaml
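
If you rebuild environments often, the create-then-update steps can be wrapped in one function. Treat this as a sketch rather than a definitive recipe: recreate_env and the MAMBA override variable are hypothetical names, and it assumes that mamba create -n NAME with no packages gives you an empty environment.

```shell
# MAMBA is a hypothetical override (e.g. MAMBA=echo for a dry run);
# it defaults to the real mamba executable.
MAMBA="${MAMBA:-mamba}"

# Hypothetical helper: rebuild an environment from an exported yaml file.
# Usage: recreate_env <env-name> <yaml-file>
recreate_env() {
    env_name="$1"
    yaml_file="$2"
    # Fail early if the yaml file is missing (catches filename typos).
    if [ ! -f "$yaml_file" ]; then
        echo "yaml file not found: $yaml_file" >&2
        return 1
    fi
    # Step 1: create an empty environment (assumption: no packages given).
    "$MAMBA" create -y -n "$env_name" || return 1
    # Step 2: populate it from the yaml file.
    "$MAMBA" env update -n "$env_name" -f "$yaml_file"
}

# Example call (not run here): recreate_env rnaquant rnaquant_env.yaml
```

Checking for the file first means a mistyped filename is reported before any environment is created.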

Freeing Space

If you ever need to free up disk space, you can run mamba clean --all to remove tarballs and other files used when installing software. This can be handy in some situations, such as using conda/mamba to install software into a docker or singularity container. I've personally found this can clear more than a GB of data for larger environment installs.
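
To see how much space a cleanup actually frees, you can measure the package cache before and after. This is a minimal sketch with assumptions: clean_and_report, dir_size_kb, and the MAMBA override variable are hypothetical names, and the cache location you pass in depends on your own install.

```shell
# MAMBA is a hypothetical override (e.g. MAMBA=true for a dry run);
# it defaults to the real mamba executable.
MAMBA="${MAMBA:-mamba}"

# Size of a directory in kilobytes (first field of `du -sk`).
dir_size_kb() {
    du -sk "$1" | cut -f1
}

# Hypothetical helper: run the cleanup and report how much space was freed.
# Usage: clean_and_report <package-cache-dir>
clean_and_report() {
    pkgs_dir="$1"
    if [ ! -d "$pkgs_dir" ]; then
        echo "directory not found: $pkgs_dir" >&2
        return 1
    fi
    before=$(dir_size_kb "$pkgs_dir")
    # --yes skips the confirmation prompt, which suits container builds.
    "$MAMBA" clean --all --yes || return 1
    after=$(dir_size_kb "$pkgs_dir")
    echo "Freed $((before - after)) KB from $pkgs_dir"
}

# Example call (not run here; the path is an assumption about your install):
# clean_and_report "$HOME/miniconda3/pkgs"
```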

Neat cheatsheet, rpolicastro. Thanks for collecting all of these commands together!

The only gotcha that I was able to spot is in the last command: when using the YAML file to recreate the environment, it looks like it's missing an env component:

mamba env create --file rnaquant_env.yaml

Hope this helps future readers, and thanks again!

Agree that mamba is a big quality of life improvement.

It is my understanding that the command mamba env create --file rnaquant_env.yaml falls back on conda rather than using mamba. So executing it as above may fail to reproduce the environment.

An alternative is to create an empty environment, then execute mamba env update -n rnaquant --file rnaquant_env.yaml as discussed on the GitHub issue for this topic.

Thanks for pointing this out. I've updated the post!

Thanks, I fixed the command!

Thanks rpolicastro for the quick turnaround and for the quality content overall! 👍🏼

Can confirm, mamba is much faster than conda. Sometimes it even resolved dependencies when conda failed.
