Bowtie2 mapping is slow
7 months ago

Hi,

I am using bowtie2 on an HPC cluster to map reads to around 4 million contigs, but it is extremely slow. Is there any option to make it faster?

Here is part of my script:

#!/bin/bash
#SBATCH --job-name="bt26"
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=36
#SBATCH --time=30-00:00:00

# Run Bowtie2 for paired-end reads
bowtie2 -x "$index_prefix" \
  -1 "$reads1" \
  -2 "$reads2" \
  -S "$output_dir/${sample}.sam";done

Thanks!!

metagenomics bowtie2 mapping read • 885 views

If you've got a related genome, you might be able to scaffold those 4 million contigs to reduce their number a bit. Many of them probably carry only exons rather than full genes, so their utility will be poor. This tool might be useful: https://github.com/malonge/RagTag

In general, though, you might be able to find long-read data for this genotype, which would give an assembly that is at least 100X better. If not, then maybe next time...

7 months ago
GenoMax 147k

You need to use the -p option in your bowtie2 command line to specify the number of threads, matching the number of cores you are requesting. Without it, bowtie2 runs on a single thread no matter how many cores the job has. You will also want to ask for more memory explicitly if the default allocation on your cluster is low (using the #SBATCH --mem=NNg option).

4 million contigs

Yikes! That is a fragmented assembly.

There is no need to make a SAM file unless you have a specific reason. Pipe directly into samtools to make a sorted and indexed BAM file.

bowtie2 -x "$index_prefix" -1 "$reads1" -2 "$reads2" | samtools sort --write-index -o sorted.bam -
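Putting the two suggestions together (threads via -p, plus piping into samtools), a job script might look roughly like the sketch below. This is only an illustration: the helper names bt2_threads and map_sample are made up for this example, the --mem value is a placeholder, and it assumes SLURM sets SLURM_CPUS_ON_NODE inside the job, which you should check on your cluster.

```shell
#!/bin/bash
#SBATCH --job-name="bt26"
#SBATCH --partition=Orion
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=36
#SBATCH --mem=64g        # placeholder value; size it for your index and cluster
#SBATCH --time=30-00:00:00

# Thread count: use what SLURM allocated on this node, or 1 outside a job.
# (bt2_threads is a hypothetical helper name for this sketch.)
bt2_threads() {
  echo "${SLURM_CPUS_ON_NODE:-1}"
}

# Map one paired-end sample and write a sorted, indexed BAM directly,
# skipping the intermediate SAM file.
# Arguments: index prefix, R1 FASTQ, R2 FASTQ, output BAM path.
map_sample() {
  local index_prefix=$1 reads1=$2 reads2=$3 out_bam=$4
  bowtie2 -p "$(bt2_threads)" -x "$index_prefix" -1 "$reads1" -2 "$reads2" \
    | samtools sort --write-index -o "$out_bam" -
}

# Example call, using the variables from the original script:
# map_sample "$index_prefix" "$reads1" "$reads2" "$output_dir/${sample}.bam"
```

Here bowtie2 gets all allocated cores and samtools sort runs with its default thread setting; if sorting turns out to be the bottleneck, you could give samtools sort a few threads with -@ and lower -p accordingly.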

Thank you!! It worked like a charm


You might want to run seff on your job ID after it has finished. Some SLURM configurations require you to launch your command through some variant of srun; without srun, the job won't use all the CPUs that you've requested, which makes it slow. That might not be the case on your cluster, though. seff will tell you what percentage of your requested CPUs was actually used.
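As a rough sketch of that check, the snippet below pulls just the CPU efficiency figure out of the seff report. It assumes your seff prints a line starting with "CPU Efficiency:" followed by a percentage, which may differ between versions; cpu_efficiency is a made-up helper name and 12345 is a placeholder job ID.

```shell
# Print the percentage from the "CPU Efficiency:" line of a seff report
# read on stdin. (cpu_efficiency is a hypothetical helper for this sketch;
# it assumes the line looks like "CPU Efficiency: <number>% of ...".)
cpu_efficiency() {
  awk -F': *' '/^CPU Efficiency:/ { split($2, f, "%"); print f[1] }'
}

# Usage after the job has finished (replace 12345 with your job ID):
# seff 12345 | cpu_efficiency
```

A value close to 100 divided by the number of requested cores (about 3% for 36 cores) would suggest the job only ever used a single core.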
