Faster way to get coverage per base
3
3
Entering edit mode
7.6 years ago

Hello,

I would like to get the coverage for every single base in all regions in a bed file. For this I use samtools now

samtools depth -aa -b regions.bed input.bam

This all works as excpected. But the more regions I have the longer this tasks takes of course. I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete. My whole analyse pipeline from fastq to an annotated vcf file runs only 1h for this data set.

So I wonder if there is a faster way for this task?

fin swimmer

alignment sam • 5.3k views
ADD COMMENT
0
Entering edit mode

Hi , If you have enough resources you can split your bam for example by chromosome and run samtools for each splited bam ?

ADD REPLY
0
Entering edit mode

Should there be any reasonable differences to split the bam file and not the bed file with the regions of interests?

I thought with the bam index it should be possible to have random access directly to the regions?

fin swimmer

ADD REPLY
0
Entering edit mode

The BBMap package has a tool "pileup.sh" which is quite fast, and produces per-base coverage from an unsorted sam or bam file. I'm not sure if this answers your question, though, as it's not clear to me whether you want bed as input or output, or need bed at all.

ADD REPLY
0
Entering edit mode

pileup.sh seems to need lot of memory which I have to little (around 6GB) :(

The bed file is just needed to truncate the bam file to the region of interest. If I can pipe to pileup I can do this before with samtools view. So can I? Didn't find it in the documentation.

fin swimmer

ADD REPLY
0
Entering edit mode

May be i didn't explain well my idea. You can split your bed and run multiple samtools in the same times. It will divide the calculation time in a lot little job ... Well you should try i m not sure about that but it make sens for me.

ADD REPLY
0
Entering edit mode

Hello Titus, this is what I meant with

I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete.

:)

ADD REPLY
0
Entering edit mode

sorry my bad i read too fast :)

ADD REPLY
0
Entering edit mode

Pileup will only allocate memory for chromosomes that actually have reads mapped to them, so yes.

ADD REPLY
4
Entering edit mode
7.6 years ago
Jeffin Rockey ★ 1.3k

Try bedtools coverage with -d option (link)

(And do notice the first note on -sorted)

ADD COMMENT
0
Entering edit mode

I'used bedtools before, but put it aside because of high memory usage (I just have have 6GB). But with the -sorted parameter it seemes to work at least with my small test dataset. I will will check it today on my large dataset.

Thank you.

fin swimmer

ADD REPLY
1
Entering edit mode

Could test it yesterday and it works with the -sorted option much faster than with samtools.

Thanks a lot.

fin swimmer

ADD REPLY
3
Entering edit mode
5.9 years ago

To add in up-to-date solution:

mosdepth is great:

$ mosdepth --by capture.bed sample-output sample.exome.bam

fin swimmer

ADD COMMENT
1
Entering edit mode
7.6 years ago
Vivek Todur ▴ 60

Sambamba is high performance tool, similar to samtools. Here depth is multi threaded, you can take advantage of it...

Cheers...!

ADD COMMENT
0
Entering edit mode

Never heard about sambamba before, but I will try it.

fin swimmer

ADD REPLY
0
Entering edit mode

Did you get chance to try sambamba..? BTW latest version of samtools (1.4) got released with multi thread option... Just check out out...

ADD REPLY
0
Entering edit mode

I didn't try sambamba until now.

But I recognize that picards CollectHsMetrics have an PER_BASE_COVERAGE option. As I'm running CollectHsMetrics in any case I could use this option.

fin swimmer

ADD REPLY

Login before adding your answer.

Traffic: 2087 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6