Hello,
I would like to get the coverage for every single base in all regions in a bed file. For this I use samtools now
samtools depth -aa -b regions.bed input.bam
This all works as excpected. But the more regions I have the longer this tasks takes of course. I can speed it up by dividing the bed file and start parallel processes. But it still needs about 45min to complete. My whole analyse pipeline from fastq to an annotated vcf file runs only 1h for this data set.
So I wonder if there is a faster way for this task?
fin swimmer
Hi , If you have enough resources you can split your bam for example by chromosome and run samtools for each splited bam ?
Should there be any reasonable differences to split the bam file and not the bed file with the regions of interests?
I thought with the bam index it should be possible to have random access directly to the regions?
fin swimmer
The BBMap package has a tool "pileup.sh" which is quite fast, and produces per-base coverage from an unsorted sam or bam file. I'm not sure if this answers your question, though, as it's not clear to me whether you want bed as input or output, or need bed at all.
pileup.sh seems to need lot of memory which I have to little (around 6GB) :(
The bed file is just needed to truncate the bam file to the region of interest. If I can pipe to pileup I can do this before with samtools view. So can I? Didn't find it in the documentation.
fin swimmer
May be i didn't explain well my idea. You can split your bed and run multiple samtools in the same times. It will divide the calculation time in a lot little job ... Well you should try i m not sure about that but it make sens for me.
Hello Titus, this is what I meant with
:)
sorry my bad i read too fast :)
Pileup will only allocate memory for chromosomes that actually have reads mapped to them, so yes.