samtools mpileup taking too long
8 months ago
K • 0

I am trying to generate a BCF file with samtools mpileup, but it is taking too long. How long does it usually take to generate a BCF file?

My BAM file is 26.9 GB, and the sorted BAM file is 17.6 GB.

ADD COMMENT

It can be slow, but I can't give you an exact time. You can measure this easily yourself, though, and the result will be more accurate than anything people here can tell you, since hardware differs. Index the BAM, time mpileup on a smaller region, and extrapolate from there.
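As a rough sketch of the "time a small region and extrapolate" idea in Python: the file names, the region string and the output name `test_region.bcf` are placeholders, and the BAM is assumed to be coordinate-sorted and indexed (for example with `samtools index`). The linear scaling is itself an assumption, since coverage is rarely uniform across a genome, so treat the result as an order-of-magnitude estimate.

```python
import subprocess
import time


def extrapolate_seconds(elapsed_s: float, region_bp: int, genome_bp: int) -> float:
    """Scale the time measured on a region linearly to the whole genome."""
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    return elapsed_s * genome_bp / region_bp


def time_region(bam: str, ref: str, region: str) -> float:
    """Run bcftools mpileup on one region and return the wall-clock seconds.

    Requires bcftools on PATH and an index next to the BAM.
    """
    cmd = ["bcftools", "mpileup", "-f", ref, "-r", region,
           "-Ob", "-o", "test_region.bcf", bam]
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical numbers: 60 s measured on a 1 Mb region of a 3 Gb genome.
    estimate = extrapolate_seconds(60.0, 1_000_000, 3_000_000_000)
    print(f"estimated single-threaded runtime: {estimate / 3600:.1f} hours")
```

To measure on real data you would call something like `time_region("sorted.bam", "ref.fa", "chr1:1-1000000")` and pass the result to `extrapolate_seconds`.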

A few pointers though.

  • Don't use samtools mpileup to make BCFs. It's ancient. Use bcftools mpileup instead.
  • Use a modern bcftools, particularly if you have long-read data. (There have been substantial changes in the develop branch too, for accuracy.) Also specify the instrument profile with -X. This can dramatically improve indel calling recall and precision.
  • Mpileup is an embarrassingly parallel problem. There isn't a built-in wrapper that I know of, but it's not hard to do yourself. Just produce a list of regions (it could be one per chromosome, but smaller fragments are better) and run bcftools with GNU parallel to process each region and write a new file. You can then bcftools concat these together, in order, to get the final mpileup.
  • Also see http://www.htslib.org/workflow/wgs-call.html, although I think some of that is rather legacy (e.g. I'm not sure whether IndelRealigner still exists, or whether it's even still necessary).
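The region-splitting approach from the third pointer could be sketched roughly as follows. This is only an illustration: it uses a Python thread pool in place of GNU parallel, the `part_NNNNN.bcf` naming, chunk size and job count are made up, and it assumes a `.fai` index for the reference (as written by `samtools faidx`) plus an indexed, sorted BAM. Check variants near chunk boundaries before relying on this for real calls.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def read_fai(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse (contig name, length) pairs from the lines of a .fai index."""
    contigs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            contigs.append((fields[0], int(fields[1])))
    return contigs


def chunk_regions(contigs: List[Tuple[str, int]], chunk_size: int) -> List[str]:
    """Split each contig into 1-based, inclusive, non-overlapping regions."""
    regions = []
    for name, length in contigs:
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            regions.append(f"{name}:{start}-{end}")
    return regions


def mpileup_command(region: str, index: int, bam: str, ref: str) -> List[str]:
    """Build one bcftools mpileup call that writes its own BCF part."""
    return ["bcftools", "mpileup", "-f", ref, "-r", region,
            "-Ob", "-o", f"part_{index:05d}.bcf", bam]


def concat_command(n_parts: int, out: str) -> List[str]:
    """Build the bcftools concat call that joins the parts in region order."""
    parts = [f"part_{i:05d}.bcf" for i in range(n_parts)]
    return ["bcftools", "concat", "-Ob", "-o", out] + parts


def run_all(bam: str, ref: str, fai_path: str, chunk_size: int = 5_000_000,
            jobs: int = 4, out: str = "merged.bcf") -> None:
    """Run every region in parallel, then concatenate (needs bcftools on PATH)."""
    with open(fai_path) as handle:
        regions = chunk_regions(read_fai(handle), chunk_size)
    commands = [mpileup_command(r, i, bam, ref) for i, r in enumerate(regions)]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() forces evaluation so a failed job raises here.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    subprocess.run(concat_command(len(regions), out), check=True)
```

Threads are enough here because each worker only waits on an external bcftools process. Zero-padded part names keep the files in region order, which is the order concat needs.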
ADD REPLY

The size shouldn't differ between sorted and unsorted files.

My bam file is 26.9 gb

Maybe this is a SAM file, and what you're referring to as the sorted BAM is the BAM file? But a 30 GB SAM should compress to a file smaller than 18 GB.

I suggest you post the commands you used, as well as what you're sequencing.

ADD REPLY

The size shouldn't differ between sorted and unsorted files.

Not quite. Sorted BAM files may place similar records closer together, so they compress better. See, e.g., the thread "Size of BAM file reduces after sorting with samtools".
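A toy illustration of why ordering affects compressed size, using Python's zlib rather than real BAM/BGZF output: the reference, the reads and the record layout are all synthetic. After sorting by position, overlapping reads sit next to each other, so the compressor finds long repeated substrings within its window.

```python
import random
import zlib


def make_reads(n_reads=5000, ref_len=20000, read_len=50, seed=0):
    """Sample (position, sequence) reads from a random synthetic reference."""
    rng = random.Random(seed)
    reference = "".join(rng.choice("ACGT") for _ in range(ref_len))
    reads = []
    for _ in range(n_reads):
        pos = rng.randrange(ref_len - read_len)
        reads.append((pos, reference[pos:pos + read_len]))
    return reads


def compressed_size(reads):
    """Serialise reads as tab-separated text and return the zlib-compressed size."""
    text = "".join(f"chr1\t{pos}\t{seq}\n" for pos, seq in reads)
    return len(zlib.compress(text.encode("ascii"), 6))


reads = make_reads()                          # reads in arbitrary order
unsorted_size = compressed_size(reads)
sorted_size = compressed_size(sorted(reads))  # same reads, sorted by position

print(f"unsorted: {unsorted_size} bytes, sorted: {sorted_size} bytes")
```

Real BAM files compress fixed-size blocks independently and hold far more per-record data, so the size of the effect will differ, but the direction is the same.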

ADD REPLY

Huh, does this happen with merged BAMs?

Now that I think about it, I have never written out an unsorted BAM to notice the difference between sorted and unsorted file sizes, and just assumed they would be similar.

Bad assumption :/, thanks for correcting.

ADD REPLY

My SAM file is 126 GB.

ADD REPLY
