Samtools - huge mpileup file!
9.8 years ago
lcc1844 ▴ 40

I have a .bam file which contains exome capture data and is 10GB.

I used the following command to make an mpileup:

samtools mpileup -E -uf hg19.fa file.art.bam > file.mpileup

The command ran for several hours, and the mpileup file grew to over 100GB before my computer ran out of storage, so I stopped it in order to start again. Why was the pileup so large?

My intention upon getting the mpileup file was to do variant calling using:

bcftools view -cg file.mpileup > file.vcf

Are these the right options for me?

Thank you!

alignment next-gen • 5.9k views

Use pipe...


We're in the pipe five by five:


I stopped unzipping the file because my computer ran out of space. I must have done something wrong somewhere because the .vcf should surely be much smaller than fastq and bam!


Pipes are your friend: zcat file.vcf.gz | less.
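
To make the point concrete, here is a minimal sketch of inspecting a compressed VCF through a pipe, so the decompressed text never lands on disk. The function name `count_vcf_records` is hypothetical, and `gzip -dc` is used in place of `zcat` only because it behaves the same across platforms; bgzip output is gzip-compatible, so it should work on either kind of file.

```shell
#!/usr/bin/env bash
# Count data records (non-header lines) in a gzipped or bgzipped VCF.
# The decompressed stream goes straight into grep, never to a file.
# count_vcf_records is a hypothetical helper name, not part of samtools.
count_vcf_records() {
    # -v inverts the match, -c prints the count of matching lines;
    # header lines in a VCF all start with '#'.
    gzip -dc "$1" | grep -vc '^#'
}

# Usage (file name from the thread):
# count_vcf_records file.vcf.gz
```

The same pattern works for paging (`gzip -dc file.vcf.gz | less`) or for looking at only the first few records with `head`.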

9.8 years ago

A pileup file is text-based, so it's going to be large. Assuming you have the most recent version of samtools, just use:

samtools mpileup -Euvf hg19.fa file.art.bam | bgzip > file.vcf.gz
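
The piping advice can be taken one step further by sending the pileup output directly into the caller, so that only called variants are written. The sketch below is an assumption on my part, not the command given in this thread: it uses the bcftools 1.x subcommands `bcftools mpileup` and `bcftools call` rather than the older `samtools mpileup` / `bcftools view -cg` pair, and the function name `call_variants` is hypothetical. Check the flags against the version you have installed.

```shell
#!/usr/bin/env bash
# Sketch: stream genotype likelihoods into the variant caller through a
# pipe so the large intermediate is never stored.
# Assumes bcftools 1.x is on PATH; call_variants is a hypothetical name.
call_variants() {
    local ref="$1" bam="$2" out="$3"
    # mpileup: -E recalculates BAQ, -Ou writes uncompressed BCF to stdout,
    #          -f names the reference FASTA.
    # call:    -m uses the multiallelic caller, -v keeps variant sites only,
    #          -Oz writes bgzip-compressed VCF to the file given by -o.
    bcftools mpileup -E -Ou -f "$ref" "$bam" \
        | bcftools call -mv -Oz -o "$out"
}

# Usage (file names from the question):
# call_variants hg19.fa file.art.bam file.vcf.gz
```

Because `-v` drops non-variant positions, the output should be far smaller than a file that records every covered site.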


Thanks very much. I ran the command you suggested; it took many hours and generated a zipped .vcf file that is 21GB. Is this not larger than what I could expect from one human exome?


It seems rather large to me, but you could just look at the results to see if they make any sense.
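
One way to "look at the results" is to tally the ALT column. This is a rough sketch under assumptions: `alt_summary` is a hypothetical helper, and the idea that a mostly non-variant file will show a placeholder ALT (such as `.` or a symbolic allele in angle brackets) is an expectation that depends on the samtools version, not something confirmed for this file.

```shell
#!/usr/bin/env bash
# Print the most frequent ALT values in a gzipped or bgzipped VCF,
# most common first. alt_summary is a hypothetical helper name.
alt_summary() {
    local vcf="$1" top="${2:-10}"
    gzip -dc "$vcf" \
        | awk -F'\t' '!/^#/ { n[$5]++ } END { for (a in n) print n[a], a }' \
        | sort -rn \
        | head -n "$top"
}

# Usage (file name from the thread):
# alt_summary file.vcf.gz
```

If nearly all records share a single placeholder ALT value, the file is probably listing every covered position rather than only variant sites, which would go a long way towards explaining the size.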

