Generating Both Pileup And Vcf/Bcf Using Samtools Mpileup
1
8
Entering edit mode
11.8 years ago

I'm using samtools mpileup and would like to generate both a pileup file and a vcf file as output. I can see how to generate one or the other, but not both (unless I run mpileup twice). I suspect I am missing something simple.

Specifically, calling mpileup with the -g or -u flag causes it to compute genotype likelihoods and output a bcf. Leaving these flags off just gives a pileup. Is there any way to get both, without redoing the work of producing the pileup file? Can I get samtools to generate the bcf _from_ the pileup file in some way? Generating the bcf from the bam file, when I already have the pileup, seems wasteful.

Thanks for any help!

samtools • 32k views
ADD COMMENT
10
Entering edit mode
11.5 years ago
Joseph Hughes ★ 3.0k

I think this is what you are asking for:

samtools mpileup -E -uf reference.fa file.bam > file.mpileup

In this first step, you will get the mpileup file, then:

bcftools view -cg file.mpileup > file.vcf

This will give you the vcf file. If you want bcf rather than vcf:

 bcftools view -bcg file.mpileup > file.bcf
ADD COMMENT
0
Entering edit mode

Although the file is named as file.mpileup, samtools mpileup -u actually generates a BCF file. You can generate a pileup file without using -g and -u, then try this tool pileup_to_vcf to convert pileup file to vcf file.

ADD REPLY
0
Entering edit mode

I have tried using both methods, I have generated a file.pileup which has the 6 columns and looks correct.

when I use the bcftools I get the warning that the column !=5 warning.

The pileup_to_vcf has also not worked and seems to just stall and not run (checking ps regularly)

I have also used the samtools sam_2_vcf.pl script to no avail and this gives me an error saying that there is insufficient columns in the .pileup file (the opposite of bcftools).

I am really stuck here as I need to get the files back to vcf.

Any help would be greatly appreciated!

ADD REPLY
0
Entering edit mode
  1. There is a test mpileup file in pileup_to_vcf/test-data folder. Run ./pileup_to_vcf.py -i test-data/test.pileup -o test.out.vcf and then you can get the vcf results (a file named test.out.vcf). It's the same when you use your own mpileup files.
  2. bcftools doesn't work on pileup format data. It works on bcf/vcf files.
  3. samtools provides a script called sam2vcf.pl, which works on the output of samtools pileup. However, this command is deserted in newer versions. The output of samtools mpileup does not satisfy the requirement of sam2vcf.pl. You can check the required pileup format on lines 95-99, which is different from output of samtools mpileup.
  4. If you want to get a vcf format, refer to examples on samtools reference pages.
ADD REPLY
1
Entering edit mode

Where can I download pileup_to_vcf.py? @Dejian

ADD REPLY
0
Entering edit mode

what is reference.fa file? where can I download it?

ADD REPLY
0
Entering edit mode

That is the reference genome that you aligned your data to.

ADD REPLY
0
Entering edit mode

where can i download it? I want to map with human genome hg19?

ADD REPLY
0
Entering edit mode

You can download genome reference and annotation files from GENCODE.

Illumina's iGenomes site also has ready to use sequence/index/annotation bundles for various genomes.

ADD REPLY

Login before adding your answer.

Traffic: 1694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6