Reality check for consensus fasta generated with bcftools
0
2
Entering edit mode
3.0 years ago

Hello,

I have generated a consensus seuqwnce with bcftools with:

bcftools mpileup -Ou -f ref.fa input_AlnSrtDedSrt.bam | bcftools call -Ou -mv | bcftools norm -f ref.fa -Oz -o input.vcf.gz
tabix input.vcf.gz
bcftools consensus -f ref.fa input.vcf.gz > out.fa

The file out.fa is there, but it is 100% identical to the ref.fa, even having the same header (I assumed bcftools got the header from ref.fa).

Is this procedure OK or I simply dumped ref.fa into out.fa?

reference-genome bcftools assembly fasta • 2.7k views
ADD COMMENT
0
Entering edit mode

Can you show a few samples lines from the VCF? How are you sure that the files are identical?

ADD REPLY
0
Entering edit mode

it is 100% identical to the ref.fa

And you have confirmed that by doing an alignment?

ADD REPLY
0
Entering edit mode
$ head ref.fa
>MN855675.1 Siphoviridae sp. isolate 61, complete genome
GAAACAATGGATATTACTGCTACAGGGACCCAAGGACGGGTAAAGAGTTTGGATTAGGCAGAGACAGGCG
AATCGCAATCACTGAAGCTATACAGGCCAACATTGAGTTATTTTCAGGACACAAACACAAGCCTCTGACA
GCGAGAATCAACAGTGATAATTCCGTTACGTTACATTCATGGCTTGATCGCTACGAAAAAATCCTGGCCA
GCAGAGGAATCAAGCAGAAGACACTCATAAATTACATGAGCAAAATTAAAGCAATAAGGAGGGGTCTGCC
TGATGCTCCACTTGAAGACATCACCACAAAAGAAATTGCGGCAATGCTCAATGGATACATAGACGAGGGC
AAGGCGGCGTCAGCCAAGTTAATCAGATCAACACTGAGCGATGCATTCCGAGAGGCAATAGCTGAAGGCC
ATATAACAACAAACCATGTCGCTGCCACTCGCGCAGCAAAATCAGAGGTAAGGAGATCAAGACTTACGGC
TGACGAATACCTGAAAATTTATCAAGCAGCAGAATCATCACCATGTTGGCTCAGACTTGCAATGGAACTG
GCTGTTGTTACCGGGCAACGAGTTGGTGATTTATGCGAAATGAAGTGGTCTGATATCGTAGATGGATATC
$ head out.fa 
>MN855675.1 Siphoviridae sp. isolate 61, complete genome
GAAACAATGGATATTACTGCTACAGGGACCCAAGGACGGGTAAAGAGTTTGGATTAGGCA
GAGACAGGCGAATCGCAATCACTGAAGCTATACAGGCCAACATTGAGTTATTTTCAGGAC
ACAAACACAAGCCTCTGACAGCGAGAATCAACAGTGATAATTCCGTTACGTTACATTCAT
GGCTTGATCGCTACGAAAAAATCCTGGCCAGCAGAGGAATCAAGCAGAAGACACTCATAA
ATTACATGAGCAAAATTAAAGCAATAAGGAGGGGTCTGCCTGATGCTCCACTTGAAGACA
TCACCACAAAAGAAATTGCGGCAATGCTCAATGGATACATAGACGAGGGCAAGGCGGCGT
CAGCCAAGTTAATCAGATCAACACTGAGCGATGCATTCCGAGAGGCAATAGCTGAAGGCC
ATATAACAACAAACCATGTCGCTGCCACTCGCGCAGCAAAATCAGAGGTAAGGAGATCAA
GACTTACGGCTGACGAATACCTGAAAATTTATCAAGCAGCAGAATCATCACCATGTTGGC

The alignment gives 100% identity. The only thing that is different is the line span...

ADD REPLY
0
Entering edit mode

Do the contig names in the VCF match fasta headers in the ref file?

ADD REPLY
0
Entering edit mode

they are identical

ADD REPLY
0
Entering edit mode

Please show us a few sample lines from the VCF. zgrep -A5 "^#CHR" input.vcf.gz should suffice.

ADD REPLY
0
Entering edit mode

BTW, there seems to be an issue on the github repo already matching your case: https://github.com/samtools/bcftools/issues/934

Maybe try the tabix-related fix?

ADD REPLY
0
Entering edit mode

Following, as I'm having the same issue of bcftools consensus generating an output file identical to my input reference sequence, without having applied any variants from the .vcf.gz generated by bcftools mpileup. I concatenated the output consenseus sequence with the reference into a multifasta and aligned to confirm they are exactly the same, including the headers. Confirmed that the .vcf.gz file contains expected contents.

For reference, here's my code:

bcftools mpileup --max-depth 500 -f -A $referenceSeq $sample.plastome.rmdup.bam | bcftools call -mv -Oz -o $sample.calls.vcf.gz
bcftools index $sample.calls.vcf.gz
bcftools consensus -f $referenceSeq $sample.calls.vcf.gz -o $sample.plastome.fasta
ADD REPLY
0
Entering edit mode

I think opening an issue on the github repo might be the way to go - given you and marongiu.luigi both seems to be running into problems with consensus.

ADD REPLY

Login before adding your answer.

Traffic: 1940 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6