From multiple VCF files to multiple sequence alignment?

8.3 years ago

Peter vH ▴ 130

Hi there

I have multiple VCF files generated from variant calling on sequenced bacteria (M. tuberculosis). I would like to create a multiple sequence alignment file (as a step towards computing a phylogeny of the samples) by combining the reference genome with the VCFs. Before I put time and effort into creating a script to do this, is there an existing solution? I see that workflows such as SNPhylo compute an alignment with MUSCLE before doing tree construction - I'm trying to avoid that step.

Thanks, Peter

alignment VCF bacterial • 5.3k views

updated 8.2 years ago by Biostar 20 • written 8.3 years ago by Peter vH ▴ 130

Please check this post. The comment by natasha provides a good solution.

8.3 years ago by microfuge ★ 2.0k

I'm not quite sure how that would work. The tools suggested in those threads, vcf-consensus in one and FastaAlternateReferenceMaker in the other, each produce a single FASTA output from a single VCF input. They don't deal with the gaps that arise when aligning sequences that carry different insertions and deletions.
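For illustration, here is a minimal Python sketch of the simplest way around the gap problem: apply only single-base substitutions to the reference, so every sample stays in reference coordinates and the rows are already aligned without MUSCLE. The function names (`parse_snps`, `build_alignment`, `to_fasta`) are hypothetical, and the sketch assumes one contig, one haploid sample per VCF, and that every record is a confident ALT call; indels are skipped rather than gapped.

```python
def parse_snps(vcf_text):
    """Return {1-based position: ALT base} for simple SNP records.

    Assumes a single-sample, single-contig VCF in which every record
    is an ALT call (no genotype or FILTER checks are done here).
    """
    snps = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        # Keep only single-base substitutions; indels, multi-allelic
        # sites and symbolic alleles are ignored in this sketch.
        if len(ref) == 1 and len(alt) == 1 and alt in "ACGT":
            snps[pos] = alt
    return snps


def build_alignment(reference, sample_vcfs):
    """Return {name: sequence}, all rows the same length as the reference."""
    rows = {"reference": reference}
    for name, vcf_text in sample_vcfs.items():
        seq = list(reference)
        for pos, alt in parse_snps(vcf_text).items():
            seq[pos - 1] = alt  # VCF positions are 1-based
        rows[name] = "".join(seq)
    return rows


def to_fasta(rows):
    """Serialise the aligned rows as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in rows.items())


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    vcf_s1 = header + "chr\t2\t.\tC\tT\t50\tPASS\t.\n"
    print(to_fasta(build_alignment("ACGTACGT", {"s1": vcf_s1})))
```

Because insertions would require adding gap columns to every other row, they are left out here; handling them is exactly the part that the single-sample consensus tools do not cover.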

8.3 years ago by Peter vH ▴ 130

I have not used FastaAlternateReferenceMaker, but I ran vcf-consensus -s <sample_name> once per sample to generate a FASTA file for each sample and then did the alignment. The newer version also uses IUPAC codes, so heterozygous genotypes can be encoded. Gaps are usually ignored in the alignment, so they should not matter, but I don't know exactly how vcf-consensus handles indels and rearrangements.
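To make the IUPAC point concrete, here is a small Python sketch of how a heterozygous genotype with two different bases maps to a single ambiguity code. It uses the standard IUPAC two-base codes; the helper name `iupac_code` is hypothetical and this is not meant to reproduce how vcf-consensus implements it.

```python
# Standard IUPAC nucleotide codes for one base or a pair of bases.
IUPAC = {
    frozenset("A"): "A",
    frozenset("C"): "C",
    frozenset("G"): "G",
    frozenset("T"): "T",
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_code(allele1, allele2):
    """Return the IUPAC code for a diploid genotype of two single bases."""
    # A homozygous genotype collapses to a one-element set, so it maps
    # back to the plain base.
    return IUPAC[frozenset((allele1.upper(), allele2.upper()))]


if __name__ == "__main__":
    print(iupac_code("A", "G"))
```

For a haploid bacterial genome such as M. tuberculosis, heterozygous calls are more likely to signal mixed samples or mapping problems than real genotypes, so whether to encode them this way is a judgment call.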

8.3 years ago by microfuge ★ 2.0k

So you'd do iterative vcf-consensus followed by MUSCLE? SNPhylo seems to do something like that. I'll experiment and compare it with the script I've written.

8.3 years ago by Peter vH ▴ 130

Yes, but that was for a chloroplast genome, and the results were good. The advantage there was that no heterozygous calls had to be handled, since heteroplasmy was not detected.

8.3 years ago by microfuge ★ 2.0k
