Hi all,
I have whole-genome sequencing data for 100 individuals of my study species and would like to construct a maximum likelihood tree from them. So far I've called SNPs using bcftools and have all samples in a single multi-sample VCF file. Are you aware of any software that can take a multi-sample VCF file as input and use it to build a maximum likelihood tree?
Previously, I have used single-sample VCF files to generate a separate FASTA file for each sample using vcf-consensus, then aligned these with MAFFT and built trees from the alignments with RAxML. The problem with this approach is that it loses information about heterozygosity: vcf-consensus simply always uses the ALT allele, and even if it did use IUPAC ambiguity codes for heterozygous sites, I don't think that RAxML can handle these.
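To make clear what I mean by IUPAC ambiguity codes for heterozygous sites, here is a minimal Python sketch. The function name genotype_to_base is hypothetical (my own, not from any tool), and it assumes diploid SNP genotypes only, with no indels:

```python
# IUPAC ambiguity codes for unordered pairs of distinct bases.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def genotype_to_base(ref, alts, gt):
    """Return one character for a SNP genotype such as '0/1' or '1|1'.

    ref  : reference base, e.g. 'A'
    alts : list of alternate bases, e.g. ['G']
    gt   : the VCF GT field for one sample
    (Hypothetical helper for illustration; SNPs only, no indels.)
    """
    alleles = [ref] + list(alts)
    # Phased ('|') and unphased ('/') genotypes are treated the same here.
    idx = gt.replace("|", "/").split("/")
    if "." in idx:
        return "N"  # missing genotype
    bases = {alleles[int(i)] for i in idx}
    if len(bases) == 1:
        return next(iter(bases))  # homozygous: plain base
    # Heterozygous: look up the two-base ambiguity code.
    return IUPAC.get(frozenset(bases), "N")


if __name__ == "__main__":
    print(genotype_to_base("A", ["G"], "0/1"))  # heterozygous A/G
    print(genotype_to_base("A", ["G"], "1/1"))  # homozygous ALT
```

With this encoding a heterozygous A/G site becomes R rather than being collapsed to the ALT allele G, which is the information I would like the tree-building step to retain.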
For reference, there are about 170,000 SNP sites in a genome of about 41 Mb. Within each sample, about a third of sites are generally heterozygous.
Sorry in advance for any gaps in understanding revealed by this question, and sorry if this has been answered before (I did find a few similar questions but sadly these didn't have answers).
Thanks in advance!