Hi all,
I have a set of ~15x coverage whole genome resequencing data for >200 accessions of a crop species. I have produced a SNP-based phylogeny of these accessions via the following operations:
- bowtie2 mapping against reference genome
- bcftools mpileup
- bcftools call
- vcftools to retrieve SNP-wise and accession-wise statistics, analysed in R
- filtering using vcftools
- vcf2phylip.py
- IQ-TREE to produce final phylogeny
My question is: given that I know the reference genome's state at every included SNP, there seems to be enough information to treat the reference as an accession in its own right. So is there a way to include the reference genome in the final tree? Does IQ-TREE have an option for this, or perhaps bcftools or vcftools?
Many thanks,
Max
Hi Michael,
Yes exactly! I wondered if perhaps IQ-TREE could do this 'under-the-hood', but if not then adding a dummy sample with all loci set to 0/0 is the next best thing - I'll give it a go. Do you think I should set the read depths and qualities to arbitrary numbers that pass my vcftools filters? Otherwise the whole dummy individual would get filtered out. Or should I add it only after filtering, just before converting to an alignment?
Yep, I have +ASC enabled already, thank you for the heads up.
Greetings to you too - in Norway I see?
Good point! Yes I think so. Once you have a sequence alignment, the VCF statistics do not matter any more.
Yes, greetings from Bergen, Norway. I just remembered visiting JIC and Norwich a while back (at least 15 years ago, I think).
Cheers, Michael
Excellent, I shall try and figure out a little script that appends a dummy sample to each line (i.e. each SNP).
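A minimal sketch of what such a script might look like, under a few assumptions that are mine and not from the thread: the VCF is uncompressed plain text, it has already been filtered with vcftools (so the dummy sample never has to pass any depth or quality filters), the species is treated as diploid (hence 0/0), and the sample name REFERENCE is just a placeholder. Every FORMAT field other than GT is set to the missing value ".". Note that INFO tags such as AC/AN are not recalculated.

```python
import sys


def add_reference_sample(lines, name="REFERENCE"):
    """Yield VCF lines with one extra all-reference (0/0) sample column.

    `name` is a hypothetical sample label; pick one that is not already
    used in the VCF header.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Column header: append the new sample name.
            yield line + "\t" + name
        elif line:
            fields = line.split("\t")
            # Column 9 (index 8) is FORMAT, e.g. "GT:PL:DP".
            keys = fields[8].split(":")
            # Homozygous reference genotype, everything else missing.
            values = ["0/0" if key == "GT" else "." for key in keys]
            yield line + "\t" + ":".join(values)


# Assumed usage: python add_ref.py filtered.vcf with_ref.vcf
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out_line in add_reference_sample(src):
            dst.write(out_line + "\n")
```

The idea would be to run this on the filtered VCF immediately before vcf2phylip.py, so the reference ends up as one more row in the alignment.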
Ah lovely! I moved here ~3 years ago for my PhD and am really enjoying it; it's a great institute to work at.
Cheers, Max
I guess another alternative is to remap raw reads from the reference genome against itself... i.e. treat it like any other sample throughout the whole pipeline.