Question

Confusion regarding generating a phylogenetic tree from SNP data

7.7 years ago

ehzed ▴ 40

Hi,

I am currently comparing multiple breeds of one domestic animal, and as part of my analysis I plan to generate a phylogenetic tree. I started with whole-genome sequencing data and have since called the SNPs.

Methods I have tried so far (you can skip ahead if you want!):

Generating trees with SNPhylo (the trees were uninformative: every breed was equally distant from every other breed).
Concatenating multiple genes together to build a tree, as suggested here. This made a slightly more informative tree, but because my datasets came from different sources (some from Illumina and some from SOLiD sequencing), the uneven read coverage meant that some breeds had far more SNPs in certain areas.

My problem:

Now I am turning to generating trees from SNPs. I have extracted regions (as a BED file) that have good coverage depth in all breeds and used them to subset my VCF files, but I am confused about what to do next. I have read about:

1. Generating a SNP matrix
2. Concatenating SNPs into a "fake" sequence
3. Introducing the SNPs from the VCF file into the reference genome (the same as what I did when I built a tree from concatenated gene sequences)
4. Using the SNPRelate R package (I can't find information on what method this package uses to make a tree: is it maximum likelihood, neighbour-joining...?)

I am leaning towards the 2nd or 3rd method. I don't know how to do 2: this script doesn't seem to take into account the position of each SNP in the genome, so it wouldn't make sense to apply it to VCF files that contain many different sites, where some sites are common to all breeds and others are unique to one. I already know how to do 3, but is that recommended for random, very small sections of the genome where you don't know whether a section is very conserved or divergent, or whether it covers a gene? Also, what is the most recommended/established method? Thanks!
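
To make the position concern in method 2 concrete, here is a minimal Python sketch of one way a position-aware concatenation could work. It is not the script linked above, and it rests on several assumptions: one single-sample VCF per breed, GT as the first FORMAT subfield, biallelic SNPs only, and, most importantly, that a site missing from a breed's VCF can be treated as reference. That last assumption is only defensible inside regions that are well covered in every breed, such as the BED-filtered regions described above. The function names (`read_snps`, `build_alignment`, `to_fasta`) are hypothetical.

```python
# IUPAC ambiguity codes used for heterozygous calls at biallelic SNPs.
IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
         frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}


def read_snps(vcf_lines):
    """Return {(chrom, pos): (ref, called_base)} from a single-sample VCF."""
    calls = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        ref, alt = fields[3].upper(), fields[4].upper()
        if len(ref) != 1 or len(alt) != 1 or ref not in "ACGT" or alt not in "ACGT":
            continue  # skip indels, multi-allelic sites and missing ALT
        # Assumption: the genotype (GT) is the first FORMAT subfield.
        genotype = fields[9].split(":")[0].replace("|", "/")
        alleles = set(genotype.split("/"))
        if alleles == {"1"}:
            base = alt                                    # homozygous alternate
        elif alleles == {"0", "1"}:
            base = IUPAC.get(frozenset(ref + alt), "N")   # heterozygous
        elif alleles == {"0"}:
            base = ref                                    # homozygous reference
        else:
            base = "N"                                    # missing genotype
        calls[(chrom, pos)] = (ref, base)
    return calls


def build_alignment(samples):
    """samples: {name: calls}. Returns ({name: sequence}, ordered site list).

    Every sequence has one column per site in the union of all sites, so
    column i refers to the same genomic position in every breed.
    """
    ref_at = {}
    for calls in samples.values():
        for site, (ref, _) in calls.items():
            ref_at[site] = ref
    sites = sorted(ref_at)
    seqs = {}
    for name, calls in samples.items():
        # Assumption: a site absent from this breed's VCF matches the reference.
        seqs[name] = "".join(
            calls[site][1] if site in calls else ref_at[site] for site in sites
        )
    return seqs, sites


def to_fasta(seqs):
    """Format the pseudo-sequences as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())
```

The resulting FASTA contains only variable sites, so any tree-building program that reads it should be told so (many have an ascertainment-bias correction for SNP-only alignments); otherwise branch lengths will likely be inflated.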

snps phylogenetic tree • 3.0k views

ADD COMMENT • link updated 7.7 years ago by Petr Ponomarenko ★ 2.8k • written 7.7 years ago by ehzed ▴ 40

Hi, I am trying to do a similar thing. Have you found a solution yet?

ADD REPLY • link 6.3 years ago by JJ ▴ 710

score 0 · Answer 1 · 2017-03-03

There is another option if you want to compare populations (breeds with many samples each) with one another: you can try using ADMIXTURE (https://www.genetics.ucla.edu/software/admixture/) to reduce the data to K-dimensional vectors, where K is relatively small. This will help you capture the main aggregated information about each population for comparison.
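
As a rough illustration of the comparison step, here is a small Python sketch. It assumes you already have an ADMIXTURE `.Q` output file (one row per sample, K whitespace-separated ancestry proportions) and a list of breed labels in the same row order as the input samples. The labels, the function names, and the choice of Euclidean distance between breed means are assumptions made for the example, not something ADMIXTURE prescribes.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def parse_q(text):
    """Parse the contents of a .Q file into a list of K-dimensional rows."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def breed_means(q_rows, breeds):
    """Average the ancestry vectors of all samples belonging to each breed."""
    grouped = defaultdict(list)
    for row, breed in zip(q_rows, breeds):
        grouped[breed].append(row)
    return {breed: [sum(col) / len(rows) for col in zip(*rows)]
            for breed, rows in grouped.items()}


def pairwise_distances(means):
    """Euclidean distance between the mean vectors of every pair of breeds."""
    return {(a, b): sqrt(sum((x - y) ** 2 for x, y in zip(means[a], means[b])))
            for a, b in combinations(sorted(means), 2)}
```

Usage would look like `pairwise_distances(breed_means(parse_q(open("data.2.Q").read()), breeds))`, where `data.2.Q` and `breeds` are placeholders for your own file and labels. The distances summarise shared ancestry components; they are not a substitute for a phylogenetic tree.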