Best Way To Build A Tree With Numeric Data (Range 1-25) Phylip Pars Or Others
13.3 years ago
Pavid ▴ 160

Hey!

I'm trying to build a tree from this alignment. I've converted the two-digit numbers into letters: a for 10, b for 11, and so on up to p for 25.
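Here is a minimal R sketch of the recoding described above. It assumes the counts are whole numbers from 1 to 25: 1-9 stay as digits and 10-25 become the letters a-p. The function name `encode_repeats` is hypothetical. The mapping only relabels the values. Nothing in the resulting symbols says that 9 and a are neighbours.

```r
# Hypothetical helper: recode repeat counts 1-25 as single characters.
# 1-9 stay as digits; 10-25 become the letters a-p (10 = a, 11 = b, ..., 25 = p).
encode_repeats <- function(counts) {
  stopifnot(all(counts %in% 1:25))                 # only the assumed 1-25 range
  symbols <- c(as.character(1:9), letters[1:16])   # "1".."9" then "a".."p"
  symbols[counts]                                  # position n holds the symbol for n
}

encode_repeats(c(2, 3, 10, 11, 25))
```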

I have tried to use PHYLIP pars to build a tree, but I don't really understand how it works. I'd like to understand the program well enough to give it the correct input. Alternatively, if anyone has other solutions, please feel free to help :)

Cheers

phylogenetics • 2.7k views

What do the numbers mean? Are they just 25 different categories, or is the number 3 more similar to 4 than it is to 15? In the latter case, turning the numbers into an alphabet like you have done would not make sense.


The numbers represent MIRU (Mycobacterial Interspersed Repetitive Units) loci. Each value is the number of repeats at that locus. What would be the best option, then?

13.3 years ago
David W 4.9k

Hi Pavid,

If I understand what you are trying to do, then a distance method is probably best. pars is for "unordered multi-state data". For instance, if you had taxa with 1, 3 and 12 repeats at one locus, it would treat every pair as one step away from each other.
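This is a rough R sketch of the difference for the 1, 3 and 12 repeat example. It is not PHYLIP's own code. The unordered distance just counts the loci at which two taxa differ, and the ordered (Manhattan) distance sums the absolute differences in repeat number. The taxon names and the helper `unordered_dist` are made up for illustration.

```r
# Repeat counts at one locus for three made-up taxa (the 1, 3, 12 example)
repeats <- matrix(c(1, 3, 12), ncol = 1,
                  dimnames = list(c("taxonA", "taxonB", "taxonC"), "locus1"))

# Hypothetical helper: unordered (categorical) distance = number of loci at
# which two taxa have different repeat numbers, however far apart they are
unordered_dist <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(m[i, ] != m[j, ])
    }
  }
  as.dist(d)  # keep the lower triangle as a "dist" object
}

d_unordered <- unordered_dist(repeats)                # every pair is 1 apart
d_ordered <- dist(repeats, method = "manhattan")      # 2, 11 and 9 apart
d_unordered
d_ordered
```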

I'm pretty sure Arlequin has a method for this, but it's also pretty easy in R if you are comfortable with it. The main thing you'll need to think about is the best distance measure:

  data <- read.table('repeats.tsv', header=TRUE) # a subset of your data
  head(data, 3)
     taxon L1 L2 L3 L4 L5
  1  1574  2  2  3  2  2
  2  1585  4  2  2  2  3
  3  1588  6  2  2  1  2
  data.dist <- dist(data[2:6], method='manhattan') # columns 2-6 hold the repeat counts
  data.dist
    1 2 3 4 5
  2 4        
  3 6 4      
  4 1 3 5    
  5 3 3 5 2  
  6 1 5 7 2 2
  library(ape) # a phylogenetics package; get it with install.packages('ape')
  plot(nj(data.dist))
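Below is a self-contained variant of the session above. It types in the three rows shown by `head()` so that it runs without `repeats.tsv`. It also uses the taxon IDs as row names, so the distance matrix, and any tree built from it with `nj()`, is labelled by taxon rather than by row number. Treat it as a sketch: the real file will have more rows and possibly different column names.

```r
# The three rows printed by head() above, typed in so no file is needed
data <- data.frame(taxon = c(1574, 1585, 1588),
                   L1 = c(2, 4, 6), L2 = c(2, 2, 2), L3 = c(3, 2, 2),
                   L4 = c(2, 2, 1), L5 = c(2, 3, 2))

# Taxon IDs as row names become the labels of the distance matrix
rownames(data) <- data$taxon

# Manhattan distance on the repeat-count columns only (columns 2-6)
data.dist <- dist(data[2:6], method = "manhattan")
data.dist  # same values as the first three taxa above: 4, 6 and 4
```

With `ape` loaded, `plot(nj(data.dist))` works on this object exactly as in the session above.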

That uses Manhattan distance to compare taxa. This is just a fancy way of saying: for each pair of taxa, add up, over all loci, the number of repeat units by which they differ. I don't know about your markers, but that might not really reflect how they evolve. Maybe the repeat number can double in a single generation, in which case the distance between 3 and 6 shouldn't be three times the distance between 3 and 4. You should probably do a little research on the best way to compare your markers.
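One way to act on that research is to write the per-locus cost as a function and build the distance matrix from it. The R sketch below does this, with Manhattan distance as one choice of cost. The `doubling_cost` model is purely hypothetical, included only to illustrate the 3 vs 6 point above. It is not a claim about how MIRU repeats actually change. All function names here are made up.

```r
# Hypothetical helper: distance matrix from a per-locus cost function
locus_dist <- function(m, cost) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # apply the cost locus by locus and sum over loci
      d[i, j] <- sum(mapply(cost, m[i, ], m[j, ]))
    }
  }
  as.dist(d)
}

# Manhattan: each extra repeat unit costs one step
manhattan_cost <- function(a, b) abs(a - b)

# HYPOTHETICAL model for illustration only: doubling or halving the repeat
# number counts as one step; any other change costs |a - b|
doubling_cost <- function(a, b) {
  if (a == b) 0 else if (a == 2 * b || b == 2 * a) 1 else abs(a - b)
}

repeats <- matrix(c(3, 4, 6), ncol = 1,
                  dimnames = list(c("taxonA", "taxonB", "taxonC"), "locus1"))
locus_dist(repeats, manhattan_cost)  # 3 vs 6 is three times 3 vs 4
locus_dist(repeats, doubling_cost)   # 3 vs 6 now equals 3 vs 4
```

Either result is an ordinary `dist` object, so it can be passed to `nj()` in the same way as `data.dist`.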


Thank you, David W, for your response. Actually, I've never used R; I'm more familiar with Python. But I can try it.
