Question

Best Way To Build A Tree With Numeric Data (Range 1-25) Phylip Pars Or Others

14.0 years ago

Pavid ▴ 160

Hey!

I'm trying to build a tree from this alignment. I've converted the numbers with more than one digit into letters: a for 10, b for 11, and so on up to p for 25.

I have tried to use Phylip pars to build a tree, although I don't really understand how it works. I'd like to understand the program well enough to provide the correct input, but if someone has another solution, please feel free to help me :)

Cheers

phylogenetics • 3.0k views

updated 14.0 years ago by David W 4.9k • written 14.0 years ago by Pavid ▴ 160

What do the numbers mean? Are they just 25 different categories, or is the number 3 more similar to 4 than it is to 15? In the latter case, turning the numbers into an alphabet like you have done would not make sense.

14.0 years ago by Lars Juhl Jensen 11k

The numbers are MIRU (Mycobacterial Interspersed Repetitive Units) repeat counts. What would be the best option then?

14.0 years ago by Pavid ▴ 160

score 1 · Answer 1 · 2011-08-13

Hi Pavid,

If I understand what you are trying to do, then a distance method is probably best. Pars is for "unordered multi-state data", so if, for instance, you had taxa with 1, 3, and 12 repeats at one locus, then it would treat every pair of them as one step apart.
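To make that concrete, here is a minimal R sketch contrasting an unordered step count with an ordered difference for the 1, 3, 12 example. The taxon names A, B, C and the variable names are made up for illustration, and the code only mimics how the per-locus costs compare; it is not what pars does internally.

```r
# Repeat counts for three hypothetical taxa at a single locus
repeats <- c(A = 1, B = 3, C = 12)

# Unordered cost: any two different states are exactly one step apart
unordered <- outer(repeats, repeats, function(x, y) as.numeric(x != y))

# Ordered cost: the absolute difference in repeat number
ordered <- outer(repeats, repeats, function(x, y) abs(x - y))

print(unordered)
print(ordered)
```

In the unordered matrix every off-diagonal entry is 1, whereas the ordered matrix separates A from C by 11 repeats and A from B by only 2, which is the information a distance method can use.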

I'm pretty sure Arlequin has a method for this, but it's pretty easy in R if you are comfortable with it. The only thing you'll need to think about is the best distance measure:

  data <- read.table('repeats.tsv', header=TRUE) # a subset of your data
  head(data, 3)
     taxon L1 L2 L3 L4 L5
  1  1574  2  2  3  2  2
  2  1585  4  2  2  2  3
  3  1588  6  2  2  1  2
  data.dist <- dist(data[2:6], method='manhattan') # columns 2-6 hold the repeat counts
  data.dist
    1 2 3 4 5
  2 4        
  3 6 4      
  4 1 3 5    
  5 3 3 5 2  
  6 1 5 7 2 2
  library(ape) # a phylogenetics package; run install.packages('ape') to get it
  plot(nj(data.dist)) # neighbor-joining tree from the distance matrix

That uses Manhattan distance to compare taxa, which is just a fancy way of saying it sums the absolute differences in repeat number across all loci for each pair of taxa. I don't know about your markers, but that might not really reflect how they evolve: maybe they can double in a generation, so the distance between 3 and 6 shouldn't be three times the distance from 3 to 4. You should probably do a little research about the best way to compare your markers.
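As a rough sketch of what that means in practice, the snippet below recomputes the Manhattan distance by hand for the first three taxa shown above, then repeats the comparison on log2-transformed counts. The log2 transform is purely an assumption for illustration (a way of making a doubling count as one unit), not a recommended model for MIRU data, and it would not work as-is for loci with zero repeats.

```r
# The first three taxa from the table above (five loci each)
repeats <- rbind("1574" = c(2, 2, 3, 2, 2),
                 "1585" = c(4, 2, 2, 2, 3),
                 "1588" = c(6, 2, 2, 1, 2))

# Manhattan distance by hand: sum of absolute differences across loci
by_hand <- sum(abs(repeats["1574", ] - repeats["1585", ]))

# The same measure for all pairs, as computed by dist()
d_manhattan <- as.matrix(dist(repeats, method = "manhattan"))

# Assumed alternative: compare log2 repeat counts, so a doubling is one unit
d_log2 <- as.matrix(dist(log2(repeats), method = "manhattan"))

# On the log2 scale, 3 vs 6 is one unit while 3 vs 4 is less than half a unit
step_3_to_6 <- abs(log2(6) - log2(3))
step_3_to_4 <- abs(log2(4) - log2(3))

print(by_hand)
print(d_manhattan)
print(round(d_log2, 2))
```

Either matrix can be passed through as.dist() and on to nj() in the same way as above; which scale is appropriate depends on how the markers actually mutate.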