Question

How To Code Multiple States For Discrete Data Type

1

12.0 years ago

qiyunzhu ▴ 430

Dear all,

I'm trying to do phylogenetic reconstruction using discrete character data, such as geographical distribution. For example:

species A            Asia
species B            Europe
species C            Africa
species D            Europe

This works fine for me. However, some of my species are distributed across more than one continent, for example:

species E            Asia,Europe

My question is how I can code multiple character states into one data point for popular phylogenetics software, including BEAST, MrBayes, Mesquite, etc. My favorite is BEAST. I tried the format above in BEAST, but it didn't work: in the XML file, "Asia,Europe" is treated as a single character state rather than as the two states "Asia" and "Europe", which is what I wanted. So I'm posting to ask whether anyone can give me a solution, or tell me that it's just not possible.

Thanks!

phylogenetics • 4.9k views


I don't know anything about the software you mentioned, but can't you split the multi-continent species across separate lines, like

species E Asia

species E Europe

Will this work for you?

ADD REPLY • link 12.0 years ago by Sukhi Singh 11k

Yeah, it looks like species to continent is a many-to-many relationship.

ADD REPLY • link 12.0 years ago by Alex Paciorkowski 3.5k

That sounds like a nice idea. I just tried it. In the BEAST XML, I gave the taxon two location lines:

<attr name="location">Fujian</attr>
<attr name="location">Guangdong</attr>

I'm waiting to see if BEAST really treats them as two states.

ADD REPLY • link 12.0 years ago by qiyunzhu ▴ 430

I found that the lower line overrides the upper line. So it didn't work for BEAST.

ADD REPLY • link 12.0 years ago by qiyunzhu ▴ 430

Oh, maybe try it as two different location IDs with the same name. I was just looking here; otherwise, just contact the developers.

ADD REPLY • link 12.0 years ago by Sukhi Singh 11k

score 3 · Answer 1 · 2012-11-28

I consulted the BEAST authors, who kindly gave me the official solution. Here is how it should be done:

Edit the XML file. In the state code section (the generalDataType element), create an ambiguity definition like:

<generalDataType id="fruit.dataType">
        <state code="Asia"/>
        <state code="Europe"/>
            ...
        <state code="Antarctica"/>
        <ambiguity code="Eurasia" states="Asia Europe"/>
</generalDataType>

Then go back to the taxa section and set the code of each affected taxon to "Eurasia".

Then go to the treeLikelihood section and set useAmbiguities="true".
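
For a large data set, the three steps above could be scripted rather than done by hand. The following is only a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes that locations are stored in <attr name="location"> elements, that multi-state values are comma-separated, and that an auto-generated code such as "Asia_Europe" is acceptable in place of a hand-picked name like "Eurasia". The toy XML and the function name add_ambiguity_codes are hypothetical; a real BEAST file has many more elements, so check the result before running it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a BEAST XML file.
TOY_XML = """<beast>
  <taxa id="taxa">
    <taxon id="speciesA"><attr name="location">Asia</attr></taxon>
    <taxon id="speciesE"><attr name="location">Asia,Europe</attr></taxon>
  </taxa>
  <generalDataType id="location.dataType">
    <state code="Asia"/>
    <state code="Europe"/>
    <state code="Africa"/>
  </generalDataType>
  <treeLikelihood id="location.treeLikelihood"/>
</beast>"""


def add_ambiguity_codes(root, separator=","):
    """Replace multi-state location values with ambiguity codes, in place.

    Returns a dict mapping each new ambiguity code to its member states.
    """
    data_type = root.find(".//generalDataType")
    known_states = {s.get("code") for s in data_type.findall("state")}
    defined = {}

    # Copy to a list first so the tree is not modified while being iterated.
    for attr in list(root.iter("attr")):
        if attr.get("name") != "location":
            continue
        parts = sorted(p.strip() for p in attr.text.split(separator))
        if len(parts) < 2:
            continue  # single-state taxon, nothing to do
        unknown = set(parts) - known_states
        if unknown:
            raise ValueError(f"undeclared states: {sorted(unknown)}")
        code = "_".join(parts)  # assumed naming scheme, e.g. "Asia_Europe"
        if code not in defined:
            defined[code] = " ".join(parts)
            # Step 1: declare the ambiguity code in the data type.
            ET.SubElement(data_type, "ambiguity", code=code, states=defined[code])
        # Step 2: point the taxon at the ambiguity code.
        attr.text = code

    # Step 3: switch on ambiguity handling in the likelihood element.
    for likelihood in root.iter("treeLikelihood"):
        likelihood.set("useAmbiguities", "true")
    return defined


root = ET.fromstring(TOY_XML)
print(add_ambiguity_codes(root))
print(ET.tostring(root, encoding="unicode"))
```

The script refuses to invent states: a location that is not already declared as a <state> raises an error instead of being silently added.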

Hope this is helpful to people who read this post.

score 1 · Answer 2 · 2012-11-25

12.0 years ago

Josh Herr 5.8k

I haven't tried this in BEAST yet, but my analysis works fine in PAUP and MrBayes for multiple character states written with brackets: in your data matrix, for example, you will have {Asia,Europe} for character uncertainty (one state or the other) or (Asia,Europe) for polymorphism (both character states), following the NEXUS convention. Give this a try.
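
As a rough illustration of the bracketed coding, here is a small Python sketch that turns a species-to-areas mapping into matrix rows. It assumes each area is first recoded as a single-character symbol (0, 1, 2, ...), which is how standard-datatype NEXUS matrices are usually written. The function name nexus_rows and the example data are hypothetical. The bracket pair is a parameter, because how a given program interprets each pair should be confirmed in its documentation.

```python
def nexus_rows(distributions, areas, brackets="()"):
    """Return (symbols, rows) for a one-character NEXUS-style matrix.

    distributions: dict mapping taxon name -> set of area names
    areas:         ordered list of all areas (at most 10 in this sketch)
    brackets:      two-character string used to wrap multi-state cells
    """
    if len(areas) > 10:
        raise ValueError("this sketch only assigns the symbols 0-9")
    symbols = {area: str(i) for i, area in enumerate(areas)}
    open_, close = brackets
    rows = []
    for taxon, found in distributions.items():
        codes = "".join(sorted(symbols[a] for a in found))
        if not found:
            cell = "?"  # conventional NEXUS missing-data symbol
        elif len(found) == 1:
            cell = codes
        else:
            cell = f"{open_}{codes}{close}"
        # NEXUS taxon names cannot contain bare spaces.
        rows.append(f"{taxon.replace(' ', '_')} {cell}")
    return symbols, rows


symbols, rows = nexus_rows(
    {"species A": {"Asia"}, "species E": {"Asia", "Europe"}},
    areas=["Asia", "Europe", "Africa"],
)
print(symbols)
print("\n".join(rows))
```

Passing brackets="{}" produces the brace form instead.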

I do know this notation will not work with the read.nexus.data function from the ape package in R. In that case, I've either added a separate character to my data matrix or coded my data as continuous, both of which have disadvantages for downstream analysis.


Thanks for the information on MrBayes and PAUP! I tried it in BEAST, but it didn't work. :( I haven't tried MrBayes yet because I feel that it runs much slower than BEAST and my data set is just HUGE. But if MrBayes can handle this, I will give it a try.

ADD REPLY • link 12.0 years ago by qiyunzhu ▴ 430

I think this might be an XML vs. NEXUS problem. It sounds like David's idea of coding each location independently is a way around this that is file-format neutral.

ADD REPLY • link 12.0 years ago by Josh Herr 5.8k

Yes, I agree, and I'm now trying to code the data as multiple characters. It seems that the NEXUS format is sometimes more flexible than XML, and more readable.

ADD REPLY • link 12.0 years ago by qiyunzhu ▴ 430

score 1 · Answer 3 · 2012-11-25

I don't know of any software that deals with a taxon taking multiple states for the same character, but do you really need to code it this way? Why not make "geography" a presence-absence character:

       Asia    Africa     Europe
sp1      1        0         0
sp2      1        1         0
sp3      0        0         0

Edit: the other option to consider is dreaming up some discrete-character model (e.g. rev) in which each possible geographic combination is a state and the rates of change reflect the fact that changing from, say, Asia-Africa -> Europe-Africa-Asia is much more likely than going Europe -> Europe-Africa-Asia, though this seems like a lot of work.
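
One possible reading of this idea, sketched in Python: treat every non-empty combination of continents as a state, and down-weight a change according to how many single-continent gains or losses it needs. The penalty weighting and the function names are assumptions made for illustration only. This is not an established model or any program's built-in option, and a real analysis would need properly estimated rates.

```python
from itertools import combinations


def range_states(areas):
    """All non-empty combinations of areas, each as a frozenset."""
    return [
        frozenset(combo)
        for size in range(1, len(areas) + 1)
        for combo in combinations(areas, size)
    ]


def steps_between(a, b):
    """Number of single-area gains or losses needed to turn range a into b."""
    return len(a ^ b)  # size of the symmetric difference


def relative_rates(states, penalty=0.1):
    """Hypothetical weighting: each extra step multiplies the rate by `penalty`."""
    return {
        (a, b): penalty ** (steps_between(a, b) - 1)
        for a in states
        for b in states
        if a != b
    }


areas = ["Asia", "Africa", "Europe"]
states = range_states(areas)
rates = relative_rates(states)

asia_africa = frozenset({"Asia", "Africa"})
europe = frozenset({"Europe"})
everywhere = frozenset(areas)

print(len(states), "states")
print("Asia-Africa -> all three:", rates[(asia_africa, everywhere)])
print("Europe -> all three:", rates[(europe, everywhere)])
```

With three continents this already gives seven states, and the number of states grows exponentially with the number of areas, which is part of why this approach is a lot of work.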