Stratify phylogeny based on variables - e.g. which ST are represented by high % males
5.8 years ago
kalfsnes ▴ 10

I have a tree file and metadata. The metadata include sequence type (ST; clusters that largely correspond to the phylogeny), as well as patient data such as age and sex.

I am trying to identify which ST has the highest percentage of males, and similar questions for the other variables. I would like to use the information in the phylogeny to strengthen the analysis, and/or simply look for clustering in the phylogeny (e.g. of a high percentage of males) irrespective of ST.

My current idea is to find something in phytools (R) or similar, but I would appreciate any other suggestions from the community. Phylogenetic independent contrasts?

Thanks in advance

phylogeny statistics • 1.1k views

Hello, could you please give more biological context? What do the tips of your phylogeny represent? A sequence from an individual? An individual with some pathology? Why do you want to know which ST has the highest percentage of males? Is it to identify sampling biases? Was your sampling exhaustive? Depending on your question, I am not sure phylogenetic independent contrasts would be adequate.


Hi, yes. The phylogeny represents all samples from a region within a specific time frame. The tips are bacterial samples (one species) from all the patients in that time frame. I would like to stratify in order to describe the different clusters: which clusters (and consequently STs) are found primarily in males, and which in females? Are lower age groups concentrated in some clusters compared to others? And similar questions testing the various metadata variables against the phylogeny/ST clustering. Does that make it clearer?
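
As a purely descriptive starting point that ignores the tree, the per-ST stratification described above could be sketched roughly as follows in Python. The metadata rows, the column order and the `summarize_by_st` helper are all made up for illustration; real data would come from the metadata file.

```python
from collections import defaultdict
from statistics import median

# Hypothetical metadata rows: (tip label, ST, sex, age). All values are invented.
metadata = [
    ("s1", "ST1", "M", 34),
    ("s2", "ST1", "M", 51),
    ("s3", "ST1", "F", 47),
    ("s4", "ST2", "F", 8),
    ("s5", "ST2", "F", 12),
    ("s6", "ST2", "M", 5),
    ("s7", "ST3", "M", 63),
]


def summarize_by_st(rows):
    """Return {ST: {"n", "pct_male", "median_age"}} from (tip, ST, sex, age) rows."""
    groups = defaultdict(list)
    for _tip, st, sex, age in rows:
        groups[st].append((sex, age))

    summary = {}
    for st, members in groups.items():
        n = len(members)
        n_male = sum(1 for sex, _age in members if sex == "M")
        summary[st] = {
            "n": n,
            "pct_male": 100.0 * n_male / n,
            "median_age": median(age for _sex, age in members),
        }
    return summary


summary = summarize_by_st(metadata)

# List STs from highest to lowest percentage of males.
for st in sorted(summary, key=lambda s: summary[s]["pct_male"], reverse=True):
    row = summary[st]
    print(f"{st}: n={row['n']}, male={row['pct_male']:.1f}%, median age={row['median_age']}")
```

A table like this only describes the sample. It does not account for tips within an ST being related, and STs with very few tips (such as the single-tip ST3 here) give unstable percentages.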


Yes, thanks! I am not an expert, but some of your data are categorical (ST, sex). For those, if you want to detect a correlation (for example ST versus sex), I would rather use fitPagel from Liam Revell's phytools package. There is also the corHMM package from J. Beaulieu for more elaborate transition models. Pagel's test works on pairs of binary characters, so if you are interested in which particular ST correlates with sex, you can binarize ST: for example, define a variable ST1 which is 1 when ST=="ST1" and 0 otherwise. In addition, to correlate a discrete character with age, you can use the liability (threshold) model implemented in phytools::threshBayes. One concern I have, though, is multiple testing; you might want a multivariate procedure rather than many bivariate tests (however, I don't think one exists yet for phylogenetic and discrete data). I hope this gives you somewhere to start.
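
To make the binarization and the multiple-testing concern concrete, here is a minimal Python sketch. The tip labels, the p-values and the helper names `binarize_st` and `holm_adjust` are hypothetical, and Holm's step-down correction is only one common way to adjust a set of bivariate tests; it is not a substitute for a multivariate procedure.

```python
def binarize_st(st_by_tip, target):
    """Return {tip: 1 if that tip's ST equals `target`, else 0}."""
    return {tip: int(st == target) for tip, st in st_by_tip.items()}


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical tip -> ST assignments (invented labels).
st_by_tip = {"s1": "ST1", "s2": "ST2", "s3": "ST1", "s4": "ST3"}

# One 0/1 indicator per ST, keyed by tip label.
indicators = {st: binarize_st(st_by_tip, st) for st in sorted(set(st_by_tip.values()))}

# Placeholder p-values, one per ST-versus-sex test; NOT real results.
raw_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(raw_p)
```

Each indicator, keyed by tip label, is the kind of binary character that would then be tested against sex one ST at a time, with the resulting p-values adjusted together rather than read individually.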
