5.8 years ago
kalfsnes
I have a tree file and metadata. The metadata include sequence type (ST; clusters that correspond for the most part to the phylogeny), as well as patient data such as age and sex.
I am trying to identify which ST has the highest percentage of males, and so on. I would like to use the information in the phylogeny to strengthen the analysis, and/or simply look for clustering (e.g. a high percentage of males) in the phylogeny irrespective of ST.
My current idea is to find something in phytools (R) or similar, but I would appreciate any other suggestions from the community. Phylogenetic independent contrasts?
Thanks in advance
Hello, could you please give more biological context? What do the tips of your phylogeny represent? A sequence from an individual? One with some pathology? Why do you want to know which ST has the highest percentage of males? Is it to identify sampling biases? Did you do an exhaustive sampling? Depending on your question, I am not sure phylogenetic independent contrasts would be adequate.
Hi, yes, the phylogeny represents all samples from a region in a specific time frame. The tips represent bacterial samples (one species) from all the patients in this time frame. I would like to stratify in order to describe the different clusters: which clusters (and consequently STs) are primarily found in males, and which in females? Are lower age groups concentrated in some clusters compared to others? And similar questions testing the various metadata variables against the phylogeny/ST clustering. Does that make it clearer?
Yes, thanks! I am not an expert, but some of your data are categorical (ST, sex). For those, if you want to detect a correlation (for example ST versus sex), you should rather use fitPagel from the package phytools by Liam Revell. There is also the package corHMM by J. Beaulieu for more elaborate transition models. If you are interested in which particular ST correlates with sex, you can binarize ST: for example, you would define a variable ST1 which is 1 when ST=="ST1" and 0 otherwise. In addition, to correlate a discrete character with age, you can use the liability model implemented in phytools::threshBayes. One concern I have, though, is multiple testing; you might want to find a multivariate procedure rather than many bivariate tests (however, I don't think one exists for phylogenetic and discrete data yet). I hope this gives you somewhere to start.
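To make the binarization step and the multiple-testing concern concrete, here is a minimal sketch in plain Python. It is only an illustration. The metadata rows and the helper names (`binarize_st`, `percent_male_by_st`, `holm_adjust`) are all made up for this example and do not come from any package. The Holm correction is one common way to adjust many bivariate tests, swapped in here as an assumption; it is not the multivariate procedure mentioned above.

```python
from collections import Counter

# Hypothetical metadata rows: (tip label, ST, sex). Invented for illustration.
metadata = [
    ("s1", "ST1", "M"),
    ("s2", "ST1", "M"),
    ("s3", "ST1", "F"),
    ("s4", "ST2", "F"),
    ("s5", "ST2", "F"),
    ("s6", "ST3", "M"),
]


def binarize_st(rows, target):
    """Map each tip label to 1 if its ST equals `target`, else 0."""
    return {tip: int(st == target) for tip, st, _sex in rows}


def percent_male_by_st(rows):
    """Raw percentage of males per ST (ignores the phylogeny entirely)."""
    totals = Counter(st for _tip, st, _sex in rows)
    males = Counter(st for _tip, st, sex in rows if sex == "M")
    return {st: 100.0 * males[st] / totals[st] for st in totals}


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


st1_indicator = binarize_st(metadata, "ST1")
male_share = percent_male_by_st(metadata)
```

An indicator like `st1_indicator`, keyed by tip label, is the kind of binary trait you would export and test against sex with a method such as fitPagel, one ST at a time. The p-values from those per-ST tests could then be passed through something like `holm_adjust` before interpreting them. The raw percentages from `percent_male_by_st` are purely descriptive and do not account for the tree.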