Hello,
I have a large dataset (>10,000) of 16S rRNA sequences. Rather than build a phylogenetic tree, I'd like to visualize the analysis on a 2D PCoA-like plot.
I plan to use a maximum likelihood method for the analysis and ultimately want to portray the data in a PCoA-like plot. Tree topology is not important for this. Is it possible to run an ML-based analysis and obtain just the resulting distance matrix? I'd like to use the matrix to create the PCoA plot. Also, is the dataset too large? Any software suggestions?
The goal of this analysis is to evaluate the relatedness of select strains (~300) against a larger global population.
Really appreciate the help!
Thanks, Peter
Can you edit your post to make it clearer, please? Which data do you have, and what do you want to highlight through the PCoA?
Ok, updated the post.
Either I don't understand what you're trying to do, or I'm not familiar with the analysis that you want to do. Can you provide a reference paper that shows an analysis similar to the one you have in mind, please?
From your description, and based on my background in microbial ecology, this sounds like a beta-diversity analysis to me (sorry if I misunderstood). If so, you need to have at least two samples.
If the aim is to evaluate the relatedness of strains, I believe this can be done through a phylogenetic tree. Of course, you will probably need to collapse some branches to make it readable. Are the 10 K seqs non-redundant?
You can build a phylogenetic tree and run PCoA analyses with the QIIME2 software: https://qiime2.org/
QIIME2 has many plugins that wrap third-party software tools. For instance, for phylogenetic trees it has fasttree, among others: https://docs.qiime2.org/2020.8/tutorials/phylogeny/
It also lets you run ordinations such as PCoA, but you need to ensure that the data generated in the different steps are compatible. QIIME2 has many tutorials, workshops, and docs.
I hope this helps,
António
Thanks for the reply. I haven't been able to find a reference paper doing something similar but will keep looking.
The 10k sequences are what remains after redundant sequence removal.
I have a global dataset of 10,000 sequences. Of the 10,000, there are 300 sequences (strains) that I am looking to highlight. So, I'm basically doing a standard phylogenetic analysis. However, I'd prefer to portray the data in a PCoA plot rather than in a tree format.
From what I understand, QIIME2 can create PCoA plots, but only when doing a beta-diversity analysis of different populations.
Does that make sense?
I don't think that you can do that, but I may be wrong.
A PCA or PCoA is a multivariate method, so you need a set of observations measured across several (usually a few to thousands of) variables. In your case I don't see what would be the observations and what would be the variables. That's why PCoA is used in beta-diversity: you have a set of samples/sites/communities (observations) across some OTUs/ASVs/16S seqs (variables). In that case you can build an ML tree, compute a phylogenetic beta-diversity distance such as UniFrac, and display the distance matrix with an ordination method such as PCoA. But in your case, if the 16S seqs are the variables, what are the observations, or vice versa? I don't think that what you want to do is possible, but I may be wrong.
António
Yeah, I'm a little concerned that I can't do that. One suggestion I heard was to bin the sequences into OTUs and then run a beta-diversity analysis.
Is there no tool available to assess phylogenetic relationships in a 2D plot? I've read about multidimensional scaling, which might be what I'm looking for.
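On the multidimensional scaling idea: classical (metric) MDS is the same computation as PCoA, and all it needs is a square matrix of pairwise distances between sequences, so each sequence becomes one point. Below is a minimal sketch in Python, assuming NumPy is available. The function names `classical_mds` and `p_distance_matrix` are hypothetical, and the simple p-distance (fraction of mismatched alignment columns) is only a stand-in for an ML-corrected distance that would come from whichever phylogenetics program is used. The toy sequences are made up for illustration.

```python
import numpy as np


def p_distance_matrix(aligned_seqs):
    """Pairwise p-distance: fraction of alignment columns that differ.

    Assumption: all sequences are already aligned to the same length.
    This is a placeholder for an ML-based distance, not a substitute.
    """
    arr = np.array([list(s) for s in aligned_seqs])
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=2)


def classical_mds(dist, n_components=2):
    """Classical MDS (PCoA) coordinates from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        raise ValueError("dist must be a square matrix")
    d2 = d ** 2
    # Double-centre the squared distances (Gower centring).
    row_mean = d2.mean(axis=1, keepdims=True)
    col_mean = d2.mean(axis=0, keepdims=True)
    b = -0.5 * (d2 - row_mean - col_mean + d2.mean())
    # Eigen-decompose and keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Non-Euclidean distances can give small negative eigenvalues; clip them.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


if __name__ == "__main__":
    # Toy aligned sequences (invented); the first two play the role
    # of the strains to highlight against the rest.
    seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA", "TCGATCGA"]
    highlight = np.array([True, True, False, False, False])
    coords = classical_mds(p_distance_matrix(seqs), n_components=2)
    for (x, y), flag in zip(coords, highlight):
        print(f"{x:+.3f}\t{y:+.3f}\t{'strain' if flag else 'background'}")
```

The two output columns can be passed to any scatter-plot tool, colouring the ~300 strains differently from the background. On size: a dense 10,000 x 10,000 matrix of 64-bit floats takes about 800 MB, so memory and the eigendecomposition time are worth checking before running this on the full dataset.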