ordination analysis & agglomeration
3.0 years ago
ymj ▴ 10

Hi

I am running a microbiome ordination analysis and I am unsure whether to agglomerate taxa first. My dataset consists of amplicon sequence variants (ASVs) with taxonomy assigned down to genus level. Does it make sense to use agglomerated taxa for the ordination, and also in further steps (normalization, diversity analysis)? Why (not)?

On the ordination itself: I am following this figure, https://www.frontiersin.org/files/Articles/294209/fmicb-08-02224-HTML/image_m/fmicb-08-02224-g002.jpg, from which I understand that with rarefied counts (which I use in my analysis) I should use Bray-Curtis dissimilarity together with PCoA. However, when I do so, I get an error:

ps_of_choice <- ps_rare 
method_of_choice <- "bray"
ord_of_choice <- "PCoA"

dist_matrix <- phyloseq::distance(ps_of_choice, method = method_of_choice) 
dist_matrix <- as.matrix(dist_matrix)
head(dist_matrix)[,1:6]

ord <- phyloseq::ordinate(ps_of_choice, ord_of_choice)

phyloseq::plot_scree(ord) + 
  geom_bar(stat="identity", fill = "blue") +
  labs(x = "\nAxis", y = "Proportion of Variance\n")

head(ord$CA$eig) 

sapply(ord$CA$eig[1:5], function(x) x / sum(ord$CA$eig))  

#Scale axes and plot ordination
clr1 <- ord$CA$eig[1] / sum(ord$CA$eig)
clr2 <- ord$CA$eig[2] / sum(ord$CA$eig)
phyloseq::plot_ordination(ps_of_choice, ord, type="samples", color="gender") + 
  geom_point(size = 2) +
  coord_fixed(clr2 / clr1) +
  stat_ellipse(aes(group = gender), linetype = 2)

Error in unit(abs(aspect_ratio), "null") : 'x' and 'units' must have length > 0

If I change ord_of_choice (used in the line ord <- ...) to "RDA", the error no longer occurs. Why is that? I took the code from an online tutorial, and I suspect the problem lies in the "clr1" and "clr2" lines, but I don't understand exactly what is happening there.

Thank you in advance!

phyloseq ordination rarefying • 976 views
3.0 years ago
Chris Dean ▴ 420

The aggregation (or agglomeration) of sequence features is a method for combining abundances within specific taxonomic ranks. The choice to aggregate sequence features to a specific taxonomic rank is completely up to you -- the investigator.

Each ASV is assigned a taxonomy. However, you may have noticed that not every ASV has a classification at every taxonomic rank. Indeed, a large proportion of ASVs will likely have been assigned a pseudo taxonomic rank of "NA" at the Genus level. Why do you think this might be?

If you aggregate sequence features to the Genus level, you are simply summing the abundances within each genus. In this way, you reduce the size and complexity of your dataset, but you lose information about the genetic diversity within each genus. What do you think are the advantages and disadvantages of this approach?
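As a minimal sketch of what genus-level aggregation does to a count table, here is a toy example in base R. The matrix `counts` and the vector `genus` are made-up illustration data, not taken from the dataset in the question. In phyloseq the corresponding step would be `phyloseq::tax_glom(ps, taxrank = "Genus")` (assuming the rank is named "Genus" in your taxonomy table), which by default also drops taxa that are NA at that rank.

```r
# Toy ASV-by-sample count matrix (hypothetical values, for illustration only)
counts <- matrix(
  c(10, 0, 5,
     3, 7, 0,
     2, 2, 2,
     0, 1, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ASV", 1:4), paste0("S", 1:3))
)

# Genus assigned to each ASV; ASV4 has no genus-level classification
genus <- c("Bacteroides", "Bacteroides", "Prevotella", NA)

# Drop ASVs without a genus, then sum the counts of ASVs sharing a genus
keep <- !is.na(genus)
genus_counts <- rowsum(counts[keep, , drop = FALSE], group = genus[keep])
print(genus_counts)

# Reads lost per sample by discarding the unclassified ASV
lost <- colSums(counts) - colSums(genus_counts)
print(lost)
```

The two Bacteroides ASVs collapse into one row, so any difference between them is no longer visible, and the reads of the unclassified ASV disappear unless you decide to keep them as a separate group.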

It is completely reasonable to use an aggregated count matrix for normalization, diversity and ordination analysis, but be mindful about how you interpret the results, because you will be analyzing your data at a coarser taxonomic resolution, which changes the biological interpretation.

I hope this helps, good luck!


It looks like your question changed while I was writing a response. I will leave it here in case you find it helpful.


Yes, I had already figured out part of the agglomeration approach, but I am still a bit stuck on the ordination analysis. I will edit my question to also include the agglomeration question, so that your answer applies to that part as well.
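
On the ordination error, a likely explanation (an assumption based on how phyloseq's dependencies structure their results, not something confirmed in this thread): for "RDA", phyloseq::ordinate returns a vegan object that stores its eigenvalues in ord$CA$eig, whereas for "PCoA" it returns the output of ape::pcoa, which stores them in ord$values$Eigenvalues. On a PCoA result ord$CA$eig would then be NULL, clr1 and clr2 become zero-length, and coord_fixed() fails with the length error shown in the question. The sketch below reproduces this with stand-in lists: pcoa_like and rda_like are hypothetical mock objects, and relative_eig is an illustrative helper, not a phyloseq function.

```r
# Hypothetical stand-ins mimicking where each result type keeps its eigenvalues
pcoa_like <- list(values = data.frame(Eigenvalues = c(4, 2, 1, 1)))  # ape::pcoa layout
rda_like  <- list(CA = list(eig = c(6, 3, 1)))                        # vegan rda layout

# What the tutorial code computes on a PCoA result: NULL / 0 gives numeric(0)
clr1_broken <- pcoa_like$CA$eig[1] / sum(pcoa_like$CA$eig)
print(length(clr1_broken))  # 0, which is what coord_fixed() complains about

# Illustrative helper: proportion of variance per axis for either layout
relative_eig <- function(ord) {
  eig <- if (!is.null(ord[["values"]])) {
    ord[["values"]][["Eigenvalues"]]   # PCoA-style result
  } else {
    ord[["CA"]][["eig"]]               # RDA-style result
  }
  if (is.null(eig)) stop("no eigenvalues found in this ordination object")
  eig / sum(eig)
}

rel <- relative_eig(pcoa_like)
aspect <- rel[2] / rel[1]  # the value the tutorial passes to coord_fixed()
print(aspect)
```

With a real PCoA result from phyloseq, the same helper should work on ord directly, and ape also reports a Relative_eig column in ord$values that can be used in place of the manual division.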
