Hello,
Is it possible to do a differential gene "abundance" analysis (I don't want to say expression, because expression was not sequenced) between two conditions from 16S rRNA sequencing data (microbiome analysis)? Basically, with 16S sequencing we can assign sequences to bacterial taxa and then calculate the relative bacterial composition of a given tissue. My question is: if we know the genomes of the species and the genes are annotated, wouldn't it be possible to derive from that data that, say, the pathways for post-translational acetylation are increased in condition 2 (at least in relative terms)? Pathways consist of genes, and not every bacterial species has these genes, so we could run a GSEA or GO enrichment analysis. I hope that wasn't too confusing. Is there such an analysis? Thank you in advance.
It is technically possible; tools like PICRUSt2 were designed for this kind of analysis. However, microbial ecologists tend to eat you alive if you draw any conclusions based on these tools.
These approaches are heavily limited by how well your taxa are represented in the reference tree used to infer the functional profile of the microbial community. For example, suppose the 16S sequences in sample A are better represented in the reference tree than those in sample B. In that scenario you will find more functions in sample A than in sample B simply because of the difference in reference coverage, so I would avoid drawing any conclusions from a differential/enrichment analysis.
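To illustrate the bias, here is a deliberately crude toy in Python. It is not how PICRUSt2 works internally; the taxa, abundances and the `in_reference` table are all made up, and it assumes every taxon truly carries the function while only taxa with a close reference contribute to the prediction.

```python
# Hypothetical relative abundances per sample (each sample sums to 1).
samples = {
    "sample_A": {"taxon_1": 0.6, "taxon_2": 0.4},
    "sample_B": {"taxon_3": 0.7, "taxon_2": 0.3},
}

# Hypothetical: whether a taxon has a close relative in the reference tree.
in_reference = {"taxon_1": True, "taxon_2": True, "taxon_3": False}


def predicted_function(sample):
    """Sum the abundance of taxa whose function can be inferred from a reference."""
    return sum(frac for taxon, frac in sample.items() if in_reference[taxon])


predicted = {name: predicted_function(sample) for name, sample in samples.items()}
print(predicted)
```

Both toy communities carry the function in 100% of their members, yet sample_B appears to have far less of it, purely because taxon_3 has no close reference.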
Thank you for your comment. I will look into it.
Different species have different 16S rDNA copy numbers. How do you plan to take this into account while estimating the relative composition?
Thank you for your answer, but unfortunately I don't understand your comment. I think your answer relates to calculating the relative composition of bacterial communities from 16S rRNA data, but that is common practice. My question is rather whether we can use that information and, given annotated genomes, calculate that we have relatively less of gene A because, for example, all species that possess gene A are relatively decreased.
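As a minimal sketch of the calculation I have in mind (Python; the taxa, abundances and the gene presence table are invented for illustration, and it assumes the annotation of each taxon is actually known):

```python
# Hypothetical relative abundances of three taxa in two conditions (each sums to 1).
abundance = {
    "condition_1": {"taxon_A": 0.5, "taxon_B": 0.3, "taxon_C": 0.2},
    "condition_2": {"taxon_A": 0.2, "taxon_B": 0.3, "taxon_C": 0.5},
}

# Hypothetical annotation: which taxa carry gene A (1 = present, 0 = absent).
has_gene_a = {"taxon_A": 1, "taxon_B": 0, "taxon_C": 0}


def inferred_gene_fraction(sample, presence):
    """Fraction of the community (by relative abundance) whose genome carries the gene."""
    return sum(frac * presence[taxon] for taxon, frac in sample.items())


gene_a = {
    cond: inferred_gene_fraction(sample, has_gene_a)
    for cond, sample in abundance.items()
}
print(gene_a)
```

Here gene A would be inferred as relatively decreased in condition 2, only because the single taxon carrying it is less abundant there.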
Those studies show the relative composition of 16S rRNA genes in the sample, but extrapolating from that to the actual community composition requires knowing the number of 16S rDNA copies in each genome. A bigger leap of faith still is assuming that a genome definitely has some gene because a reference genome with a relatively similar 16S rDNA has that gene. I know you can do such things with e.g. PICRUSt, but IMO such results are pretty much meaningless.
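To make the copy-number point concrete, here is a small Python sketch. The read counts and copies per genome are invented; in practice the copy numbers would themselves be estimates for most taxa.

```python
# Hypothetical 16S read counts and assumed 16S rDNA copies per genome.
reads = {"taxon_A": 700, "taxon_B": 300}
copies_per_genome = {"taxon_A": 7, "taxon_B": 1}


def relative(values):
    """Scale a dict of non-negative values so that they sum to 1."""
    total = sum(values.values())
    return {key: value / total for key, value in values.items()}


# Naive composition: share of 16S reads per taxon.
naive = relative(reads)

# Copy-number-corrected composition: reads divided by copies, then rescaled.
corrected = relative({t: reads[t] / copies_per_genome[t] for t in reads})

print(naive)
print(corrected)
```

With these made-up numbers taxon_A holds 70% of the 16S reads but only 25% of the genomes, so the apparent dominant member flips once copy number is taken into account.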
OK, thanks for clarifying.