Hi All,
I have corresponding microRNA and mRNA sequencing results from 3 different cell types. I would like to perform a general analysis of the data, with a special focus on cell type-specific genes. Then I would like to go a bit deeper, compare individual replicates, and visualize miRNA-mRNA interaction profiles.
As expected, I see cell type-specific expression patterns for both miRNA and mRNA. PCA after a crude analysis with DESeq2 shows decent clustering, with a few outliers. These outliers are nicely reflected in the heatmaps as differences in specific genes.
For miRNA, I have about 1000 miRNA species with a reasonable number of counts. The counts include only miRNAs: I deliberately excluded other small RNAs, which accounted for only 2-3% of all reads.
So the first question is: are DESeq2 or limma good options for samples with 1) completely different gene expression profiles and, in the case of miRNA, 2) relatively few genes and small library sizes? Or is there something better suited?
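For context on question 1: by default DESeq2 normalizes with median-of-ratios size factors, which assume that most features are not differentially expressed between samples. A minimal stdlib-Python sketch of that calculation (an illustration, not DESeq2's own code; the toy counts are made up) shows where that assumption enters.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping feature -> list of raw counts, one entry per sample.
    Features with a zero in any sample are skipped, because their geometric
    mean is zero and the ratio to it is undefined.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median ratio per sample is its size factor; using the median means
    # the result is only meaningful if most features are NOT changing.
    return [math.exp(median(r)) for r in log_ratios]

toy = {"miR-a": [10, 20], "miR-b": [5, 10], "miR-c": [100, 200], "miR-d": [0, 7]}
print(size_factors(toy))
```

If the cell types differ globally, the per-sample ratios will be widely spread rather than centred on one value, which is worth inspecting before trusting the normalization.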
The second question is more complex. I initially intended to start with miRNA, do some clustering and GO analysis, and then correlate the results with mRNA expression. Since each biological replicate yielded both miRNA and mRNA data, it seems wise to pair them somehow for the analysis. Of course I can compute L2FCs and then do clustering, pathway enrichment, etc., but in doing so I would lose the "link". Are there any tools particularly suited for this kind of analysis?
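Keeping the "link" essentially means joining the miRNA and mRNA tables on the biological replicate before computing anything. A minimal sketch in stdlib Python, assuming normalized log-scale values keyed by hypothetical replicate IDs such as "A1" (cell type A, replicate 1):

```python
import math

def pearson(x, y):
    """Plain Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_correlation(mirna_by_sample, mrna_by_sample):
    """Correlate one miRNA with one mRNA across biological replicates.

    Both arguments map a replicate ID to a normalized (log-scale) value.
    Only replicates present in both tables are used, so every miRNA value
    is compared with the mRNA value from the same biological sample.
    """
    shared = sorted(set(mirna_by_sample) & set(mrna_by_sample))
    x = [mirna_by_sample[s] for s in shared]
    y = [mrna_by_sample[s] for s in shared]
    return shared, pearson(x, y)

mir = {"A1": 1.0, "A2": 2.0, "B1": 3.0, "B2": 4.0, "C1": 5.0, "C2": 6.0}
target = {"A1": 6.0, "A2": 5.0, "B1": 4.0, "B2": 3.0, "C1": 2.0, "C2": 1.0,
          "C3": 0.5}  # C3 has no miRNA library, so it is dropped
print(paired_correlation(mir, target))
```

With three cell types and a few replicates each, such a correlation is driven mostly by the between-cell-type differences, so it complements the L2FC-based analysis rather than replacing it.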
Also, regarding the outliers: let's say one of my replicates is a bit funky, with some genes far off the trend. To find such genes, I thought of computing each gene's variance across replicates and picking those with the highest variance. Then I envisioned checking how the outliers at the miRNA level are reflected in the gene expression profile of the corresponding mRNA sample, and vice versa. I still feel like a caveman amongst all the available software, so any tips on reasonable workflows are invaluable to me. I have just started working with the multiMiR package and clusterProfiler. So far, I find multiMiR excellent for miRNA annotation, much better than ENSEMBL.
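The variance-based idea can be sketched as follows (stdlib Python, made-up values). Within one cell type, rank genes by their variance across replicates; then, for each top gene, record which replicate sits farthest from the median. If one replicate dominates the tally, it is the funky one.

```python
from collections import Counter
from statistics import median, variance

def top_variable_genes(expr, n):
    """The n genes with the highest sample variance across replicates.

    expr: dict gene -> list of normalized log-scale values, one per
    replicate, in the same replicate order for every gene.
    """
    return sorted(expr, key=lambda g: variance(expr[g]), reverse=True)[:n]

def most_deviant_replicate(values):
    """Index of the replicate farthest from the median of all replicates."""
    med = median(values)
    return max(range(len(values)), key=lambda i: abs(values[i] - med))

def outlier_tally(expr, n):
    """How often each replicate is the most deviant one among the top-n genes."""
    return Counter(most_deviant_replicate(expr[g])
                   for g in top_variable_genes(expr, n))

expr = {"g1": [5.0, 5.1, 9.0], "g2": [2.0, 2.1, 2.05],
        "g3": [7.0, 7.2, 3.0], "g4": [1.0, 1.0, 1.1]}
print(top_variable_genes(expr, 2), outlier_tally(expr, 2))
```

Running `outlier_tally` on the miRNA table and on the mRNA table of the same cell type, with the replicates in the same order, shows whether the same biological replicate stands out at both levels.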
One thing I envision is Venn diagrams showing the cell type specificity of miRNA expression, and maybe also the specificity of miRNA-mRNA interactions (Venn diagrams of interaction numbers, and gene networks). The graphics are less of an issue than preparing the data. Finally, a tool that correlates clusters of enriched or depleted miRNAs and mRNAs in corresponding samples (and possibly replicates) would be very helpful.
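Preparing the data for the Venn diagrams mostly comes down to set operations. A minimal sketch, assuming you have already decided which miRNAs count as "expressed" in each cell type (the cutoff is your call; cell type and miRNA names below are made up): it assigns every feature to the exact combination of cell types it appears in, i.e. to one region of the diagram.

```python
def venn_regions(sets):
    """Partition features into Venn regions.

    sets: dict cell type -> set of features expressed in that cell type.
    Returns dict mapping a frozenset of cell types -> the features found
    in exactly those cell types and in no others.
    """
    regions = {}
    for item in set().union(*sets.values()):
        members = frozenset(name for name, s in sets.items() if item in s)
        regions.setdefault(members, set()).add(item)
    return regions

expressed = {
    "A": {"miR-1", "miR-2", "miR-3"},
    "B": {"miR-2", "miR-3", "miR-4"},
    "C": {"miR-3", "miR-5"},
}
for cell_types, mirs in venn_regions(expressed).items():
    print(sorted(cell_types), sorted(mirs))
```

The same function works for interactions if the sets contain (miRNA, target) tuples instead of miRNA names, and the region sizes are exactly the numbers a Venn plotting tool needs.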
Thanks in advance for all suggestions!
Cheers, Lech
Actually, comparing individual genes (miRNA and mRNA) between samples could be interesting, but it seems challenging. I guess the input in such a case could be LFCs and p-values. The intuitive assumption would be that pairing mRNA and miRNA from the same biological replicate is the way to go here, maybe even before clustering. I would like to show interconnections between common and cell type-specific pathways, something like this: We would like to show examples of pathways that are (most) specific for the cell types, and correlate different domains of gene expression (miRNA, mRNA, also ncRNA). What would be the way to approach such a task using the correlation functions?
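One way to use correlation while keeping the per-replicate link: start from a list of predicted miRNA-target pairs (for instance exported from multiMiR; the input format below is an assumption) and keep only pairs whose abundances are negatively correlated across the same biological samples. The sketch uses Spearman (rank) correlation, which is less sensitive to a single funky replicate than Pearson; the -0.5 cutoff and all names are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    """Plain Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def anticorrelated_pairs(mirna, mrna, pairs, samples, r_max=-0.5):
    """Predicted pairs whose Spearman r across paired samples is <= r_max.

    mirna, mrna: dict feature -> dict sample ID -> normalized value.
    pairs: iterable of (miRNA, target gene) predictions.
    samples: replicate IDs that have both a miRNA and an mRNA library.
    """
    hits = []
    for mir, gene in pairs:
        if mir not in mirna or gene not in mrna:
            continue  # prediction for a feature that was not measured
        x = [mirna[mir][s] for s in samples]
        y = [mrna[gene][s] for s in samples]
        r = spearman(x, y)
        if r <= r_max:
            hits.append((mir, gene, r))
    return sorted(hits, key=lambda h: h[2])  # most negative first
```

With only a handful of samples these correlations are noisy and partly reflect the cell-type differences, so the output is best read as a ranked list of candidates rather than as evidence of regulation.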
BTW: I'm rather a wet-lab guy (in case that's not obvious yet ;))
That plot is pretty intense - there's a lot of information there. If you are looking to generate something robust like that, then you may explore my colleague's CGBayesNets program, but I think that you have to implement it in MATLAB: http://www.cgbayesnets.com/
The code that I have been developing using graph theory is still just that, i.e., in development. It's one of the areas that I hope to develop further in a faculty appointment. I can still help you with the basics, though. The packages that I mentioned are a good way to get started in this area: igraph and plotrix. Here is some quick code to get you off the ground, though:
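A minimal stand-in sketch, in Python rather than R and not the igraph code this reply refers to, of the usual first step: turning an expression table into a network by connecting features whose correlation across samples exceeds a cutoff. The 0.8 cutoff and the feature names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Plain Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def correlation_edges(expr, threshold=0.8):
    """Edges (a, b, r) for feature pairs with |Pearson r| >= threshold.

    expr: dict feature -> list of normalized values in the same sample order;
    miRNAs and mRNAs can share one table if measured on the same samples.
    """
    edges = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson(expr[a], expr[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))  # sign kept: negative r = candidate repression
    return edges

expr = {"miR-1": [1, 2, 3, 4], "geneA": [4, 3, 2, 1], "geneB": [2, 1, 4, 3]}
print(correlation_edges(expr))
```

The resulting edge list, with r as an edge attribute, is the kind of input a graph package such as igraph builds a graph object from.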
You can then detect communities and shade them:
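A minimal stand-in sketch, again in Python and not the code this reply refers to. It groups the nodes of an edge list into connected components, which is a simpler partition than real community detection. In igraph, proper community algorithms such as `cluster_walktrap` or `cluster_louvain` (R package) split densely connected regions further. The edges below are made up.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Group the nodes of an undirected graph into connected components.

    edges: iterable of (a, b, ...) tuples; extra fields such as r are ignored.
    Returns a list of node sets, discovered in sorted order of their first node.
    """
    neighbours = defaultdict(set)
    for a, b, *_ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first search from the start node
            node = queue.popleft()
            component.add(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(component)
    return components

edges = [("miR-1", "geneA", -0.9), ("geneA", "geneB", 0.85), ("miR-2", "geneC", -0.95)]
print(connected_components(edges))
```

Each group can then be given its own shading colour; igraph's plotting function has a `mark.groups` argument for exactly this kind of vertex-group shading.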
At nearly every command in the above code, there are multiple possibilities. You will have to devote a few days to getting up to speed.
Thanks! Will get to it. Right now I'm still struggling to find a way to analyze the data, though it is slowly becoming clearer. One possibility might be to use tools that do something similar to WGCNA. I got the idea from here: LINK. As I understand it, that should help to delineate pathways, some of which will be cell type-specific. This approach would likely fish out specific gene hits that are not detected by RNA-seq differential expression statistics alone. The other (easier) way would be to pick several genes from each cell type with the highest expression differences compared to the other conditions. These would be a sort of marker genes, and therefore likely involved in cell type-specific processes.
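Both routes can be sketched in a few lines of stdlib Python (toy data; all names hypothetical). The marker score below is one simple choice among many: a gene's mean log expression in one cell type minus its highest mean in any other cell type, so only genes higher there than everywhere else get a positive score. The second function is the WGCNA-style soft threshold that turns a correlation r into an unsigned adjacency |r|^β; the β value here is illustrative, since in WGCNA it is normally chosen from the data via a scale-free topology fit.

```python
from statistics import mean

def marker_genes(expr, groups, n=5):
    """Top-n candidate marker genes per cell type.

    expr: dict gene -> dict sample ID -> normalized log-scale value.
    groups: dict cell type -> list of sample IDs (needs >= 2 cell types).
    """
    markers = {}
    for cell_type, samples in groups.items():
        scores = []
        for gene, values in expr.items():
            own = mean(values[s] for s in samples)
            other = max(mean(values[s] for s in smp)
                        for ct, smp in groups.items() if ct != cell_type)
            scores.append((own - other, gene))
        scores.sort(reverse=True)
        # keep only genes that are actually higher in this cell type
        markers[cell_type] = [g for score, g in scores[:n] if score > 0]
    return markers

def soft_threshold_adjacency(r, beta=6):
    """Unsigned WGCNA-style adjacency |r|**beta (beta is illustrative)."""
    return abs(r) ** beta

groups = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"]}
expr = {
    "g1": {"a1": 9, "a2": 9, "b1": 2, "b2": 2, "c1": 2, "c2": 2},
    "g2": {"a1": 1, "a2": 1, "b1": 8, "b2": 8, "c1": 1, "c2": 1},
    "g3": {"a1": 5, "a2": 5, "b1": 5, "b2": 5, "c1": 5, "c2": 5},
}
print(marker_genes(expr, groups, n=2))
```

Raising |r| to a power keeps strong correlations close to their original weight while pushing weak ones towards zero, which is why WGCNA prefers it over a hard cutoff like the one used for a plain correlation network.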