Question

cell type-specific miRNA analysis and miRNA-mRNA expression correlation


7.8 years ago

lech.kaczmarczyk ▴ 50

Hi All,

I have corresponding microRNA and mRNA sequencing results from 3 different cell types. I would like to perform a general analysis of the data, with a special focus on genes that are cell type-specific. Then I would like to go a bit deeper, compare individual replicates, and visualize miRNA-mRNA interaction profiles.

As expected, I see cell type-specific expression patterns for both miRNA and mRNA. PCA after a crude analysis with DESeq2 shows decent clustering, with a few outliers. These outliers are nicely reflected by differences in specific genes in the heatmaps.

For miRNA, I have about 1000 miRNA species with a reasonable number of counts. The counts include only miRNAs; I deliberately excluded other small RNAs, which accounted for only 2-3% of all reads.

So the first question is: are DESeq2 or Limma good options for handling samples with 1) completely different gene expression profiles and, in the case of miRNA, 2) relatively few genes and small library sizes? Or is there something better suited?

The second question is more complex. I initially intended to start with miRNA, do some clustering and GO analysis, and then correlate the results with mRNA expression. Since each biological replicate included both miRNA and mRNA, it seems wise to pair them somehow for the analysis. Of course I can compute L2FC and then do clustering, pathway enrichment, etc., but in doing so I would lose the "link". Are there any tools particularly suited for such an analysis?

Also, regarding the outliers. Let's say one of my replicates is a bit funky, with some genes way off the trend. To find such genes, I thought of looking at the variance across all replicates and picking the genes with the highest variance. Then I envisioned looking at how the outliers at the miRNA level are reflected in the gene expression profiles of the corresponding mRNA sample, and vice versa. I still feel like a caveman amongst all the available software, so any tips on reasonable workflows are invaluable to me. I have just started working with the multiMiR package and clusterProfiler. So far, I find multiMiR excellent for miRNA annotation, so much better than ENSEMBL.

One thing I envision is Venn diagrams showing cell type specificity of miRNA expression, and maybe showing specificity of miRNA-mRNA interactions (Venn diagrams of interaction numbers and gene networks). The graphics are not as much of an issue as preparing the data. Finally, a tool that allows for correlating clusters of enriched or depleted miRNAs and mRNAs in corresponding samples (and possibly replicates) would be very helpful.

Thanks in advance for all suggestions!

Cheers, Lech

RNA-Seq • 3.3k views


score 0 · Answer 1 · 2017-09-24

I'll start the ball rolling on this... I'm not sure that there'll be any single answer, though.

I don't see any problem in using DESeq2 or Limma, provided that your starting point in each case is raw counts. It sounds like you did RNA-seq on 3 cell-types; however, if that is indeed what you did, then your counts matrix should include both micro-RNA and mRNA together, no? You can feasibly normalise these together in DESeq2 and then segregate them for the downstream analyses that you aim to do. Is this what you have already done?

For the main part of your work, assuming that you have normalised the data together, here are the things that I would love to try if I had your data:

I would get the significant micro-RNAs and look at their targets using something like miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/).
I would also correlate each microRNA to each mRNA using some parallelised function like corKB (see my thread HERE), thus generating a huge correlation matrix. Take a look also at the cor.test function in R, which assumes paired samples: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.test.html
I would also build a network plot using the microRNA and mRNA combined, and identify community structures ('modules') in these. This may help to identify novel mRNA targets of microRNAs, and you could also see how the network modifies across the different cell-types. See the R packages igraph and plotrix. I can share some code with you to help you get started, if you want.
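As a rough illustration of the correlation step in the list above, here is a minimal base R sketch. It assumes two normalised (e.g. log-scale) matrices with features in rows and the same biological replicates in columns; the toy data, the object names (`mirna`, `mrna`, `pairs`) and the choice of Spearman correlation with BH correction are all assumptions made for the example, not something taken from the thread.

```r
# Toy paired data: rows are features, columns are the same biological
# replicates in both matrices (names and values are made up).
set.seed(1)
samples <- paste0("rep", 1:9)
mirna <- matrix(rnorm(5 * 9), nrow = 5,
                dimnames = list(paste0("miR_", 1:5), samples))
mrna <- matrix(rnorm(20 * 9), nrow = 20,
               dimnames = list(paste0("gene_", 1:20), samples))

# The pairing only makes sense if the replicate order matches.
stopifnot(identical(colnames(mirna), colnames(mrna)))

# cor() correlates columns, so transpose to get a miRNA x mRNA matrix.
rho <- cor(t(mirna), t(mrna), method = "spearman")

# Long table with one row per miRNA-mRNA pair.
pairs <- expand.grid(mirna = rownames(mirna), mrna = rownames(mrna),
                     stringsAsFactors = FALSE)
pairs$rho <- rho[cbind(pairs$mirna, pairs$mrna)]

# cor.test() gives a p-value per pair; adjust for multiple testing.
pairs$p <- mapply(function(m, g) {
  cor.test(mirna[m, ], mrna[g, ], method = "spearman")$p.value
}, pairs$mirna, pairs$mrna, USE.NAMES = FALSE)
pairs$padj <- p.adjust(pairs$p, method = "BH")

# Strongest negative correlations first (candidate repression).
head(pairs[order(pairs$rho), ])
```

With real data the per-pair `cor.test()` loop gets slow quickly, which is where a parallelised approach would come in; the matrix call to `cor()` alone is usually fast enough for a first look.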

I would not even look to do GO enrichment, at least not at the beginning, as I wouldn't want to use it to bias hypothesis building throughout the analysis.

Regarding your outliers, I do not know how much 'outlying' they are. Leaving outliers in a dataset can, of course, skew results from your statistical tests because an outlier disrupts the distribution of your data points. You could possibly exclude them from your core analysis and then do an 'outlier-specific' analysis at the end.
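One way to see how 'outlying' a replicate is, sketched in base R under some assumptions: the input is a log-scale expression matrix, the replicates are grouped by cell type, and within-group variance plus deviation from the group median are used as the yardsticks. The toy data, the planted aberrant value and all object names are hypothetical.

```r
# Toy log-expression matrix: 50 genes, 3 cell types x 4 replicates.
set.seed(2)
cell_type <- rep(c("A", "B", "C"), each = 4)
logexpr <- matrix(rnorm(50 * 12, mean = 8), nrow = 50,
                  dimnames = list(paste0("gene_", 1:50),
                                  paste0(cell_type, "_rep", 1:4)))

# Plant one aberrant value so that there is something to find.
logexpr["gene_7", "B_rep3"] <- logexpr["gene_7", "B_rep3"] + 10

# Per-gene variance within each cell type (genes x cell types).
within_var <- sapply(split(seq_along(cell_type), cell_type), function(idx) {
  apply(logexpr[, idx, drop = FALSE], 1, var)
})

# Deviation of every value from the median of its own cell type.
dev <- logexpr
for (ct in unique(cell_type)) {
  idx <- cell_type == ct
  dev[, idx] <- logexpr[, idx] - apply(logexpr[, idx], 1, median)
}

# Genes with the highest within-group variance in cell type B.
head(sort(within_var[, "B"], decreasing = TRUE))

# The single most deviant gene/replicate combination.
worst <- which(abs(dev) == max(abs(dev)), arr.ind = TRUE)
rownames(dev)[worst[1, "row"]]
colnames(dev)[worst[1, "col"]]
```

Running the same thing on the miRNA and the mRNA matrices gives two tables of deviant genes per replicate that can then be compared side by side.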

score 0 · Answer 2 · 2017-09-24


7.8 years ago

lech.kaczmarczyk ▴ 50

Will write more, but now just a brief clarification.

> I don't see any problem in using DESeq2 or Limma, provided that your starting point in each case is raw counts. It sounds like you did RNA-seq on 3 cell-types; however, if that is indeed what you did, then your counts matrix should include both micro-RNA and mRNA together, no? You can feasibly normalise these together in DESeq2 and then segregate them for the downstream analyses that you aim to do. Is this what you have already done?

No, I should have clarified this better. The miRNA and mRNA are actually from different sequencing experiments. The miRNAs were co-immunoprecipitated with RISC, and the mRNA was co-immunoprecipitated together with translating ribosomes (RiboTag). So the samples were prepared in different ways, as separate libraries, but from the same biological replicate. I did the normalization, the initial assessment of cell-type marker enrichment, etc. This was all fine.

> I would not even look to do GO enrichment, at least not at the beginning, as I wouldn't want to use it to bias hypothesis building throughout the analysis.

This is an option, although here we are actually focused on validating the methodology (proof of principle that the method ensures cell type specificity of the purified material). New discoveries, though possible, are not a priority. So essentially, something that correlates gene clusters from miRNA and mRNA should not be avoided, even early on. I am currently toying with this http://snf-515788.vm.okeanos.grnet.gr/ and the mirIntegrator package. For the latter, I have so far failed to figure out how to feed the data to it.

Thanks a lot for the suggestions, Kevin. Will go through the options... a lot of code (feeling terrified). The third option sounds like we are on the same page. Would love to elaborate on that. So far, the tool I used failed to detect novel miRNAs, but the sequencing depth was not too high either.



Actually, comparing individual genes (miRNA and mRNA) between samples could be interesting, but seems challenging. The input in such a case could be LFC and p-values, I guess. The intuitive assumption would be that pairing mRNA and miRNA coming from the same biological replicate is the way to go here, maybe even before clustering. I would like to show interconnections between common and cell type-specific pathways, something like this: [image: envisioned way of data presentation]. We would like to show examples of pathways that are (most) specific for the cell types, and correlate different domains of gene expression (miRNA, mRNA, also ncRNA). How would one approach such a task using the correlation functions?

BTW: I'm rather a wet-lab guy (in case that's not obvious yet ;))

7.8 years ago by lech.kaczmarczyk ▴ 50


That plot is pretty intense - there's a lot of information there. If you are looking to generate something robust like that, then you may explore my colleague's CGBayesNets program, but I think that you have to implement it in MATLAB: http://www.cgbayesnets.com/

The code that I have been developing using graph theory is still just that, i.e., in development. It's one of the areas that I hope to develop further in a faculty appointment. I can still help you with the basics. The packages that I mentioned, igraph and plotrix, are a good way to get started in this area.

Here is some quick code to get you off the ground, though:

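library(igraph)
#Note: newer igraph releases rename several of these functions
#(e.g. graph_from_adjacency_matrix, mst, cluster_fast_greedy)

#Weighted, undirected graph from the pairwise distances between the rows of MyDataMatrix
g1 <- graph.adjacency(as.matrix(dist(MyDataMatrix)), mode="undirected", weighted=TRUE, diag=TRUE)
g1 <- simplify(g1)
#No-op placeholder: assign custom node labels here if needed
V(g1)$name <- V(g1)$name
V(g1)$color <- "royalblue"
V(g1)$shape <- "sphere"
#The plotting attribute is called frame.color, not vertex.frame.color
V(g1)$frame.color <- "white"
E(g1)$color <- "grey"
E(g1)$arrow.size <- 1.0

mst1 <- as.undirected(minimum.spanning.tree(g1))
#or mst1 <- as.directed(minimum.spanning.tree(g1))

#Edge betweenness, scaled down for use as edge widths
edgeweights1 <- edge.betweenness(mst1)/5000

#Connected graph
plot(g1, main="Network plot", vertex.size=3.0, vertex.label.dist=0, vertex.label.color="black", asp=FALSE, vertex.label.cex=0.25, edge.width=0.01, edge.arrow.mode=0)

#Minimal spanning tree
plot(mst1, main="MST", layout=layout.graphopt, vertex.size=3.0, vertex.label.dist=0, vertex.label.color="black", asp=FALSE, vertex.label.cex=0.3, edge.width=edgeweights1/2, edge.arrow.mode=0)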

You can then detect communities and shade them:

#Detect communities
#Two alternative methods; keep only one, as the second assignment overwrites the first
commGroup1 <- edge.betweenness.community(mst1, directed=FALSE)
commGroup1 <- fastgreedy.community(mst1)
clustering1 <- make_clusters(mst1, membership=commGroup1$membership)
#Colour each node by its community
V(mst1)$color <- commGroup1$membership + 1

#Edge widths are recomputed here so that this block does not rely on an earlier variable
plot(clustering1, mst1, main="Minimal spanning tree", layout=layout.graphopt, vertex.size=3.0, vertex.label.dist=0, vertex.label.color="black", asp=FALSE, vertex.label.cex=0.3, edge.width=edge.betweenness(mst1)/10000, edge.arrow.mode=0)

At pretty much each command in the above code, there are multiple possibilities. You will have to devote a few days to get up to speed.
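For the combined microRNA/mRNA network specifically, one possible starting point is an edge list of miRNA-mRNA pairs that passed some correlation cut-off, rather than a distance matrix. The sketch below is only an assumption about how that could look: the edge list, the node names and the weights are invented, and it uses the current igraph function names (`graph_from_data_frame`, `cluster_fast_greedy`, `membership`).

```r
library(igraph)

# Hypothetical edge list: miRNA-mRNA pairs that passed a correlation
# cut-off, with the absolute correlation as edge weight.
edges <- data.frame(
  from   = c("miR_1", "miR_1", "miR_2", "miR_2", "miR_3", "miR_3"),
  to     = c("gene_a", "gene_b", "gene_b", "gene_c", "gene_d", "gene_e"),
  weight = c(0.90, 0.80, 0.85, 0.70, 0.95, 0.75)
)

# Undirected graph; the 'weight' column becomes an edge attribute.
g <- graph_from_data_frame(edges, directed = FALSE)

# Colour miRNA nodes differently from mRNA nodes.
is_mirna <- grepl("^miR_", V(g)$name)
V(g)$color <- ifelse(is_mirna, "tomato", "royalblue")

# Community ('module') detection; edge weights are used by default.
comm <- cluster_fast_greedy(g)
membership(comm)

# Shade the detected modules on the plot.
plot(comm, g, vertex.size = 10, vertex.label.cex = 0.8)
```

Building one such graph per cell type and comparing the module memberships is one way to look at how the network changes between cell types.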

7.8 years ago by Kevin Blighe 89k


Thanks! Will get to it. Right now I am still struggling to find a way to analyze the data, though it's slowly becoming clearer. One possibility might be to use tools that do something similar to WGCNA. Got the idea from here: LINK. As I understand it, that should help to delineate pathways, some of which will be cell type-specific. This approach would likely fish out specific gene hits that are not detected by pure RNA-seq statistics. The other (easier) way would be to pick several genes from each cell type with the highest expression differences compared to the other conditions. These would be marker genes of a sort, and therefore likely involved in cell type-specific processes.

7.8 years ago by lech.kaczmarczyk ▴ 50