Hello!
I am doing some research involving integrated microRNA-mRNA analysis and I would like to use the MMIA tool. To use it I have to prepare two properly formatted files: one with microRNA data and a second with mRNA data. The MMIA website describes the file requirements and the other available options in detail. It says that my microRNA file should be tab-delimited and look as follows:
#Class c1 c1 c2 c2
#NAME group1 group1 group2 group2
hsa-let-7a expression value ... ... ... ...
hsa-miR-21 ... ... ... ...
They also say that the microRNA file has to contain microRNAs that are up-regulated in group 2 in comparison with group 1.
And here is my question: how, and with which tool, do I perform a differential expression analysis so that I keep ONLY the microRNAs up-regulated in group 2, not those up-regulated in group 1 or a mix of both?
I am preparing the file in the following way:
I download my GSE datasets from GEO for microRNA and mRNA using R and GEOquery:
library(GEOquery)
library(Biobase)  # provides fData() and exprs()
x <- getGEO("GSExxx")[[1]]
write.table(data.frame(fData(x), exprs(x)), sep = "\t", row.names = FALSE, file = "xxx.txt")
This generates a file with all microRNA identifiers from the matrix in rows and all samples in columns.
I process this file with the Babelomics 4.3 tool to get rid of replicates and missing values.
- Then, I manually rename the samples and assign them to the two analyzed classes (e.g. c1 and c2).
- MISSING STEP: filtering my microRNA data to keep only those up-regulated in group 2.
So, what am I doing wrong, and how should I change it? I am fairly new to bioinformatics, especially to this kind of analysis, so please help me.
Thanks in advance!
Have you done the DE analysis on the miRNAs yet? If not, you need to do that first.
Yes, I have. I used Babelomics's class comparison tool and tried the limma test.
At least with limma you'll be able to subset things by direction of change, so you should be able to use those results (I'm not familiar with Babelomics's tool, so I can't comment on it).
You need to choose the ones with a positive FC if C2 was your numerator (C2 vs C1), or a negative FC if C2 was your denominator (C1 vs C2). Of course, the FDR (adjusted p-value) should also be below your chosen cutoff, something like 0.01 to 0.1.
Thanks for the answer, but unfortunately I do not get it. I used the limma test in Babelomics, and I cannot tell which group is my numerator/denominator or what the order of the analyzed groups is ;/
Perhaps "pure" limma in R would be more suitable? In that case, I will need some help with the code ...
Hi, I would start over, yes. There are good vignettes to help with the analysis: http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf. Normally the package sorts the group names, so I would guess this is C2/C1. But you need to be sure about this. It is weird that Babelomics doesn't show you the whole report, where the name of the coefficient should appear (this will tell you the direction of the FC).
Hope this helps.
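To make the direction explicit, a minimal limma sketch of a C2 vs C1 comparison could look like the one below. The expression matrix is simulated toy data, and the names `expr`, `group`, `c2_vs_c1` and the 0.05 cutoff are assumptions for illustration only, not taken from your dataset. Because the contrast is written out as `c2 - c1`, a positive logFC means higher expression in c2.

```r
library(limma)

# Toy log2 expression matrix (simulated values): 20 miRNAs x 4 samples.
set.seed(1)
expr <- matrix(rnorm(80, mean = 8, sd = 0.2), nrow = 20,
               dimnames = list(paste0("miR-", 1:20),
                               c("s1", "s2", "s3", "s4")))
expr["miR-1", 3:4] <- expr["miR-1", 3:4] + 3   # made up-regulated in c2
expr["miR-2", 3:4] <- expr["miR-2", 3:4] - 3   # made down-regulated in c2

# Sample classes; factor levels are set explicitly so the order is not left to sorting.
group <- factor(c("c1", "c1", "c2", "c2"), levels = c("c1", "c2"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit the model and state the comparison explicitly: c2 minus c1.
fit <- lmFit(expr, design)
contr <- makeContrasts(c2_vs_c1 = c2 - c1, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contr))

# Full results table, then keep only miRNAs that are significantly up in c2.
res <- topTable(fit2, coef = "c2_vs_c1", number = Inf, adjust.method = "BH")
up_in_c2 <- res[res$logFC > 0 & res$adj.P.Val < 0.05, ]
```

With real data you would replace `expr` with your own log2 expression matrix and `group` with your sample classes; the cutoff is a choice, not a rule.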
Following your advice, I will try limma in R.
Meanwhile, the file below is the output of the limma test from Babelomics. What do you think about it?
https://plus.google.com/photos/101772480484282089828/albums/6072708919051958305
That they only give part of the information. If you do the analysis yourself, you can get the FC. Sometimes the t-statistic carries the same sign as the FC (positive and negative values), but sometimes it is always positive and only tells you how far apart the expression of the two groups is from the expected value. So I would definitely do the DE again with limma.
Or maybe you can take the statistically significant miRNAs and compute the FC yourself from the table you should have: average(C2) / average(C1). Then just keep the ones with a ratio above 1 (or, if your values are already log2-transformed, a positive difference of the means). I guess that this has to match the mRNA data somehow, so you should know whether you want up-regulation in C2 or C1.
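As a rough sketch of that manual check in base R, assuming the values are already on a log2 scale (so the fold change is a difference of group means rather than a ratio): the two-row matrix and the sample names below are made-up toy values, only the miRNA names come from the example file layout in the question.

```r
# Toy log2 expression values (made up for illustration).
expr <- rbind(
  "hsa-let-7a" = c(7.0, 7.2, 9.1, 9.3),
  "hsa-miR-21" = c(10.0, 9.8, 8.1, 7.9)
)
colnames(expr) <- c("s1", "s2", "s3", "s4")

# Class of each sample, in column order.
group <- c("c1", "c1", "c2", "c2")

# log2 fold change as mean(c2) - mean(c1); positive means higher in c2.
log2fc <- rowMeans(expr[, group == "c2", drop = FALSE]) -
  rowMeans(expr[, group == "c1", drop = FALSE])

# Keep only the miRNAs that are higher in c2.
up_in_c2 <- names(log2fc)[log2fc > 0]
```

Here `up_in_c2` keeps only `hsa-let-7a`. This only gives the direction of change; whether a miRNA is significant still has to come from the DE test.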
OK. Thank you for your help. I will play with R then (though I had hoped it would not be necessary ;))