miRNA DE analysis to my opinion has some significant differences compared to gene DE.
There are numerous issues with almost all parts of the analysis: counting miRNA reads (some algorithms might be too strict and you end up losing power or others can be too loose and you can have increased variance), normalization (miRNA normalization is a debate topic, especially after a couple of RNA articles) and as you have mentioned the DE algorithm.
May I ask how you've ended up with miRNA counts? Did you use an inhouse algorithm or a ready-made tools, such as miRDeep?
The available DE algorithms have not been extensively tested in miRNA-Seq (or small-RNA-Seq) data for me to suggest one that easily even though we use them quite a lot for that specific task.
We've used DESeq many times before for miRNA DE analysis, even though it has proved to be a bit strict. I still always start from that. It helps sometimes to remove really lowly expressed miRNAs, since a fraction of miRNAs hog most of the sequenced reads and the rest get very few counts that will disable them to achieve statistical significance. By removing the miRNAs with really low expression you might increase power for the rest.
I've used DESeq2 from the first days it got out in Bioconductor but I'm still reluctant to put it in my "official" pipeline. I really liked the improvements but I'm waiting for the article to come out or to see some more tests. The only in-depth comparison I can recall at the moment is from (http://bib.oxfordjournals.org/content/early/2013/12/02/bib.bbt086.full) [check the supplement], where I was not convinced that it's that better than the original DESeq. But it's only one publication..
I would suggest also to check out EdgeR.
Cheers,
IV
Edit
If you do PCA you will get a 2D representation of which samples are closer to each other (more similar) than others. It can be useful in numerous situations where you can see if your experiment worked for all replicates, exclude outliers, identify subsets within your samples, etc.
Heatmap of DE miRNAs is also useful, since you can get a nice image of groups of miRNAs that behaved similarly and is usually used for presentations or articles. From the miRNAclusters you can also get some interesting gene subgroups for post analysis.
After the DE analysis you can use a tool for pathway analysis, such as miRPath (http://www.microrna.gr/miRPathv2) that can get you pathways targeted by your DE miRNAs.
End of Edit
do you have any replicate for your analysis?
yes for sure...two conditions with 5 replicates each.
I wouldn't use unadjusted p-values unless really desperate (they're less likely to validate). If you haven't done so already, try doing some independent filtering to increase statistical power.
Thanks for your suggestion but could you please explore your advice. As i am not so expert of R
I assume you're referring to the independent filtering part. Have a read through this paper from Wolfgang Huber's group and the genefilter Bioconductor package. Also note, this is all done automagically in more recent version of DESeq2, so if you're using that then this may have already been done.
ahhh, thanks for your nice suggestions, now i am going further the section of DESeq manual of 5.1 filtering by overall counts upto Heat Map and PCA plot (for data quality assessment bysample clustering), is this same as in DESeq ????
They're more or less the same, it's just done automatically in the more recent versions of DESeq2. If you're still using DESeq, do yourself a favor and transition over to DESeq2, it have a number of other improvements.
Dear sir dpryan79 , Thanks for your kind advice, its really nice use DESeq2 instead DESeq, i used and got results that, would like to discuss. in my code i used,
###DE ananlysis
dds <- DESeq(ddsHTSeq)
res <- results(dds)
res <- res[order(res$padj),]
head(res)
####extraction of DE at value ####counts
sum(na.omit(res$padj < 0.1))
[1] 42 ##output
sum(na.omit(res$pval < 0.05))
[1] 120 ##output
###Investigate strongly donwregulated genes
head(res[order(res$log2FoldChange, -res$baseMean),])
down <- (res[order (res$log2FoldChange, -res$baseMean),])
write.csv(down,file="Human_neuro_down_padjless_0.1.csv")
###Investigate strongly upregulated gene
head(res[order (-res$log2FoldChange, -res$baseMean), ] )
upregulated <- (res[order (-res$log2FoldChange, -res$baseMean),])
write.csv(upregulated,file="Human_neuro_upregulated_padjless_0.1.csv")
1) is the correct way, what i have done ??? Also in order to find Down and Up-regulated genes ?? as prev. mentioned in DESeq ?? 2) May i say now after independent filtering, i got 42 gene that are DE at padj < 0.1 ?? or 120 genes are DE at pval < 0.05 ?? pls make me sure, i am bit confused. 3) i have gone through overall counts data transform, plotting Heat map and PCA plots AND Mean counts & Hist of p-value. And i stopped here. But in the last part while running ..
abline(a=0, slope, col="red3",lwd=2) #does not work##error#object 'slope' not found why ???
Thanks and looking forward for your kind advice again
That looks correct from eye-balling the code at least. You have 42 DE genes, never use unadjusted p-values with data like this unless you have a very good reason. The
abline()
just plots a straight line on a graph, presumbably with other points. I assume there was a step skipped or a typo, at least if you're just following the vignette. Regardless, that's not something to worry about.Thank you sir for your quick reply, and what about down about up regulate genes ??? May i have to look over csv file ???
It's easiest to just look at the sign of the log2 foldchange in the CSV files. You're ordering things by that, but it's usually (in my experience) easiest to just order things by adjusted p-value and then not separate things by direction of change (after all, it's often beneficial to consider both down and up-regulated genes that might be interacting together).