Hello everyone, I have a dataset of differentially expressed miRNAs.
I aligned my microRNA reads (10 millions per sample aprox) against reference genome, obtaining an average of 7.5 million aligned read per sample, and used FeatureCounts from RSubRead to count them asigning to reference miRNA GTF in miRBase, obtaining an average of 60% of asignment. Then, I used DESeq2 to perform differential expression analysis.
As said in Mullokandov et al. (2013), functional miRNAs should only be considered when expressed above 100-1000 RPM on average. I calculated CPM from FeatureCounts counts file and obtained average expression per group (2 groups contast) and expression per sample.
I would like to know if there is any difference between the concept of RPM and CPM, as I think the are quite the same, but not sure about that. I assumed miRNAs with a cpm expression below 100 to be spurious and noise, but I would like to be sure about that.
Any suggestions and clarifications?
Thanks
RPM=CPM=FPM (single end reads).
CPM cutoff of 100 sounds like a lot. But I have limited experience in miRNA sequence analysis. Here is one post for CPM cutoff for miRNA data.
I understand what that post says, but this is not refereing to biologically active miRNAs but to initially threshold filtering for eliminating reads with zeros and very small counts.
100-1000 or more RPM (or CPM but I am not sure XD) threshold is considered for withing the cell miRNA capability to drive a significant response in mRNA targeting and repressing gene expression and not just slight and spurious reactions. It is a matter of selecting really interesting DE miRNAs for discussion rather than for statistical analyses, in which I already applied such pre-filtering anyway.
Following sentence was not clear
Gave an impression that you are looking for cutoff for expression and not function.
well, I assumed less than 100 CPMs to be a very low expression level so that the miRNA could not drive a significant function in repressing gene expression. If you have differential expression but with single or very scarce miRNA molecules, the effect is not expected to be such a thing to take into consideration. Maybe for profiling in cancer or so could be useful, but not for infering metabolic changes.
No CPM, aka TPM, and F(R)PKM are not the same!! I think it is time for some StatQuest. You may also want to google for blogs and videos of Lior Pachter, one of the leading RNA-seq analysis guys on that matter.
CPM is not TPM...
Question is the difference between counts per million and reads per million.
I'm not asking for RPKM, TPM or FPKM sorry...
Hi v82masae,
I have few queries regarding your miRNA requirement.
II did miRNA sequencing yes, with HiSeq 2500 Illumina, single-end.
I used Cutadapt for trimming adapters with an acceptance window of 15-30 nt (quite big, as reported windows are normally 18-25, but I wanted to retain big fragments for contrasting against circRNAs and precursor forms). I tried a smaller window and the output did not differ quite much, just a few thousand reads if not less.
After trimming, I aligned the reads against reference genome with about 80% mapping, and used FeatureCounts for counting reads in BAM files with an average 60% of successful assignment to GTF miRNA annotation file downloaded from miRBase v21 yes.
For differential expression I used DESeq2 and EdgeR, both giving quite similar results and kept DESeq2 just for department papers homogeneity as we are using DESeq2 for the majority of other papers from same project.
Thanks, V82masae.
Based on this paper, I used the mirna_soft tool (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4138881/).
Even we sequenced using Illumina, 51 bp single-end. After trimming, I retained the length between 18-35nt. Then I compared the sequenced reads against the miRBase v21 database. For Bovine, I extracted bovine specific miRNA from mature.fa file from miRBase v21.
Did you do alignment with your species miRNAs or reference genome?
Did you use bowtie2 aligner?
I got the output file with miRNA id, sequence, count of unique matches and count of multi-matches. For one of the samples, total trimmed reads- 4,679,140. Unique matches to the bovine miRNA sequences (from miRBase) was 800,000 matches. unmatched reads - 3,796,387
I did alignment from trimmed reads to reference genome yes. As my model organism (pig) had a very poorly annotated miRNAome, I also aligned my reads agains human and mouse genome and did all the pipeline to detect miRNAs differentially expressed that were not described in the miRNA GTD annotation in pigs, considering that within phylogenetically close species, miRNA sequences are commonly conserved.
I tried some de novo miRNA finding algorithms like mirDeep2, but the output did not convince me, too noisy. I opted for finding manually some DE non-annotated miRNAs in human and mouse and looking for their positions in pig assembly, then doing folding to see if secondary structure was equivalent to expected miRNA. Found 12 new miRNAs that way, 7 of them were in fact new miRNAs as they were included in the recently new pig assembly that was released one month ago, and the other 5 were compatible with fragments of circRNAs, but considered as miRNAs in human and mouse, so maybe some fragmentation of circRNA could be involved.
I used bowtie1, it is better suited for small reads than bowtie2, without considering any mismatches in seed (not taking into consideration ismoMIRs), maximum of 20 multimappings and --best --strata mode with one best alignment per read reported.
Hi V82masae,
I am working on both bovine and pig.
What do you think about this workflow, from miRBase I extracted bovine and pig related miRNAs from mature.fa (miRBase) file. Using miRNA_soft tool as per the research article I provided in the previous comment, I counted the reads mapped to bovine and pig miRNA sequences extracted from mature.fa. I got an output file with the following details:
miRNA id, miRNA sequence, Unique matches and Multi-matches.
I got similar output for all the samples. Then again using miRNA_soft tool, I compared the counts generated in the previous step to generate differential miRNA expression profiling.
I am a naive person in miRNA sequencing. Do you recommend this workflow?
What is miRNA GTD?
Can you share the pig reference genome genbank id or URL?
Why didn't you compare your sequenced reads with miRBase database instead you compared it with the pig, mouse and human reference genomes?