miRNA nomenclature can be a bit confusing. The direct answer to your question is yes, sometimes, but not always, and not without checking.
miRNAs are expressed as hairpins. Thus, the precursor hairpin to hsa-miR-373 looks like:
- g uuuuug |
5' gggauacuc aaaau ggggcgcuuucc u |
||||.||.| ||||| |..||.|||.|| | - This is pre-hsa-miR-373
3' cccuguggg uuuua cuucgugaaggg c |
g g ucaugu |
This is processed to make a double stranded RNA, which contains two short RNAs that could potentially act as miRNAs:
acuc-aaaaugggggcgcuuucc <---- This is hsa-miR-373-5p
||.| ||||| |..||||||.
ugugggguuuuagcuucgugaag <---- This is hsa-miR-373-3p
For most miRNAs one of these arms is unstable and is quickly degraded, so it doesn't get a chance to have any activity. The other is stable and gets incorporated into RISC, the complex that does the business for miRNAs. In the case of hsa-miR-373 the 3p arm is stable and the 5p arm is unstable. Thus the 3p arm is also known as hsa-miR-373 and the 5p arm is also known as hsa-miR-373*. You can't treat hsa-miR-373-5p/miR-373* as the same as miR-373, because it has an entirely different sequence, and would have different targets if it were ever stable enough to get incorporated into RISC. The information on which arm is which is in miRBase. There are some miRNAs, however, where both arms are stable and must be treated as independent miRNAs.
The next complication is that some miRNAs appear more than once in the genome. hsa-miR-101 is an example of this. hsa-miR-101-1 is located on chromosome 1, and hsa-miR-101-2 is located on chromosome 9. But both encode miRNA precursors that have identical 3p arms. That is, the sequence of the 3p mature miRNA product produced from both the hsa-miR-101-1 and hsa-miR-101-2 loci is uacaguacugugauaacugaa, and this is the stable arm. If you sequence this, you have no idea whether it came from miR-101-1 or miR-101-2. However, the unstable 5p arms are different for miR-101-1 and miR-101-2. So officially there are 3 mature miR-101 RNAs in humans (miR-101-1-5p, miR-101-2-5p and miR-101-3p), expressed from two loci. In this case you can safely treat miR-101-3p as miR-101 irrespective of which locus it came from. For some other multi-copy miRNAs, both arms are identical between the copies. Again, the information on which mature products are the same between the two copies, and which is the stable and which the unstable arm, is in miRBase.
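To make the "you have no idea which locus it came from" point concrete, here is a minimal sketch that groups loci by their mature sequence. Only the 3p sequence quoted above is used; the dictionary layout and the function name `group_loci_by_sequence` are my own illustration, not anything from miRBase.

```python
from collections import defaultdict

# 3p mature sequence quoted above; it is shared by both loci.
MIR101_3P = "uacaguacugugauaacugaa"

# Illustrative table: locus name -> sequence of its 3p arm.
locus_to_3p = {
    "hsa-miR-101-1": MIR101_3P,  # chromosome 1
    "hsa-miR-101-2": MIR101_3P,  # chromosome 9
}


def group_loci_by_sequence(locus_to_seq):
    """Map each distinct mature sequence to the sorted list of loci encoding it."""
    groups = defaultdict(list)
    for locus, seq in locus_to_seq.items():
        groups[seq.lower()].append(locus)
    return {seq: sorted(loci) for seq, loci in groups.items()}


groups = group_loci_by_sequence(locus_to_3p)
for seq, loci in groups.items():
    print(seq, "<-", ", ".join(loci))
```

A sequence that maps to more than one locus can only be counted as a single mature miRNA, which is why miR-101-3p carries no locus number.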
Thank you very much for the clarification.
Small question. I got the target information for humans from miRTarBase. More than 80% of the miRNAs there have a 5p or 3p suffix. The miRNA expression data I downloaded from TCGA Breast has 1880 miRNAs, and not even one of them has a 5p or 3p suffix.
The miRNA names in downloaded TCGA Breast expression data are like below:
What should I do now? How do I match the names in the two datasets? Do I need to check all of this manually, or is there a quick way to match the names?
Without looking into the details of the TCGA small RNA analysis pipeline, it's difficult to say. Probably they are using the predominant arm (I don't know what they are doing where the two arms are equivalent). I'm pretty sure this information can be downloaded from miRBase, and there is even a Bioconductor R package for miRBase. So you could just guess that the dominant arm is the one TCGA is referring to. The real solution would be to track down the source of annotation in the TCGA analysis pipeline.
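If you do go with the "guess the dominant arm" approach, it could look roughly like this. This is only a sketch: the two table entries are the examples from my answer above, a real table would have to be built from the miRBase annotation, `add_dominant_arm` is a name I made up, and `hsa-miR-9999` is a made-up name used to show the unresolved case.

```python
# Dominant arm per miRNA. The two entries are the examples discussed above;
# a real table would have to be built from miRBase annotation.
DOMINANT_ARM = {
    "hsa-miR-373": "3p",
    "hsa-miR-101": "3p",
}


def add_dominant_arm(name, dominant_arm=DOMINANT_ARM):
    """Return the arm-specific name, or None when the dominant arm is unknown."""
    if name.endswith(("-5p", "-3p")):
        return name  # already arm-specific, nothing to guess
    arm = dominant_arm.get(name)
    return f"{name}-{arm}" if arm else None


# "hsa-miR-9999" is a made-up name standing in for a miRNA missing from the table.
expression_names = ["hsa-miR-373", "hsa-miR-101", "hsa-miR-9999"]
renamed = {name: add_dominant_arm(name) for name in expression_names}
unresolved = [name for name, new in renamed.items() if new is None]

print(renamed)
print("check by hand:", unresolved)
```

Returning None rather than guessing keeps the cases where both arms are stable, or the annotation is missing, visible for manual checking.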
TCGA miRNA pipeline.
Comes from BCGSC's miRNA pipeline.
Ok. I guess it's better to download the mirbase21.isoforms.quantification.txt data from TCGA, which has the Accession information. I then also used the miRBaseConverter R package to get the mature regions and Accessions. This gave all the 3p and 5p information. Then I summed the counts of duplicate miRNAs.
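For anyone doing the same, the summing step can be sketched as below. I am assuming the isoform file is tab-delimited with `read_count` and `miRNA_region` columns, and that mature reads are marked as `mature,<accession>` in `miRNA_region`; check this against your own download. The rows, coordinates and accessions in `SAMPLE` are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the assumed layout; the accessions are placeholders, not real IDs.
SAMPLE = (
    "miRNA_ID\tisoform_coords\tread_count\tmiRNA_region\n"
    "hsa-mir-101-1\tcoords_a\t10\tmature,MIMAT_EXAMPLE_1\n"
    "hsa-mir-101-2\tcoords_b\t5\tmature,MIMAT_EXAMPLE_1\n"
    "hsa-mir-101-1\tcoords_c\t2\tmature,MIMAT_EXAMPLE_2\n"
    "hsa-mir-101-1\tcoords_d\t7\tprecursor\n"
)


def sum_counts_by_accession(handle):
    """Sum read_count over all rows that share the same mature accession."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        region, _, accession = row["miRNA_region"].partition(",")
        if region == "mature" and accession:
            totals[accession] += int(row["read_count"])
    return dict(totals)


totals = sum_counts_by_accession(io.StringIO(SAMPLE))
print(totals)
```

Summing by accession rather than by miRNA_ID means reads from two loci with the same mature product end up in one row, and rows that are not mature (precursor and so on) are skipped. For a real file, pass `open(path, newline="")` instead of the `io.StringIO` sample.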