Question about miRNA nomenclature
5.2 years ago
Vasu ▴ 790

I'm working on an analysis that combines miRNA expression data with miRNA target information from different sources such as miRBase, TargetScan, etc.

The miRNA names in the expression data and in the target information need to be the same for my analysis.

But the names look different in the two data sets.

For example, the miRNA expression data contains hsa-miR-373, while the miRNA target information contains hsa-miR-373-3p and hsa-miR-373-5p.

Similarly, the expression data contains hsa-mir-101-1 and hsa-mir-101-2, while the miRNA target information contains hsa-mir-101-3p.

Question: Can I rename hsa-miR-373-3p and hsa-miR-373-5p to hsa-miR-373?

mirna mirnaseq mirbase • 2.4k views
5.2 years ago

miRNA nomenclature can be a bit confusing. The direct answer to your question is: yes, sometimes, but not always, and not without checking.

miRNAs are expressed as hairpins. Thus, the precursor hairpin to hsa-miR-373 looks like:

            -     g            uuuuug     |
5' gggauacuc aaaau ggggcgcuuucc      u    |
   ||||.||.| ||||| |..||.|||.||           | - This is pre-hsa-miR-373
3' cccuguggg uuuua cuucgugaaggg      c    |
            g     g            ucaugu     |

This is processed to make a double stranded RNA, which contains two short RNAs that could potentially act as miRNAs:

    acuc-aaaaugggggcgcuuucc     <---- This is hsa-miR-373-5p
    ||.| ||||| |..||||||.
  ugugggguuuuagcuucgugaag       <---- This is hsa-miR-373-3p

For most miRNAs one of these arms is unstable and is quickly degraded, so it doesn't get a chance to have any activity. The other is stable and gets incorporated into RISC, the complex that does the business for miRNAs. In the case of hsa-miR-373 the 3p arm is stable and the 5p arm is unstable. Thus the 3p arm is also known as hsa-miR-373 and the 5p arm is also known as hsa-miR-373*. You can't treat hsa-miR-373-5p/miR-373* as the same as miR-373, because it has an entirely different sequence, and would have different targets if it were ever stable enough to get incorporated into RISC. The information on which arm is which is in miRBase. There are some miRNAs, however, where both arms are stable and must be treated as independent miRNAs.

The next complication is that some miRNAs appear more than once in the genome. hsa-miR-101 is an example of this. hsa-miR-101-1 is located on chromosome 1, and hsa-miR-101-2 is located on chromosome 9. But both encode miRNA precursors that have identical 3p arms. That is, the sequence of the 3p mature miRNA product produced from both the hsa-miR-101-1 and hsa-miR-101-2 loci is uacaguacugugauaacugaa, and this is the stable arm. If you sequence this, you have no idea whether it came from miR-101-1 or miR-101-2. However, the unstable 5p arms are different for miR-101-1 and miR-101-2. So officially there are 3 mature miR-101 RNAs in humans - miR-101-1-5p, miR-101-2-5p and miR-101-3p, expressed from two loci. In this case you can safely treat miR-101-3p as miR-101 irrespective of which locus it came from. Sometimes both arms are identical between the copies. Again, the information on which mature products are the same between the two copies, and which is the stable and unstable arm, is on miRBase.
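
To make the naming parts described above concrete, here is a minimal Python sketch that splits a name into species, family, locus copy and arm. The regular expression and the function names `parse_mirna_name` and `same_family` are my own illustration, not anything from miRBase, and the pattern only covers the simple name shapes that appear in this thread; real miRBase names have exceptions it will reject.

```python
import re

# Assumed pattern: species-prefix-family[-copy][-arm], e.g. hsa-miR-101-2-5p.
# Illustration only; it does not cover every naming exception in miRBase.
NAME_RE = re.compile(
    r"^(?P<species>[a-z]{3})-"      # three-letter species code, e.g. hsa
    r"(?P<prefix>miR|mir|let)-"     # miR (mature), mir (precursor) or let
    r"(?P<family>\d+[a-z]*)"        # family number plus optional letters
    r"(?:-(?P<copy>\d+))?"          # optional locus copy, e.g. -1 or -2
    r"(?:-(?P<arm>[35]p))?$"        # optional arm suffix, 5p or 3p
)


def parse_mirna_name(name):
    """Split a miRNA name into its parts; absent parts come back as None."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised miRNA name: {name!r}")
    return match.groupdict()


def same_family(name_a, name_b):
    """True if two names share species, prefix type and family.

    This deliberately ignores copy and arm, so a True result does NOT mean
    the two names refer to the same mature sequence.
    """
    a, b = parse_mirna_name(name_a), parse_mirna_name(name_b)
    key_a = (a["species"], a["prefix"].lower(), a["family"])
    key_b = (b["species"], b["prefix"].lower(), b["family"])
    return key_a == key_b


arms_373 = (parse_mirna_name("hsa-miR-373-3p")["arm"],
            parse_mirna_name("hsa-miR-373-5p")["arm"])
```

Here `same_family("hsa-miR-373-3p", "hsa-miR-373-5p")` is true while the two `arm` fields differ, which is exactly the case where stripping the suffix would merge two different sequences.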

Thank you very much for the clarification.

A small question. I got the target information from miRTarBase for humans. More than 80% of the miRNAs there carry a 5p or 3p suffix. The miRNA expression data I downloaded from TCGA Breast has 1880 miRNAs, and not even one of them has a 5p or 3p suffix.

The miRNA names in the downloaded TCGA Breast expression data look like this:

hsa-mir-105-1, hsa-mir-1180, hsa-let-7a-3, hsa-mir-106a, hsa-mir-133a-1, hsa-mir-548aw

What should I do now? How can I match the names in the two data sets? Do I need to check all of them manually, or is there a quick way to match the names?

Without looking into the details of the TCGA small RNA analysis pipeline, it's difficult to say. Probably they are using the predominant arm (I don't know what they do where the two arms are equivalent). I'm pretty sure this information can be downloaded from miRBase; there is even a Bioconductor R package for miRBase. So you could just guess that the dominant arm is the one TCGA is referring to. The real solution would be to track down the source of annotation in the TCGA analysis pipeline.
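
If you do go with the guess, it might look roughly like the Python sketch below. The `DOMINANT_ARM` table is a hand-typed stand-in containing only the examples from my answer above; in practice you would build it from a miRBase download. The function name `guess_mature_names` is made up for this illustration, and the whole thing rests on the unverified assumption that TCGA means the dominant arm.

```python
# Hand-typed stand-in for a precursor -> dominant mature name table.
# Only the examples discussed in this thread; a real table would be
# built from miRBase. Keys are lower-cased precursor names.
DOMINANT_ARM = {
    "hsa-mir-373": "hsa-miR-373-3p",
    "hsa-mir-101-1": "hsa-miR-101-3p",
    "hsa-mir-101-2": "hsa-miR-101-3p",
}


def guess_mature_names(precursor_names, dominant_arm=DOMINANT_ARM):
    """Map each precursor-style name to its assumed dominant mature name.

    Returns (matched, unmatched): a dict of name -> mature name, and a list
    of names missing from the table, which need checking by hand.
    """
    matched = {}
    unmatched = []
    for name in precursor_names:
        mature = dominant_arm.get(name.lower())  # tolerate miR/mir casing
        if mature is None:
            unmatched.append(name)
        else:
            matched[name] = mature
    return matched, unmatched


matched, unmatched = guess_mature_names(
    ["hsa-miR-373", "hsa-mir-101-1", "hsa-mir-101-2", "hsa-mir-548aw"]
)
```

Keeping the unmatched names in a separate list rather than dropping them silently makes it obvious how much of the data the guess does not cover.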

OK. I guess it's better to download the mirbase21.isoforms.quantification.txt data from TCGA, which includes accession information. I then used the miRBaseConverter R package to get the information about mature regions and accessions. This gave all the 3p and 5p information. Then I summed the counts of duplicated miRNAs.
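
For anyone following along, the summing step can be sketched in Python like this. It assumes a tab-separated file with `read_count` and `miRNA_region` columns, where mature products are labelled `mature,<accession>`; check that against your own download. The toy rows, their counts and the accession strings are invented placeholders, and `sum_reads_by_accession` is just a name I picked.

```python
import csv
import io
from collections import defaultdict


def sum_reads_by_accession(handle):
    """Sum read_count per mature accession in a tab-separated isoform table.

    Assumes columns named read_count and miRNA_region, with mature rows
    labelled like 'mature,<accession>'. All other rows are skipped.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        kind, _, accession = row["miRNA_region"].partition(",")
        if kind != "mature" or not accession:
            continue  # e.g. precursor or unannotated rows
        totals[accession] += int(row["read_count"])
    return dict(totals)


# Toy input: counts and accessions are made-up placeholders, not real data.
TOY_TABLE = (
    "miRNA_ID\tread_count\tmiRNA_region\n"
    "hsa-mir-101-1\t10\tmature,MIMAT9999991\n"
    "hsa-mir-101-2\t5\tmature,MIMAT9999991\n"
    "hsa-mir-101-1\t2\tmature,MIMAT9999992\n"
    "hsa-mir-101-1\t7\tprecursor\n"
)

totals = sum_reads_by_accession(io.StringIO(TOY_TABLE))
```

Keying on the accession rather than the name means reads from two loci that give the same mature product end up in one total, and the accession can then be converted to a 5p/3p name in a separate step.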

5.2 years ago
Buffo ★ 2.4k

No, 5p and 3p are different miRNAs. They come from the same precursor and are partially complementary in sequence, but they may have completely different targets. Personally, I recommend that you not mix data from different databases; miRBase, mirPath from DIANA Tools and miRDeep2 work fine for me with hsa data.

Sure, thanks a lot for the information.
