Question

miRNA nomenclature and how to handle miRNAs

2.9 years ago

bio_elle ▴ 10

I'm trying to understand how to work with microRNA data, and I am struggling with the nomenclature.

If I understood correctly from this Biostars post:

hsa-miR-373-3p and hsa-miR-373-5p have to be considered like different miRNAs because they have different sequences and consequently different targets.

Instead, hsa-miR-101-1 and hsa-miR-101-2 have the same targets but come from different loci.

If I have an expression value for hsa-miR-101-1 and one for hsa-miR-101-2, and I want to see how much their target gene is regulated, can I sum the two expression values and consider them only once?

What if I have two miRNAs like hsa-miR-190a and hsa-miR-190b? From this article (NewsMedicalLifeSciences), it seems like they are different miRNAs but of the same family. What does that mean? Do they target the same genes?

If I have data from different sources how can I handle merging them?

What if one source only has hsa-miR-101 and the other has hsa-miR-101-1 and -2? Is the first one referring to both?

What if one source only has hsa-miR-373 and the other hsa-miR-373-3p and -5p?

It cannot be referring to both, but, from what I understood, one is usually more stable than the other. I guess it could be referring to that one. Where would I find the information about which arm to consider?

mirna microrna • 1.7k views

ADD COMMENT • link 2.8 years ago by bio_elle ▴ 10

score 2 · Accepted Answer · 2022-01-21

2.9 years ago

i.sudbery 20k

The mature products of the dominant arm (3p) of hsa-miR-101-1 and hsa-miR-101-2 are identical. Thus, if you had separate quantifications for each, in theory you should be able to sum them. However, as their sequences are identical, it's difficult to see how you could have separate quantifications for each, unless the thing being quantified is not the canonical mature 3p arm. Quantification pipelines that consider both must be doing one of three things with reads that map to both (which should be the majority of reads): they add a count to both, they add a count to neither, or they add a count to one at random. If they add a count to neither, or to one at random, then you should be fine adding them together. However, if they add a count to both, then you should not add them together, but just take one or the other.
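
To make the three cases concrete, here is a minimal Python sketch. It is not taken from any real pipeline: the function name `combine_paralog_counts` and the strategy labels are made up for illustration, and you would still need to check your pipeline's documentation to find out which case applies. For the "both" case I arbitrarily take the larger of the two values as the "one or the other".

```python
def combine_paralog_counts(count_1, count_2, multimap_strategy):
    """Combine counts for two loci that give the same mature sequence.

    multimap_strategy uses hypothetical labels for how the pipeline
    treated reads that map equally well to both loci:
      "both"    - each ambiguous read was counted at both loci
      "neither" - ambiguous reads were discarded
      "random"  - each ambiguous read was assigned to one locus at random
    """
    if multimap_strategy == "both":
        # Shared reads are already in both totals, so summing would
        # double count. Take one value (here, arbitrarily, the larger).
        return max(count_1, count_2)
    if multimap_strategy in ("neither", "random"):
        # No read was counted twice, so the sum is safe.
        return count_1 + count_2
    raise ValueError(f"unknown multimap strategy: {multimap_strategy!r}")


# Example with invented numbers for hsa-miR-101-1 and hsa-miR-101-2.
print(combine_paralog_counts(60, 40, "random"))
print(combine_paralog_counts(100, 100, "both"))
```

Raising on an unknown strategy is deliberate: silently summing is exactly the mistake to avoid when you don't know how the counts were produced.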

miRNAs that come from the same family produce mature sequences from their dominant arms that have the same seed sequence, but are not identical across the rest of their sequence. So, for example, hsa-miR-190a and 190b:

ugauauguuugauauauuaggu hsa-miR-190a-5p
|||||||||||||||.......
ugauauguuugauauuggguug hsa-miR-190b-5p
 ------ 6mer seed
------- 7mer-A1 seed
 ------- 7mer-m8 seed
-------- 8mer seed

We don't fully understand miRNA targeting, but we do know that the seed region (roughly speaking, bases 2-8) is very important. However, the 3' end of the sequence can sometimes contribute to targeting, in circumstances that are not yet understood. Thus miR-190a and miR-190b likely have similar, but not identical, target sets. In fact, many target prediction algorithms might predict the same target sets (although I believe that recent versions of TargetScan do take the 3' end into account).
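
If you want to check this kind of thing yourself for other family members, a small Python sketch like the following would do it. The two sequences are the ones from the alignment above; the helper names `seed` and `first_mismatch` are my own, and taking the seed as bases 2-8 is the rough definition used here, not the only one.

```python
MIR_190A_5P = "ugauauguuugauauauuaggu"
MIR_190B_5P = "ugauauguuugauauuggguug"


def seed(mature_seq, start=2, end=8):
    """Return bases start..end (1-based, inclusive) of a mature miRNA."""
    return mature_seq[start - 1:end]


def first_mismatch(seq_a, seq_b):
    """Return the 1-based position of the first differing base.

    Returns None if the sequences agree over their shared length.
    """
    for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1):
        if a != b:
            return pos
    return None


same_seed = seed(MIR_190A_5P) == seed(MIR_190B_5P)
divergence = first_mismatch(MIR_190A_5P, MIR_190B_5P)
print(same_seed, divergence)  # prints: True 16
```

So the two share the seed (and in fact the first 15 bases) and only diverge at the 3' end, which is why a seed-only predictor would give them the same targets.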

Just to complicate things, the non-dominant arms may well have completely different seed sequences, and therefore different target sets (as is the case for miR-190a/b).

You are correct that if a source has only miR-373 then it is referring to only one of the -5p or -3p arms, generally the more stable one. You can find this information at miRBase. The record for the precursor will show the balance of mature reads found for the -5p and -3p arms. The record for each mature sequence has a field called "Previous IDs"; for one of the two arms, this will generally list miR-XXX without the -5p or -3p. That is usually the dominant arm.

ADD COMMENT • link 2.9 years ago by i.sudbery 20k

Thank you very much for the answer.

I will have to look into the pipeline used to get this data to understand if I should add the values I have or not.

So if I look at these (miRBase), the numbers I need to identify the dominant arm are the ones in the "Deep Sequencing" field, right? The 3p has 635 reads and the 5p only 14, so the 3p is the dominant one, if I understood correctly.

ADD REPLY • link 2.9 years ago by bio_elle ▴ 10

Yes. In this case the 3p arm will be the dominant one. Also, if you look under the record for "Mature sequence for hsa-miR-373-3p" you'll see that the "Previous ID" is miR-373, whereas under "Mature sequence for hsa-miR-373-5p" the "Previous ID" is miR-373*. The stars were used in an old naming scheme to refer to the non-dominant arm. This was retired when it became clear that it wasn't that uncommon for both arms to be stable, hence the -3p and -5p terminology.
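
For merging files that use different naming schemes, one option is to translate the old names to the arm-specific ones before merging. Here is a rough Python sketch, assuming you build the lookup by hand from the "Previous ID" fields. The dictionary only contains the miR-373 entries discussed here, and the names `PREVIOUS_ID_TO_CURRENT`, `harmonize_name` and `harmonize_table` are just illustrative.

```python
# Hypothetical lookup, filled in by hand from the "Previous ID" fields
# on miRBase. Only the miR-373 entries from this thread are included.
PREVIOUS_ID_TO_CURRENT = {
    "hsa-miR-373": "hsa-miR-373-3p",   # old name of the dominant arm
    "hsa-miR-373*": "hsa-miR-373-5p",  # old star name of the other arm
}


def harmonize_name(name):
    """Map a legacy name to its arm-specific name.

    Names that are not in the lookup are returned unchanged.
    """
    return PREVIOUS_ID_TO_CURRENT.get(name, name)


def harmonize_table(expression):
    """Rename the keys of a {miRNA name: value} dict.

    Raises instead of silently merging if two input names end up
    with the same harmonized name.
    """
    renamed = {}
    for name, value in expression.items():
        new_name = harmonize_name(name)
        if new_name in renamed:
            raise ValueError(f"{name} collides with existing {new_name}")
        renamed[new_name] = value
    return renamed


# Example with invented expression values.
old_style = {"hsa-miR-373": 5.0, "hsa-miR-373*": 1.0}
print(harmonize_table(old_style))
```

The collision check matters if one file contains both hsa-miR-373 and hsa-miR-373-3p: at that point you need to decide by hand what the unsuffixed value actually is.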

ADD REPLY • link 2.9 years ago by i.sudbery 20k

Weird, I thought I saw the * in both "previous ID". I guess it was late, and my eyes wanted nothing to do with work.

Also, I forgot to ask in the original post how to handle microRNAs in the same family. If I have miR-190 in one file and miR-190a/b in another, I can now see from miRBase that the -5p arm is the dominant one for both. But is there a dominant member of the family? Do I still look at the deep sequencing read count and decide that, since 190a-5p has 19957 reads and 190b only 5587, the data I have in the first file without additional information is for 190a-5p? This is especially problematic with miRNAs like hsa-let-7, for which I have a file that has data from 7a to 7i.

Lastly, I just came across hsa-mir-548, which seems to have very weird nomenclature. There is 548a, then there isn't 548b but 548aa, then ab and so on. Do you know what's up with that?

ADD REPLY • link 2.8 years ago by bio_elle ▴ 10

Hi, Sorry.

For miR-190 I'd look to see if I could work out which one was discovered first. It's likely that wherever that data/analysis pipeline came from, it was created before the second family member was found. I wouldn't use the read counts in miRBase for that: they are tied to whatever tissue the sequencing was done in, which could be different from your tissue. (It's fine for deciding which arm, as this is thought to be stable between tissues.)

No idea what is going on with miR-548, sorry.

ADD REPLY • link 2.8 years ago by i.sudbery 20k

No need to be sorry, you were very helpful. Thanks again.

ADD REPLY • link 2.8 years ago by bio_elle ▴ 10