I'm trying to understand how to work with microRNA data, and I'm struggling with the nomenclature.
If I understood correctly from this Biostars post:
hsa-miR-373-3p and hsa-miR-373-5p should be considered different miRNAs, because they have different sequences and consequently different targets.
By contrast, hsa-miR-101-1 and hsa-miR-101-2 have the same targets but come from different loci.
If I have an expression value for hsa-miR-101-1 and one for hsa-miR-101-2, and I want to see how much their target gene is regulated, can I sum the two expression values and consider them only once?
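If the pipeline reports one value per locus and each read was counted only once, collapsing the loci is a plain sum keyed by the mature miRNA. The sketch below assumes exactly that. The `LOCUS_TO_MATURE` table and the collapsed name `hsa-miR-101` are hypothetical placeholders you would build from miRBase or your own annotation, and the expression values are illustrative, not real data.

```python
from collections import defaultdict

# Hypothetical mapping from locus-specific names to the mature miRNA they produce.
# Assumption: both loci yield an identical mature sequence, as described above.
LOCUS_TO_MATURE = {
    "hsa-miR-101-1": "hsa-miR-101",
    "hsa-miR-101-2": "hsa-miR-101",
}


def collapse_loci(expression: dict[str, float]) -> dict[str, float]:
    """Sum the values of loci that produce the same mature miRNA.

    Names missing from LOCUS_TO_MATURE are passed through unchanged.
    """
    collapsed = defaultdict(float)
    for name, value in expression.items():
        collapsed[LOCUS_TO_MATURE.get(name, name)] += value
    return dict(collapsed)


# Illustrative values only, not real data.
example = {"hsa-miR-101-1": 120.0, "hsa-miR-101-2": 30.0, "hsa-miR-21": 500.0}
print(collapse_loci(example))  # {'hsa-miR-101': 150.0, 'hsa-miR-21': 500.0}
```

If the pipeline instead counted a multi-mapping read once for each locus, summing would count that read twice, which is why it matters how the values were generated.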
What if I have two miRNAs like hsa-miR-190a and hsa-miR-190b? From this article (NewsMedicalLifeSciences), it seems like they are different miRNAs but of the same family. What does that mean? Do they target the same genes?
If I have data from different sources, how should I handle merging them?
What if one source only has hsa-miR-101 and the other has hsa-miR-101-1 and -2? Does hsa-miR-101 refer to both?
What if one source only has hsa-miR-373 and the other has hsa-miR-373-3p and -5p?
It cannot refer to both but, from what I understood, one arm is usually more stable than the other, so I guess it could be referring to that one. Where would I find information about which arm to consider?
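One cautious way to merge two sources is to join only on exact names and set aside any name that matches the other source only through an extra locus (-1, -2) or arm (-3p, -5p) suffix, so those cases can be decided by hand. This is a minimal sketch: the function names and the suffix pattern are assumptions, and the values in the example are made up.

```python
import re


def arm_or_locus_candidates(name: str, other_names) -> list[str]:
    """Names in other_names that extend `name` with a locus (-1, -2, ...) or arm (-3p/-5p) suffix."""
    pattern = re.compile(re.escape(name) + r"-(?:\d+|[35]p)$")
    return sorted(n for n in other_names if pattern.match(n))


def merge_sources(a: dict[str, float], b: dict[str, float]):
    """Pair exact name matches; collect names that only match ambiguously."""
    merged = {}
    ambiguous = {}
    for name, value in a.items():
        if name in b:
            merged[name] = (value, b[name])
        else:
            candidates = arm_or_locus_candidates(name, b)
            if candidates:
                # Left for a manual decision (sum the loci? pick the dominant arm?).
                ambiguous[name] = candidates
    return merged, ambiguous


# Illustrative values only.
source_a = {"hsa-miR-101": 1.0, "hsa-miR-373": 2.0, "hsa-miR-21-5p": 3.0}
source_b = {
    "hsa-miR-101-1": 4.0,
    "hsa-miR-101-2": 5.0,
    "hsa-miR-373-3p": 6.0,
    "hsa-miR-373-5p": 7.0,
    "hsa-miR-21-5p": 8.0,
}
merged, ambiguous = merge_sources(source_a, source_b)
print(merged)     # {'hsa-miR-21-5p': (3.0, 8.0)}
print(ambiguous)  # {'hsa-miR-101': ['hsa-miR-101-1', 'hsa-miR-101-2'], 'hsa-miR-373': ['hsa-miR-373-3p', 'hsa-miR-373-5p']}
```

Names in the first source that match nothing at all are dropped silently here; in practice you would probably want to report them as well.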
Thank you very much for the answer.
I will have to look into the pipeline used to generate this data to understand whether I should add the values or not.
So if I look at these entries (miRBase), the numbers I need to identify the dominant arm are the ones in the "Deep Sequencing" field, right? The 3p has 635 reads and the 5p only 14, so if I understood correctly, the 3p is the dominant one.
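The comparison described here amounts to picking the arm with the most deep-sequencing reads. A minimal sketch using the two read counts quoted above; the function name is hypothetical, and it also reports the dominant arm's share of all reads, which makes a lopsided split like 635 vs 14 easy to tell apart from a near tie.

```python
def dominant_arm(read_counts: dict[str, int]) -> tuple[str, float]:
    """Return the arm with the most reads and its share of all reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to compare")
    arm = max(read_counts, key=read_counts.get)
    return arm, read_counts[arm] / total


# Deep-sequencing read counts for hsa-miR-373 as quoted above from miRBase.
counts = {"hsa-miR-373-3p": 635, "hsa-miR-373-5p": 14}
arm, share = dominant_arm(counts)
print(arm, round(share, 3))  # hsa-miR-373-3p 0.978
```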
Yes. In this case the 3p arm is the dominant one. Also, if you look under the record "Mature sequence for hsa-miR-373-3p", you'll see that the "Previous ID" is miR-373, whereas under "Mature sequence for hsa-miR-373-5p" the "Previous ID" is miR-373*. The star was used in an old naming scheme to refer to the non-dominant arm. That scheme was retired when it became clear that it isn't uncommon for both arms to be stable, hence the -3p and -5p terminology.
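The "Previous ID" fields can be turned into a lookup that translates legacy names, including starred ones, into current names. This is a sketch assuming you have collected the current-name to previous-ID pairs yourself; only the two pairs quoted above are filled in. It refuses to build the lookup if one old name points at two different current names, since such a case cannot be resolved automatically.

```python
# "Previous ID" values as quoted above from the miRBase mature-sequence records.
PREVIOUS_IDS = {
    "hsa-miR-373-3p": ["miR-373"],
    "hsa-miR-373-5p": ["miR-373*"],
}


def build_legacy_lookup(previous_ids: dict[str, list[str]]) -> dict[str, str]:
    """Invert current-name -> previous-IDs into previous-ID -> current-name."""
    lookup: dict[str, str] = {}
    for current, olds in previous_ids.items():
        for old in olds:
            if old in lookup and lookup[old] != current:
                # An old name pointing at two current names needs a manual decision.
                raise ValueError(f"{old!r} maps to both {lookup[old]!r} and {current!r}")
            lookup[old] = current
    return lookup


def translate(name: str, lookup: dict[str, str]) -> str:
    """Return the current name for a legacy ID, or the name unchanged if unknown."""
    return lookup.get(name, name)


LOOKUP = build_legacy_lookup(PREVIOUS_IDS)
print(translate("miR-373*", LOOKUP))  # hsa-miR-373-5p
```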
Weird, I thought I saw the * in both "Previous ID" fields. I guess it was late and my eyes wanted nothing to do with work.
Also, I forgot to ask in the original post how to handle microRNAs in the same family. If I have miR-190 in one file and miR-190a/b in another, I can now see from miRBase that the -5p arm is the dominant one for both. But is there a dominant member of the family? Should I still look at the deep-sequencing read counts and decide that, since 190a-5p has 19957 reads and 190b only 5587, the unlabelled miR-190 data in the first file refers to 190a-5p? This is especially problematic for miRNAs like hsa-let-7, for which I have a file with data for 7a through 7i.
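To see which lettered family members an unlettered name might correspond to, names can be grouped by their numeric stem and arm. This is a name-based heuristic sketch (the regular expression and function names are assumptions): it looks only at the names, not at sequence similarity, and it does not decide which member is meant; it only lists the candidates.

```python
import re
from collections import defaultdict

# Heuristic pattern: <species>-<miR|mir|let>-<number><letters>[-locus][-arm]
NAME_RE = re.compile(
    r"^(?P<stem>[a-z]{3}-(?:mi[Rr]|let)-\d+)"
    r"(?P<letters>[a-z]*)"
    r"(?P<locus>-\d+)?"
    r"(?P<arm>-[35]p)?$"
)


def family_key(name: str) -> str:
    """Drop letter and locus suffixes but keep the arm, e.g. hsa-let-7i-5p -> hsa-let-7-5p."""
    m = NAME_RE.match(name)
    if m is None:
        return name  # unrecognised names form their own group
    return m["stem"] + (m["arm"] or "")


def group_by_family(names) -> dict[str, list[str]]:
    """Group names sharing a numeric stem and arm; lists candidates, picks none."""
    groups = defaultdict(list)
    for name in names:
        groups[family_key(name)].append(name)
    return {key: sorted(members) for key, members in groups.items()}


# Assumes miR-190 from the first file was already mapped to its dominant -5p arm.
file_a = ["hsa-miR-190-5p"]
file_b = ["hsa-miR-190a-5p", "hsa-miR-190b-5p"]
print(group_by_family(file_a + file_b))
# {'hsa-miR-190-5p': ['hsa-miR-190-5p', 'hsa-miR-190a-5p', 'hsa-miR-190b-5p']}
```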
Lastly, I just came across hsa-mir-548, which seems to have very weird nomenclature. There is 548a, then there isn't 548b but 548aa, then ab and so on. Do you know what's up with that?
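One possible reading of that ordering, assuming the list being browsed was sorted as plain text and that the letter suffixes run a to z and then continue as aa, ab, and so on (as the names in the question suggest): a plain string sort puts every 548a-prefixed name, including 548aa and 548ab, before 548b. The sketch below compares a plain sort with a hypothetical sort key that ranks letter suffixes in bijective base 26 (a..z, then aa, ab, ...); the names in the example are an illustrative subset.

```python
import re

# Splits a name into prefix, number, letter suffix and the rest,
# e.g. "hsa-mir-548aa" -> ("hsa-mir-", "548", "aa", "").
NAME_RE = re.compile(r"^(.*?-)(\d+)([a-z]*)(.*)$")


def suffix_rank(letters: str) -> int:
    """Bijective base-26 rank: "" = 0, a = 1, ..., z = 26, aa = 27, ab = 28, ..."""
    rank = 0
    for ch in letters:
        rank = rank * 26 + (ord(ch) - ord("a") + 1)
    return rank


def nomenclature_key(name: str):
    """Sort key that orders letter suffixes a..z before aa, ab, ..."""
    m = NAME_RE.match(name)
    if m is None:
        return (name, 0, 0, "")
    prefix, number, letters, rest = m.groups()
    return (prefix, int(number), suffix_rank(letters), rest)


# Illustrative subset of names.
names = ["hsa-mir-548b", "hsa-mir-548aa", "hsa-mir-548a", "hsa-mir-548ab", "hsa-mir-548z"]
print(sorted(names))
# ['hsa-mir-548a', 'hsa-mir-548aa', 'hsa-mir-548ab', 'hsa-mir-548b', 'hsa-mir-548z']
print(sorted(names, key=nomenclature_key))
# ['hsa-mir-548a', 'hsa-mir-548b', 'hsa-mir-548z', 'hsa-mir-548aa', 'hsa-mir-548ab']
```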
Hi, sorry.
For miR-190, I'd try to work out which one was discovered first. It's likely that wherever that data or analysis pipeline comes from, it was created before the second member was found. I wouldn't use the read counts in miRBase for that: they are tied to whatever tissue the sequencing was done in, which could differ from your tissue. (They're fine for choosing the arm, as arm dominance is thought to be stable between tissues.)
No idea what is going on with miR-548, sorry.
No need to be sorry; you were very helpful. Thanks again!