I have miRNAs and their regulatory target information from TargetScan. All the miRNA IDs (miRBase IDs) from there belong to the mature miRNA form.
For my analysis I also need to include the expression levels of these miRNAs. I'm using TCGA data, but the problem is that all miRNA IDs in the TCGA RNA-Seq samples are for miRNA precursors.
Now my questions are:
- Is the expression value of a miRNA precursor equivalent to that of its corresponding mature miRNA form?
- What should I do with the miRNAs which represent the stem loop?
For example, in TargetScan I have: hsa-miR-18a
but in the TCGA miRNA expression file I have:
hsa-mir-18a : precursor
hsa-mir-18a-1 : stem loop
hsa-mir-18a-2 : stem loop
Basically, miRNA IDs without a capital R in them represent miRNA precursors.
We can go further - the (lack of) capitalisation of "mir" tells us we're talking about the miRNA precursor (here)
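To make the capitalisation rule concrete, here is a minimal Python sketch that labels an ID as mature or precursor purely from the second field of the name. The function name `classify_mirna_id` is hypothetical, and the rule is only the naming convention described in this thread: it says nothing about whether a given ID actually exists in miRBase, and it cannot decide for names such as let-7 that carry no "mir"/"miR" field.

```python
def classify_mirna_id(mirna_id: str) -> str:
    """Label an ID using only the capitalisation of its 'mir' field.

    Assumes IDs look like <species>-<mir|miR>-<number>[-suffix].
    """
    parts = mirna_id.split("-")
    if len(parts) < 3:
        return "unknown"
    if parts[1] == "miR":   # capital R: mature form
        return "mature"
    if parts[1] == "mir":   # lowercase r: precursor / stem loop
        return "precursor"
    return "unknown"        # e.g. let-7 family, which has no mir/miR field


# IDs taken from the example in the question
for name in ["hsa-miR-18a", "hsa-mir-18a", "hsa-mir-18a-1", "hsa-mir-18a-2"]:
    print(name, "->", classify_mirna_id(name))
```

This only sorts names into two bins; it does not link a precursor to its mature product.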
What is the alternative solution for this scenario?
I think that the TCGA file and miRBase might have different naming conventions; that is, TCGA does not seem to follow the "R" syntax. This seems to be the source of confusion.
In your OP you have a precursor and two mature products for mir-18a (the "stem loop" products). These would correspond to your miR-18a entries in miRBase. I'd use BLAT or something similar to map the TCGA IDs to the miRBase IDs.
Thanks. You mean that hsa-mir-18a-1 and hsa-mir-18a-2, which are stem loops, are mature products and correspond to miR-18a in TargetScan?
That is what I think is going on, but it'd be better to check with some sequence alignment just to make sure.
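As a rough stand-in for that sequence check, one could test whether the mature sequence occurs as an exact substring of the candidate precursor sequence. The Python sketch below assumes you have already obtained both sequences yourself (for example from miRBase); the helper name `mature_in_precursor` and the sequences shown are made-up placeholders, not real miRNA sequences, and an exact substring test is much cruder than a real alignment with BLAT.

```python
def mature_in_precursor(mature_seq: str, precursor_seq: str) -> bool:
    """Return True if the mature sequence is an exact substring of the precursor.

    Sequences are upper-cased and T is converted to U so that DNA-style and
    RNA-style spellings compare equal. No mismatches or gaps are tolerated.
    """
    def normalise(seq: str) -> str:
        return seq.upper().replace("T", "U")

    return normalise(mature_seq) in normalise(precursor_seq)


# Toy placeholder sequences (NOT real miRNA sequences)
toy_precursor = "GGAUCCAUGCAUGCAUUUACGGAUCC"
toy_mature = "ATGCATGCAT"  # DNA-style spelling on purpose

print(mature_in_precursor(toy_mature, toy_precursor))
print(mature_in_precursor("CCCCCCCC", toy_precursor))
```

If a mature sequence is found in more than one precursor, those precursors are candidates for the same mature miRNA.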
I checked, and it's not. For example, for
hsa-mir-3179
in TargetScan there are:
hsa-mir-3179-1,
hsa-mir-3179-2,
hsa-mir-3179-3
in TCGA. I checked these in miRBase, and they are stem loops, which are precursors.
Maybe you're not mapping your reads against the correct database? If I understand correctly, TCGA is genome sequences, so it will provide you with miRNA genes (which are transcribed into miRNA precursors), while miRBase will provide you with predicted/validated mature miRNAs.
I think TCGA would be better if you wanted to see if there are mutations in known miRNAs, while miRBase is better for determining which mature miRNAs you have. Generally the reads in miRNA-Seq will be longer than the mature miRNA, so the references you align/map/count your reads against matter. I think this is what Chirag was getting at.
Basically, I'm building a miRNA-target regulatory network. For this reason, I need the expression levels of miRNAs from TCGA. The miRNA IDs in TCGA are miRBase IDs, and so are the ones from TargetScan, but I have the problem that I mentioned in the OP.
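One possible way to join the two ID sets for a network like this is to collapse the TCGA precursor IDs onto a mature-style key and sum the values per key. The Python sketch below is only that: the function names, the toy expression values, and above all the decision to sum across loci are assumptions for illustration. Precursor-level counts are not guaranteed to equal the abundance of a mature miRNA, and arm suffixes such as -5p/-3p are not handled.

```python
from collections import defaultdict


def precursor_to_family(precursor_id: str) -> str:
    """Turn e.g. 'hsa-mir-3179-2' into the mature-style key 'hsa-miR-3179'."""
    parts = precursor_id.split("-")
    # Assumption: a fourth, purely numeric field is a locus suffix (-1, -2, ...)
    if len(parts) >= 4 and parts[-1].isdigit():
        parts = parts[:-1]
    # Switch to the capital-R spelling used for mature IDs
    if len(parts) >= 2 and parts[1] == "mir":
        parts[1] = "miR"
    return "-".join(parts)


def collapse_expression(precursor_expr: dict) -> dict:
    """Sum expression values of all precursors sharing a mature-style key."""
    totals = defaultdict(float)
    for precursor_id, value in precursor_expr.items():
        totals[precursor_to_family(precursor_id)] += value
    return dict(totals)


# Toy values, invented purely for illustration
toy_expr = {
    "hsa-mir-3179-1": 1.0,
    "hsa-mir-3179-2": 2.0,
    "hsa-mir-3179-3": 3.0,
    "hsa-mir-18a": 10.0,
}
collapsed = collapse_expression(toy_expr)
print(collapsed)
```

The collapsed keys can then be matched against the TargetScan IDs; any key that does not match is worth checking by hand in miRBase.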