Dear all,
I just saw that miRBase 21 has been released and tried to use its GFF annotation for quantification. I noticed that the GFF contains more mature miRNAs than the number reported on the site. I do not understand how or why!
Counts shown on the site: Mus musculus (1193 precursors, 1915 mature) [GRCm38]
Counts from the GFF annotation: Mus musculus (1128 precursors, 2046 mature)
Why? I am afraid this could affect differential expression analysis, because the GFF counts do not match the miRNA counts reported on the site. Is that so?
Could someone please explain this and tell me whether I should worry about it? Should I use the genome or the miRBase reference for mapping?
I hope someone has noticed this before.
Hello bioinforupesh2009.au!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=47133
This is typically not recommended as it runs the risk of annoying people in both communities.
I am sorry for the inconvenience. What should I do? Do I have to delete one of them?
Hi, it might help if you tell us exactly where those numbers come from. For the GFF, is it the number of lines matching the word miRNA in column 3? And where do the other numbers come from?
Thanks, Lorena, for your kind reply.
Yes, you are right: these are counts from the 3rd column of the gff3 file for mmu, which I guess are the mature miRNAs.
The miRNA count from the 3rd column of the miRBase gff3 file is: Mus musculus (2046 mature)
The primary transcript count (I guess these are the precursors) from the miRBase gff3 file is: Mus musculus (1128 precursors)
The numbers from the miRBase site can be found here: http://www.mirbase.org/cgi-bin/browse.pl?org=mmu
Am I correct now? Please help!
Hi,
For mature miRNAs, remove duplicates, since many mature miRNAs have multiple primary transcripts. So if you do this to that gff3 (I downloaded it yesterday):
It is not exactly the same number, but it is close. If you are still worried about it, you can ask the miRBase people.
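The command used in this reply is not shown, so the following is only a minimal Python sketch of the idea described above, not the original one-liner. It counts the feature types in column 3 and then collapses mature miRNA lines by their Name attribute. It assumes the miRBase gff3 layout (tab-separated, feature type in column 3, Name=... in column 9); the records in SAMPLE_GFF3 and the function name summarize_gff3 are made up for illustration.

```python
import io
from collections import Counter

# Made-up records in the assumed miRBase gff3 layout (not real coordinates or names).
SAMPLE_GFF3 = """\
##gff-version 3
chr1\t.\tmiRNA_primary_transcript\t100\t180\t.\t+\t.\tID=MI0000001;Alias=MI0000001;Name=mmu-mir-x-1
chr1\t.\tmiRNA\t110\t131\t.\t+\t.\tID=MIMAT0000001;Alias=MIMAT0000001;Name=mmu-miR-x-5p;Derives_from=MI0000001
chr2\t.\tmiRNA_primary_transcript\t500\t580\t.\t-\t.\tID=MI0000002;Alias=MI0000002;Name=mmu-mir-x-2
chr2\t.\tmiRNA\t550\t571\t.\t-\t.\tID=MIMAT0000001_1;Alias=MIMAT0000001;Name=mmu-miR-x-5p;Derives_from=MI0000002
chr2\t.\tmiRNA\t510\t531\t.\t-\t.\tID=MIMAT0000002;Alias=MIMAT0000002;Name=mmu-miR-x-2-3p;Derives_from=MI0000002
"""


def summarize_gff3(handle):
    """Return (feature type counts, set of unique mature miRNA names)."""
    feature_counts = Counter()
    mature_names = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF3 record
        feature_type = fields[2]
        feature_counts[feature_type] += 1
        if feature_type == "miRNA":
            # Parse the key=value pairs of column 9.
            attrs = dict(
                item.split("=", 1) for item in fields[8].split(";") if "=" in item
            )
            # Assumption: Name identifies the mature miRNA across loci.
            mature_names.add(attrs.get("Name", attrs.get("ID")))
    return feature_counts, mature_names


counts, names = summarize_gff3(io.StringIO(SAMPLE_GFF3))
print("precursor lines:", counts["miRNA_primary_transcript"])
print("mature lines:", counts["miRNA"])
print("unique mature names:", len(names))
```

To run it on a downloaded annotation, pass an open file handle instead of the io.StringIO sample. On the sample, the mature miRNA annotated at two loci is counted twice per line but only once by name, which is the kind of gap discussed in this thread.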
Thank you very much for your kind help, I got it. I also emailed miRBase, but they have not responded yet. Let's see.
By the way, I saw your impressive profile and learned that you have a lot of experience in RNA biology.
Could you please tell me whether it is better to count only unique mature miRNAs, as you suggested, or whether I have to count by transcript ID? For example (for zebrafish; I am in fact working on two species), dre-let-7a has 6 transcripts.
If I consider only the mature miRNA, it comes with 6 counts, and of course if I count by ID there must be 6 different counts for the corresponding transcripts. I suppose this happens because of gene duplication, right? So which should I consider, the miRNA or the transcript?
Hi,
I will try to help! :)
You should count whatever maps there only once. If a read maps to more than one precursor but the mature miRNA is the same, you should count it only once.
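The count-once rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hits list, its (read, precursor, mature miRNA) layout, and the function name count_once_per_mature are hypothetical, and real aligner output would need to be parsed into this shape first.

```python
from collections import Counter

# Hypothetical alignment hits: (read_id, precursor, mature_miRNA).
# read1 maps to three precursors that all give the same mature miRNA.
hits = [
    ("read1", "dre-let-7a-1", "dre-let-7a"),
    ("read1", "dre-let-7a-2", "dre-let-7a"),
    ("read1", "dre-let-7a-3", "dre-let-7a"),
    ("read2", "dre-let-7a-1", "dre-let-7a"),
    ("read3", "dre-mir-x", "dre-miR-x-5p"),
]


def count_once_per_mature(hits):
    """Count each read at most once per mature miRNA, ignoring the precursor."""
    seen = set()
    counts = Counter()
    for read_id, _precursor, mature in hits:
        key = (read_id, mature)
        if key in seen:
            continue  # same read, same mature miRNA, different precursor
        seen.add(key)
        counts[mature] += 1
    return counts


mature_counts = count_once_per_mature(hits)
for name, n in sorted(mature_counts.items()):
    print(name, n)
```

Counting every hit line instead would give 4 for dre-let-7a here, while the collapsed count is 2, one per read.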
I would use a program to do this for me, like miRDeep or my own: miraligner. The main difference between the two is that miraligner gives information about isomiRs.
Hope this helps!
Thank you very much for your kind suggestion; I will certainly try this. However, I plan to use the reference genome rather than the precursors to align my short reads, then use R to detect DE miRNAs, and use miRDeep2 only to predict novel miRNAs. Is this pipeline fine?