Question

miRNA annotation from miRBase 21

10.2 years ago

bioinforupesh2009.au ▴ 140

Dear all,

I just saw that miRBase 21 has been released and looked at its GFF annotation for quantification. I noticed that the GFF file contains more mature miRNA entries than the number reported on the site. I do not understand how or why!

Counts shown on the site: Mus musculus (1193 precursors, 1915 mature) [GRCm38]

Counts from the GFF annotation: Mus musculus (1128 precursors, 2046 mature)

Why? I am afraid this could affect my differential expression analysis, because the GFF counts do not match the miRNA numbers on the site. Could it?

Could someone please explain this and tell me whether I should worry about it? Also, should I use the genome or the miRBase reference for mapping?

I hope someone has noticed this before.

miRBase gene-annotation next-gen • 6.0k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 10.2 years ago by bioinforupesh2009.au ▴ 140

Hello bioinforupesh2009.au!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=47133

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 10.2 years ago by Pierre Lindenbaum 164k

I am sorry for the inconvenience. What should I do? Should I delete one of them?

ADD REPLY • link 10.2 years ago by bioinforupesh2009.au ▴ 140

Hi, it may help if you tell us exactly where those numbers come from. For the GFF, is it the number of lines matching the word miRNA in column 3? And what about the other numbers?
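One way to make such a tally explicit is to count every feature type in column 3 of the GFF3 file. This is only a minimal sketch: `count_feature_types` is a hypothetical helper name, and the three-record sample is made up for illustration rather than taken from miRBase.

```shell
# Hypothetical helper: tally GFF3 feature types (column 3), skipping
# comment/header lines that start with '#'.
count_feature_types() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) print t "\t" n[t] }' "$1" | LC_ALL=C sort
}

# Made-up three-record sample; IDs, names and coordinates are illustrative.
sample=$(mktemp)
{
    printf '##gff-version 3\n'
    printf 'chr1\t.\tmiRNA_primary_transcript\t100\t180\t.\t+\t.\tID=MI_EX1;Name=xxx-mir-ex1\n'
    printf 'chr1\t.\tmiRNA\t110\t131\t.\t+\t.\tID=MIMAT_EX1;Name=xxx-miR-ex1-5p\n'
    printf 'chr1\t.\tmiRNA\t150\t171\t.\t+\t.\tID=MIMAT_EX2;Name=xxx-miR-ex1-3p\n'
} > "$sample"

count_feature_types "$sample"
rm -f "$sample"
```

Running the same function on the real file (for example `count_feature_types mmu.gff3`) gives the number of feature lines per type, which counts genomic loci rather than distinct miRNA names.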

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Lorena Pantano ▴ 380

Thanks, Lorena, for your kind reply.

Yes, you are right: these are counts from the 3rd column of the gff3 file for mmu, which I guess are the mature miRNAs.

The count of miRNA entries in the 3rd column of the miRBase gff3 file is: Mus musculus (2046 mature)

The count of primary transcript entries (I guess these are the precursors) in the miRBase gff3 file is: Mus musculus (1128 precursors)

The numbers from the miRBase site can be found here: http://www.mirbase.org/cgi-bin/browse.pl?org=mmu

Am I correct now? Please help!

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by bioinforupesh2009.au ▴ 140

Hi,

For mature miRNAs, remove duplicates, since many miRNAs have multiple primary transcripts. If you do this on that gff3 (I downloaded it yesterday):

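# Unique mature miRNA names: turn ';' into spaces, then take the 3rd
# space-separated field (the Name attribute in this gff3)
awk '$3=="miRNA"' mmu.gff3 | sed 's/;/ /g' | cut -f 3 -d " " | sort -u | wc -l
    1907

# Unique precursor names, extracted the same way
awk '$3=="miRNA_primary_transcript"' mmu.gff3 | sed 's/;/ /g' | cut -f 3 -d " " | sort -u | wc -l
    1187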

It is not exactly the same number, but it is close. If you are still worried about that, you can ask the miRBase people.
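To see the duplicate effect on a small scale, the sketch below prints the number of mature miRNA feature lines next to the number of distinct Name values, looking the Name attribute up by key instead of by position. `count_mature` is a hypothetical helper name, the attribute column is simplified to ID and Name, and the third record is made up; the two dre-let-7a records reuse coordinates quoted elsewhere in this thread.

```shell
# Hypothetical helper: print "<mature feature lines> <unique Name values>".
count_mature() {
    awk -F'\t' '
        $3 == "miRNA" {
            total++
            # Split the attribute column on ";" and pick the Name=... entry.
            n = split($9, attr, ";")
            for (i = 1; i <= n; i++)
                if (attr[i] ~ /^Name=/) names[substr(attr[i], 6)] = 1
        }
        END {
            for (k in names) uniq++
            print total + 0, uniq + 0
        }' "$1"
}

# Sample: one mature miRNA annotated at two loci, plus one made-up record.
sample=$(mktemp)
{
    printf 'chr11\t.\tmiRNA\t28380129\t28380150\t.\t-\t.\tID=MIMAT0001759;Name=dre-let-7a\n'
    printf 'chr15\t.\tmiRNA\t20399528\t20399549\t.\t+\t.\tID=MIMAT0001759_1;Name=dre-let-7a\n'
    printf 'chr1\t.\tmiRNA\t100\t121\t.\t+\t.\tID=MIMAT_EX1;Name=dre-miR-example\n'
} > "$sample"

count_mature "$sample"   # three feature lines, two distinct names
rm -f "$sample"
```

The first number corresponds to counting lines in column 3, the second to counting distinct mature miRNAs, so a gap between them reflects miRNAs annotated at more than one locus.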

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Lorena Pantano ▴ 380

Thank you very much for your kind help, I got it. I also emailed miRBase, but they have not responded yet. Let's see.

By the way, I saw your impressive profile and learned that you have a lot of experience in RNA biology.

Could you please tell me whether it is enough to count only unique mature miRNAs, as you suggested, or whether I have to count by transcript ID? For example (for zebrafish; I am actually working on two species), dre-let-7a has 6 transcripts:

chr11   .       miRNA   28380129        28380150        .       -       .       ID=MIMAT0001759 Name=dre-let-7a
chr15   .       miRNA   20399528        20399549        .       +       .       ID=MIMAT0001759_1       Name=dre-let-7a
chr23   .       miRNA   5478545 5478566 .       -       .       ID=MIMAT0001759_2       Name=dre-let-7a
chr4    .       miRNA   17722407        17722428        .       -       .       ID=MIMAT0001759_3       Name=dre-let-7a
chr5    .       miRNA   31628956        31628977        .       +       .       ID=MIMAT0001759_4       Name=dre-let-7a
chr6    .       miRNA   54461695        54461716        .       -       .       ID=MIMAT0001759_5       Name=dre-let-7a

If I consider only the mature miRNA, it comes with 6 entries, and of course if I count by ID there are 6 different counts, one for each corresponding transcript. I assume this happens because of gene duplication, right? So which should I consider, the miRNA or the transcript?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.1 years ago by bioinforupesh2009.au ▴ 140

Hi,

I will try to help! :)

You should count whatever maps there only once. If a read maps to more than one precursor but the mature miRNA is the same, you should count it only once.
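As a rough illustration of that rule, the sketch below counts each read once per mature miRNA name, however many loci it hits. The three-column input (read ID, locus ID, mature name, one row per alignment) is an assumed format for this example and not the output of any particular aligner; `count_reads_per_mature` and the `dre-miR-example` record are hypothetical.

```shell
# Hypothetical helper: count each read once per mature miRNA name.
# Input columns (assumed): read ID, locus ID, mature miRNA name.
count_reads_per_mature() {
    awk -F'\t' '
        # Only the first (read, name) pair seen increments the count.
        !seen[$1 SUBSEP $3]++ { count[$3]++ }
        END { for (m in count) print m "\t" count[m] }' "$1" | LC_ALL=C sort
}

# Made-up alignments: read1 hits two dre-let-7a loci but is counted once.
hits=$(mktemp)
{
    printf 'read1\tMIMAT0001759\tdre-let-7a\n'
    printf 'read1\tMIMAT0001759_1\tdre-let-7a\n'
    printf 'read2\tMIMAT0001759_2\tdre-let-7a\n'
    printf 'read3\tMIMAT_EX1\tdre-miR-example\n'
} > "$hits"

count_reads_per_mature "$hits"
rm -f "$hits"
```

Here dre-let-7a ends up with two reads (read1 and read2) instead of three alignments, which is the behaviour described above.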

I would use programs to do this for me, like miRDeep or my own: miraligner. The main difference between the two is that miraligner gives information about isomiRs.

Hope this helps!

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.1 years ago by Lorena Pantano ▴ 380

Thank you very much for your kind suggestion, I will certainly try this. However, I plan to align my short reads to the reference genome rather than to the precursors, then use R to detect DE miRNAs, and use miRDeep2 only to predict novel miRNAs. Is this pipeline fine?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.1 years ago by bioinforupesh2009.au ▴ 140