Question

NR annotation and Uniprot annotation

5.4 years ago

mxlsherry1992 ▴ 80

Hi all,

I am doing a de novo assembly analysis and want to annotate the sequences using the NR and UniProt databases. However, after annotation I found that the same sequence often gets annotations with different meanings from NR and from UniProt.

For example, when we annotate the sequence TRINITY_DN129938_c2_g1_i6, its annotation from the NR database is pol-like protein [Tetraodon nigroviridis], but its annotation from the UniProt database is Transposon TX1 uncharacterized 149 kDa protein OS=Xenopus laevis OX=8355 PE=4 SV=1.
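The UniProt title above follows UniProt's FASTA description format: the protein name, then OS= (organism), OX= (NCBI taxonomy ID), an optional GN= (gene name), PE= (protein existence level) and SV= (sequence version). A minimal sketch for splitting such a description into fields, assuming you already have the text that follows the `>db|accession|entry_name` identifier; the function name `parse_uniprot_description` is hypothetical. The PE field is worth checking when judging a hit: PE=4 means the protein's existence is only predicted.

```python
import re

# UniProt protein existence (PE) levels
PE_LEVELS = {
    "1": "Experimental evidence at protein level",
    "2": "Experimental evidence at transcript level",
    "3": "Protein inferred from homology",
    "4": "Protein predicted",
    "5": "Protein uncertain",
}

# Description layout: name OS=... OX=... [GN=...] PE=... SV=...
HEADER_RE = re.compile(
    r"^(?P<desc>.*?) OS=(?P<os>.*?) OX=(?P<ox>\d+)"
    r"(?: GN=(?P<gn>\S+))? PE=(?P<pe>\d) SV=(?P<sv>\d+)$"
)


def parse_uniprot_description(text):
    """Split a UniProt FASTA description (without the ID part) into fields."""
    m = HEADER_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a UniProt-style description: {text!r}")
    fields = m.groupdict()  # 'gn' is None when the entry has no gene name
    fields["pe_meaning"] = PE_LEVELS[fields["pe"]]
    return fields
```

Applied to the title from the question, this gives `desc="Transposon TX1 uncharacterized 149 kDa protein"`, `os="Xenopus laevis"`, `ox="8355"` and `pe_meaning="Protein predicted"`.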

This happens for many sequences. Do you know what the problem is, and which annotation I should believe?

Thanks
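One way to see how many transcripts are affected is to keep the best hit per transcript from each search and compare the two. A minimal sketch, assuming both searches were saved as BLAST+ tabular output with `-outfmt "6 qseqid sseqid evalue bitscore stitle"`; the helper names are hypothetical, and the title normalization (dropping NR's trailing `[Organism]` and UniProt's `OS=...` suffix) is a rough assumption about how the titles look in your files, not a real synonym check.

```python
import csv
import io
import re


def best_hits(tsv_text):
    """Keep the highest-bit-score hit per query: {qseqid: (bitscore, stitle)}.

    Assumes tab-separated columns: qseqid sseqid evalue bitscore stitle.
    """
    best = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        qseqid, _sseqid, _evalue, bitscore, stitle = row
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, stitle)
    return best


def normalize(title):
    """Crude normalization so NR-style and UniProt-style titles can be compared."""
    title = re.sub(r"\s*\[[^\]]*\]\s*$", "", title)  # NR style: trailing [Organism]
    title = re.sub(r"\s+OS=.*$", "", title)          # UniProt style: OS=... onwards
    return title.strip().lower()


def disagreements(nr_tsv, uniprot_tsv):
    """Queries whose best NR hit and best UniProt hit have different normalized titles."""
    nr, up = best_hits(nr_tsv), best_hits(uniprot_tsv)
    flagged = []
    for q in sorted(nr.keys() & up.keys()):  # only queries hit in both searches
        if normalize(nr[q][1]) != normalize(up[q][1]):
            flagged.append((q, nr[q][1], up[q][1]))
    return flagged
```

Each flagged tuple holds the query ID plus the original NR and UniProt titles, so the list can be reviewed by hand.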

rna-seq

score 0 · Answer 1 · 2019-12-10

5.4 years ago

prince26121991 ▴ 70

This is what I tried based on your input: I ran BLAST with the UniProt sequence of Transposon TX1 uncharacterized 149 kDa protein OS=Xenopus laevis OX=8355 PE=4 SV=1 against the NR database on NCBI. This is the result I got:

[Image: NCBI BLAST results for the TX1 protein against NR]

As you can see, the very first hit is the same protein, but the very next hit is the one you got in your annotation from the NR database.

Similarly, I did it the other way around, taking the pol-like protein as input for BLAST against the UniProt database. This is what I got:

[Image: BLAST results for the pol-like protein against UniProt]

Again, the very first hit is the one you got from your UniProt search.
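The two searches above amount to a two-way consistency check: each annotation shows up near the top when the other protein is used as the query, which suggests the two proteins are homologous rather than contradictory. A minimal sketch of that check (a looser cousin of the reciprocal-best-hit test), assuming each search's hits are available as dicts with hypothetical keys `title`, `bitscore` and `evalue`; the cutoff `k` is arbitrary.

```python
def rank(hits):
    """Best-first order: higher bit score first, ties broken by lower e-value."""
    return sorted(hits, key=lambda h: (-h["bitscore"], h["evalue"]))


def in_top_k(hits, label, k):
    """True if `label` occurs (case-insensitively) in one of the k best hit titles."""
    return any(label.lower() in h["title"].lower() for h in rank(hits)[:k])


def two_way_supported(hits_for_a, hits_for_b, label_a, label_b, k=5):
    """A's search must find B near the top, and B's search must find A."""
    return in_top_k(hits_for_a, label_b, k) and in_top_k(hits_for_b, label_a, k)
```

For the case in this thread, `hits_for_a` would be the TX1-vs-NR hits and `hits_for_b` the pol-like-vs-UniProt hits, with `label_a="Transposon TX1"` and `label_b="pol-like protein"`.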

Explanation: the reviewed part of UniProt (Swiss-Prot) contains only manually curated entries, so if you searched that set your search space was small, and the pol-like protein may not have been available to hit at all. Note that "reviewed" means curated, not necessarily experimentally validated: PE=4 in the TX1 header means the protein's existence is only predicted. When you searched NR, the Transposon TX1 uncharacterized 149 kDa protein (OS=Xenopus laevis OX=8355 PE=4 SV=1) was also available, but it is quite long (almost three times the length of the pol-like protein), so your query covers only a small part of it, and that lower coverage might have pushed it down the ranking. The shorter pol-like protein is covered much more completely, which is probably why your search preferred it over the TX1 protein.
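The explanation hinges on coverage: the same aligned stretch covers a much smaller fraction of a subject that is three times longer. A minimal sketch of the usual percent-coverage calculation from BLAST alignment coordinates; the function name is hypothetical. One caveat: BLAST's bit score is computed from the aligned region itself, so a long subject lowers its coverage rather than directly lowering its bit score, and whether that changes the reported best hit depends on how your annotation pipeline filters or ranks hits.

```python
def percent_coverage(start, end, length):
    """Percent of a sequence spanned by an alignment from start to end.

    Coordinates are 1-based and inclusive, as in BLAST tabular output.
    blastx reports qstart > qend for reverse-frame hits, so order them first.
    """
    if length <= 0:
        raise ValueError("sequence length must be positive")
    lo, hi = sorted((start, end))
    return 100.0 * (hi - lo + 1) / length
```

With a hypothetical 300-residue aligned stretch, `percent_coverage(1, 300, 400)` returns 75.0 for a 400-residue subject, while a subject three times as long gives `percent_coverage(1, 300, 1200)`, which returns 25.0.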

Thank you so much!! That's pretty clear. So for publication, if these are important sequences, I should rely on the UniProt database, right? I cannot use both annotations since they are different...

5.4 years ago by mxlsherry1992 ▴ 80

I would not suggest relying on the UniProt result in this kind of case. Think of it logically: if I offer a child two different flavors of ice cream, he can choose the one he likes, but if I offer only one flavor, he doesn't really have a choice. Similarly, NR is a huge database of non-redundant proteins in which both "flavors" (the pol-like protein and Transposon TX1) were available, and your tool (BLAST) preferred the pol-like protein. In UniProt only one flavor, Transposon TX1, was available (as I said in my previous post, the reviewed set is much smaller), so BLAST had no choice but to take it. PS: Please upvote if you are satisfied with the answer.

5.4 years ago by prince26121991 ▴ 70
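The ice-cream argument can be made concrete: the best hit depends on which candidates are in the pool. A minimal sketch, assuming hits from each search are dicts with hypothetical keys `title`, `bitscore` and `evalue`; it pools the hits from several searches and picks the highest bit score. Bit scores are used rather than e-values because e-values scale with the size of the database searched, so e-values from an NR search and a UniProt search are not directly comparable.

```python
def pick_best(*hit_lists):
    """Pool hits from one or more searches and return the highest-bit-score hit.

    Returns None when every list is empty; ties go to the first hit seen.
    """
    pool = [hit for hits in hit_lists for hit in hits]
    if not pool:
        return None
    return max(pool, key=lambda hit: hit["bitscore"])
```

`pick_best(uniprot_hits)` can only return something UniProt contained, whereas `pick_best(nr_hits, uniprot_hits)` lets both "flavors" compete on the same scale.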