I am doing a de novo assembly analysis and want to annotate the sequences using the NR and UniProt databases. After annotation, however, I found that the same sequence can get a different annotation from NR than from UniProt.
For example:
For the sequence TRINITY_DN129938_c2_g1_i6, the annotation from the NR database is pol-like protein [Tetraodon nigroviridis], but the annotation from the UniProt database is Transposon TX1 uncharacterized 149 kDa protein OS=Xenopus laevis OX=8355 PE=4 SV=1.
This happened for many sequences. Do you know what the problem is, and which annotation I should believe?
This is what I have tried based on your input:
I ran BLAST with the UniProt sequence of Transposon TX1 uncharacterized 149 kDa protein OS=Xenopus laevis OX=8355 PE=4 SV=1 against the NR database at NCBI.
This is the result I got:
You can see that the very first hit is the same protein, and the very next hit is the one you got in your annotation from the NR database.
Similarly, I did it the other way round, taking the pol-like protein as input for BLAST against the UniProt database.
This is what I got:
Again, the very first hit is what you got from your UniProt search.
Explanation: The reviewed part of UniProt contains only manually curated proteins, so your search space is limited to those entries, and the pol-like protein might not even be available for the search. When you ran the search against the NR database, the "Transposon TX1 uncharacterized 149 kDa protein OS=Xenopus laevis OX=8355 PE=4 SV=1" protein was available, but it is quite long (almost 3 times the length of the pol-like protein). Your sequence therefore covers only part of it, which might have held back its overall bit score, whereas the shorter pol-like protein is covered much more completely and might score higher. That is why your search ranked the pol-like protein above the TX1 protein.
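To check the length argument on your own hits, a small sketch like the one below may help. It assumes you ran BLAST+ with `-outfmt "6 std qlen slen"`, which appends the query and subject lengths to the 12 standard tabular columns. The two example rows and the names `short_protein` and `long_protein` are invented for illustration and are not real results.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    bitscore: float
    query_cov: float    # fraction of the query inside the alignment
    subject_cov: float  # fraction of the subject inside the alignment


def parse_hit(line: str) -> Hit:
    """Parse one line of BLAST tabular output written with '6 std qlen slen'."""
    f = line.rstrip("\n").split("\t")
    qstart, qend, sstart, send = (int(x) for x in f[6:10])
    qlen, slen = int(f[12]), int(f[13])
    return Hit(
        query=f[0],
        subject=f[1],
        bitscore=float(f[11]),
        # abs() because start can be larger than end on reverse frames
        query_cov=(abs(qend - qstart) + 1) / qlen,
        subject_cov=(abs(send - sstart) + 1) / slen,
    )


# Invented rows: the same 900 nt query against a 300 aa and a 1300 aa subject.
short_row = "query1\tshort_protein\t60.0\t300\t120\t0\t1\t900\t1\t300\t1e-100\t350\t900\t300"
long_row = "query1\tlong_protein\t55.0\t300\t135\t0\t1\t900\t500\t799\t1e-90\t320\t900\t1300"

short_hit = parse_hit(short_row)
long_hit = parse_hit(long_row)
print(short_hit.subject, round(short_hit.subject_cov, 2))
print(long_hit.subject, round(long_hit.subject_cov, 2))
```

If the subject coverage is much lower for the long protein, as in these made-up rows, your sequence matches only one region of it, and that is worth mentioning when you report the annotation.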
Thank you so much! That is pretty clear. For the publication, if these are important sequences, I should rely on the UniProt database, right? I cannot use both annotations since they are different.
I would not suggest relying on the UniProt result in this kind of case. Think of it this way: if I offer a child two different flavors of ice cream, he can pick the one he likes, but if I offer him a single flavor, he doesn't really have a choice. Similarly, NR is a huge database of non-redundant proteins where both flavors (pol-like protein and Transposon TX1) were available, and your tool (BLAST) preferred the pol-like protein. In UniProt only one flavor, Transposon TX1, was available (as I said in my previous post, it is a very limited dataset of reviewed proteins), so BLAST had no choice other than to take it.
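The ice-cream point can be sketched in a few lines of Python: the top hit is simply the best-scoring subject among those the database contains. The bit scores below are invented for illustration (they are not the values from the searches above), and `top_hit` is a hypothetical helper, not part of BLAST.

```python
def top_hit(rows, allowed_subjects=None):
    """Return {query: (subject, bitscore)} keeping the highest bit score.

    rows: iterable of (query, subject, bitscore) tuples.
    allowed_subjects: if given, only these subjects count, which mimics
    searching a smaller database.
    """
    best = {}
    for query, subject, bitscore in rows:
        if allowed_subjects is not None and subject not in allowed_subjects:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Invented scores for illustration only.
rows = [
    ("TRINITY_DN129938_c2_g1_i6", "pol-like protein", 210.0),
    ("TRINITY_DN129938_c2_g1_i6", "Transposon TX1", 180.0),
]

# Both flavors available, as in the large database.
full = top_hit(rows)
# Only one flavor available, as in the smaller database.
restricted = top_hit(rows, allowed_subjects={"Transposon TX1"})

print(full["TRINITY_DN129938_c2_g1_i6"][0])
print(restricted["TRINITY_DN129938_c2_g1_i6"][0])
```

The same query gets a different top hit just because the second search had fewer candidates to choose from.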
PS: Please upvote if you are satisfied with the answer.
Thank you so much!!