I am using UniProtKB to download protein sequences of the Argonaute superfamily (query: Argonaute OR Piwi). The search returns 194 UniProtKB/Swiss-Prot and 888 UniProtKB/TrEMBL entries.
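A minimal sketch (Python): the Swiss-Prot and TrEMBL subsets can be requested separately by adding `reviewed:true` or `reviewed:false` to the query. This assumes the current UniProt REST service at rest.uniprot.org and its `uniprotkb/stream` endpoint, which is not necessarily the interface used here. The hit counts it returns will also differ from the ones quoted above, because the database is updated regularly. The helper name `uniprot_fasta_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the current UniProt REST "stream" endpoint, which returns all hits in one response.
BASE = "https://rest.uniprot.org/uniprotkb/stream"


def uniprot_fasta_url(terms, reviewed):
    """Hypothetical helper: build a FASTA download URL.

    reviewed=True restricts the hits to Swiss-Prot, reviewed=False to TrEMBL.
    """
    query = f"({terms}) AND reviewed:{'true' if reviewed else 'false'}"
    return BASE + "?" + urlencode({"query": query, "format": "fasta"})


swissprot_url = uniprot_fasta_url("Argonaute OR Piwi", reviewed=True)
trembl_url = uniprot_fasta_url("Argonaute OR Piwi", reviewed=False)
print(swissprot_url)
print(trembl_url)
```

Keeping the two subsets in separate files makes it easy to treat the reviewed set as the reference and the unreviewed set as candidates for filtering.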
On further analysis of these hits, I find that the UniProtKB/TrEMBL entries are redundant, whereas UniProtKB/Swiss-Prot gives one record per gene per species.
I am unsure which UniProtKB/TrEMBL entries to keep for a particular protein from a given species, since there are multiple entries per gene for the same species, each with a different accession number.
For example, the protein Seawi from Strongylocentrotus purpuratus is encoded by only one gene, but UniProtKB/TrEMBL lists 4 accessions (Q9GPA7, Q9GPA8, Q9GPA6, C9EID6) with varying sequence lengths.
However, I would miss a large number of sequences if I used only the UniProtKB/Swiss-Prot entries.
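One rough heuristic, sketched here under stated assumptions, is to group the downloaded records by organism (`OS=`) and gene name (`GN=`) taken from the UniProt FASTA headers and keep only the longest sequence in each group. The function names are hypothetical. Entries without a `GN=` field fall back to the protein name. "Longest wins" cannot tell splice isoforms from fragments, so the output still needs manual checking. UniProt's precomputed UniRef clusters (UniRef100/90/50) are an established alternative for collapsing identical or near-identical sequences.

```python
import re

# UniProt FASTA headers follow this layout (GN= may be missing):
# >sp|ACCESSION|ENTRY_NAME Protein name OS=Organism OX=TaxID GN=Gene PE=n SV=n
HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*?)\s+OS=(.*?)\s+OX=")
GN_RE = re.compile(r"\bGN=(\S+)")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_per_gene(text):
    """Hypothetical helper: keep one record per (organism, gene) key.

    Heuristic (an assumption, not a UniProt rule): the longest sequence wins.
    Headers without GN= fall back to the protein name as the gene key.
    """
    best = {}
    for header, seq in parse_fasta(text):
        m = HEADER_RE.match(header)
        if m is None:
            continue  # not a UniProt-style header; skip it
        _db, acc, _entry, protein, organism = m.groups()
        gn = GN_RE.search(header)
        key = (organism, gn.group(1) if gn else protein)
        if key not in best or len(seq) > len(best[key][1]):
            best[key] = (acc, seq)
    return best


# Toy records with made-up accessions, only to show the behaviour.
toy = """>tr|ACC1|ACC1_EXAMP Piwi-like protein OS=Example species OX=1 GN=geneA PE=2 SV=1
MKLV
>tr|ACC2|ACC2_EXAMP Piwi-like protein (Fragment) OS=Example species OX=1 GN=geneA PE=2 SV=1
MK
>tr|ACC3|ACC3_EXAMP Argonaute OS=Other species OX=2 PE=3 SV=1
MAAAA
"""
result = longest_per_gene(toy)
for (organism, gene), (acc, seq) in result.items():
    print(organism, gene, acc, len(seq))
# Example species geneA ACC1 4
# Other species Argonaute ACC3 5
```

Grouping on the header fields keeps every species and gene represented while dropping the shorter duplicate accessions. A sequence-identity-based approach such as UniRef is more robust when gene names are missing or inconsistent.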
Kindly help me with this.
Thank you very much for the reply. It was a great help.
Glad to hear it. Feel free to vote for the answer then :-)