GenBank, NCBI RefSeq or UniProt Protein Sequences
13.3 years ago
Woa ★ 2.9k

I have to construct a protein database for a sequenced organism to use in a proteomics search. Which repository would be the better source of protein sequences for this purpose: GenBank, NCBI RefSeq or UniProtKB?

Thanks

WoA

proteomics protein refseq uniprot • 17k views
13.3 years ago

UniProtKB contains the richest, most accurate, highest-quality data. GenBank contains raw data; it can be very redundant, and you might have to do a lot of filtering yourself. RefSeq is not as richly annotated, but at least it contains only non-redundant sequences.

So my first choice would be to go with UniprotKB, second RefSeq, and third Genbank. But it also depends on whether the organism you're interested in has sufficient data in each resource.
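To give an idea of the kind of filtering a redundant set might need, here is a minimal sketch that collapses records with identical sequences in a FASTA file. The function names `parse_fasta` and `deduplicate` are made up for this example, and it only catches exact duplicates (fragments or near-identical entries would need a clustering tool), so treat it as a starting point rather than a complete cleanup.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record seen for each distinct sequence (exact match only)."""
    seen = {}
    for header, seq in records:
        key = seq.upper()  # compare case-insensitively
        if key not in seen:
            seen[key] = (header, seq)
    return list(seen.values())
```

For example, `deduplicate(parse_fasta(open("proteins.fasta").read()))` would return one record per unique sequence, where `proteins.fasta` is a placeholder file name.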

Would you care to share which organism you're interested in?


Many thanks! Can somebody tell me the difference between the UniProt "Complete Proteome set" and the combined reviewed (UniProtKB/Swiss-Prot) and unreviewed (UniProtKB/TrEMBL) entries?

For some organisms the difference is negligible, but for others I have seen a difference of around 100 entries so far.

13.3 years ago

You can find the answer to your second question, "what is the difference between the UniProt 'Complete Proteome set' and the combined reviewed (UniProtKB/Swiss-Prot) and unreviewed (UniProtKB/TrEMBL) entries?", on the UniProt homepage:

  • Swiss-Prot, which is manually annotated and reviewed.
  • TrEMBL, which is automatically annotated and is not reviewed.

UniProt really is a combination of two resources: Swiss-Prot and TrEMBL.

Swiss-Prot is a high-quality database of real proteins, because it is highly curated. In fact, it is one of the oldest databases we have, and it is maintained by real protein experts.

TrEMBL, on the other hand, is not a database of real proteins at all. It is a database of translated nucleotide sequences from EMBL (hence TrEMBL). These may well not exist in real biology, or may simply be wrongly translated (a missed exon, for example). The two were combined for practical reasons, but it is very good to be aware of the difference.
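If you want to see how many entries of each kind a downloaded FASTA file contains, the header prefix tells you: UniProt FASTA headers start with `>sp|` for Swiss-Prot entries and `>tr|` for TrEMBL entries. A minimal sketch, where the function names and the example accessions are made up for illustration:

```python
from collections import Counter


def review_status(header: str) -> str:
    """Classify a UniProt FASTA header line by its database prefix."""
    prefix = header.lstrip(">").split("|", 1)[0]
    return {"sp": "reviewed", "tr": "unreviewed"}.get(prefix, "unknown")


def count_by_status(fasta_text: str) -> Counter:
    """Count header lines per review status in FASTA-formatted text."""
    return Counter(
        review_status(line)
        for line in fasta_text.splitlines()
        if line.startswith(">")
    )


# Hypothetical records, for illustration only (not real accessions).
example = (
    ">sp|X00001|EXMP1_HUMAN Example reviewed protein\nMKTAYIAK\n"
    ">tr|X00002|X00002_HUMAN Example unreviewed protein\nMKV\n"
    ">tr|X00003|X00003_HUMAN Another unreviewed protein\nMSS\n"
)
counts = count_by_status(example)
```

Comparing these counts between the complete proteome download and the combined reviewed plus unreviewed download is one way to see where the extra entries come from.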


Thanks for your answer. I think I should fetch the "Complete Proteome set" whenever it is available for the organism. However, the "complete proteome" contains only the canonical sequences and not all splice variants. Is there any way to get all the splice variants?


When you go to download the FASTA (assuming that is what you are using), e.g. http://www.uniprot.org/uniprot/?query=organism%3a9606+keyword%3a181&format=*, you get a choice to download the canonical sequence data, or canonical and isoform sequence data. The latter presumably includes splice variants as separate protein entries.
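The same choice can probably be made programmatically. The sketch below only builds a download URL and does not fetch anything. It assumes the current UniProt REST service at `rest.uniprot.org` with an `includeIsoform` parameter and an `organism_id` query field, none of which appear in the link above, so please check the UniProt API documentation before relying on it. The function name `fasta_url` is made up for this example.

```python
from urllib.parse import urlencode

# Assumption: the current UniProt REST streaming endpoint (not the legacy URL above).
BASE_URL = "https://rest.uniprot.org/uniprotkb/stream"


def fasta_url(taxon_id: int, include_isoforms: bool = False) -> str:
    """Build a FASTA download URL for all UniProtKB entries of one organism."""
    params = {
        "query": f"organism_id:{taxon_id}",  # assumed query field name
        "format": "fasta",
    }
    if include_isoforms:
        # Assumed parameter name for "canonical and isoform" sequence data.
        params["includeIsoform"] = "true"
    return f"{BASE_URL}?{urlencode(params)}"


canonical_only = fasta_url(9606)
with_isoforms = fasta_url(9606, include_isoforms=True)
```

Here 9606 is the human taxon ID used in the link above; the resulting URLs could then be passed to any HTTP client.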

13.3 years ago

What I would like to see is data that can link to mRNA isoforms. RefSeq allows this. GenBank would be noisy as Martijn says. The mRNA isoforms can be important because they are expressed to different levels according to cell type, temporal patterns (circadian, developmental), and responses to stimuli. These points could be quite critical to the design of the experiment whose data you'll now analyze or critical to the hypotheses addressed.


Thanks! I'll look into it.

13.2 years ago
Craig ▴ 30

For mass spectrometry–based proteomics, the International Protein Index (IPI, http://www.ebi.ac.uk/IPI/IPIhelp.html) has been a popular choice for common organisms. For some reason they don't have yeast, but the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org/) fills in nicely there. However, IPI is closing soon, and they recommend UniProt complete proteome sets (http://www.uniprot.org/faq/15) as a replacement. Overall, UniProt seems to provide good information for pretty much any organism, even if it doesn't have a complete proteome set yet, and it is definitely the most extensive, so I would recommend just going there for everything.


Thanks Craig, can you tell me where NCBI NR stands compared to, say, UniProt? Is it less annotated and more redundant (even though they call it NR)?


Unfortunately I have never used NR so I can't answer this question.
