predicted protein and sequence
5.2 years ago
mdfardin374 ▴ 10

Hello,

Can somebody tell me what a predicted protein is and how one can retrieve all the predicted proteins of a particular organism? Thank you.

sequence • 1.1k views

What do you suppose a "predicted protein" might be, given that its name is an exact description of what it is?

5.2 years ago
hugo.avila ▴ 530

Predicted proteins are the output of a program that scans a genome for possible coding sequences (CDSs), i.e. open reading frames running from a start codon to a stop codon, and assigns each one a name and function based on matches against a database of experimentally characterized protein sequences. To predict them yourself, you need to find a program that fits your organism (prokaryote or eukaryote). If you just want to extract the CDSs from your genome, a simple Biopython script can do that. But if you also want to assign names and functions (annotation), you need a dedicated annotation program. I work with prokaryotes, and the best program for me is Prokka.
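To make the "start codon to stop codon" scan concrete, here is a minimal sketch in plain Python (standard library only, so it stands in for the Biopython script mentioned above rather than reproducing it). The function name `find_orfs` and the `min_codons` threshold are illustrative assumptions. It only looks for ATG-to-stop stretches in the six reading frames, which is much cruder than what a real gene predictor such as Prokka does.

```python
# Naive ORF scan: report every ATG...stop stretch in all six reading frames.
# This is an illustration only, not a substitute for a real gene predictor.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Yield (strand, frame, start, end, orf) for each ATG-to-stop ORF.

    start and end are 0-based offsets on the scanned strand, and orf
    includes the stop codon. min_codons (a hypothetical cutoff) counts
    codons before the stop codon.
    """
    seq = seq.upper()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3, strand_seq[start:i + 3]
                    start = None  # look for the next ORF in this frame


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG", min_codons=3):
        print(orf)
```

Translating each ORF and comparing it against a protein database is the annotation step, which is the part the dedicated programs handle.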


I want to retrieve predicted proteins (proteins with no experimental evidence) that are already present in the database, but I do not know how to retrieve them.


For a specific organism or in general?


For a specific organism.


If a genome page is available for this organism at NCBI then follow these directions (this is a random example genome).

  1. Find the genome page of your organism at NCBI. I am choosing this one.
  2. Find the protein link for the genome (should be up near the top of the page).
  3. Download the protein file and find the predicted protein entries in it.

    $ zgrep "predicted" GCF_000209225.1_ASM20922v1_protein.faa.gz

    XP_001623027.1 predicted protein [Nematostella vectensis]
    XP_001623028.1 predicted protein [Nematostella vectensis]
    XP_001623029.1 predicted protein [Nematostella vectensis]
    XP_001623030.1 predicted protein [Nematostella vectensis]


  4. Extract the sequences for those IDs using the faSomeRecords utility from Jim Kent at UCSC.
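If faSomeRecords is not at hand, steps 3 and 4 can be approximated in one pass with a short Python script. This is a sketch under assumptions: the function names `read_fasta` and `extract_matching` and the output file name are made up for illustration, and the script simply keeps every record whose FASTA header contains the keyword.

```python
import gzip


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_matching(faa_gz_path, out_path, keyword="predicted protein"):
    """Write records whose header contains keyword; return their IDs."""
    ids = []
    with gzip.open(faa_gz_path, "rt") as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            if keyword in header:
                ids.append(header.split()[0])  # accession is the first token
                dst.write(f">{header}\n{seq}\n")
    return ids


if __name__ == "__main__":
    # Input name taken from the example above; output name is hypothetical.
    found = extract_matching(
        "GCF_000209225.1_ASM20922v1_protein.faa.gz",
        "predicted_proteins.faa",
    )
    print(len(found), "records written")
```

Matching on the header text is only as reliable as the naming in the file, so it is worth checking a few of the returned IDs by hand.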


From which database do you want to extract these protein sequences? In most databases you can search for CDSs titled "hypothetical protein". These CDSs are predicted sequences that show no homology to any known sequence.
