Hello,
Can somebody tell me what a predicted protein is and how one can retrieve all the predicted proteins of a particular organism? Thank you.
Predicted proteins are the output of a program that scans a genome for all possible coding sequences (CDSs), each running from a start codon to a stop codon, and assigns a name and function based on matches against a database of experimentally characterized protein sequences. To find them yourself, you need a program suited to your organism (prokaryote or eukaryote). If you just want to extract the CDSs from your genome, a simple Biopython script can do that, but if you want to assign names and functions (annotation), you need a dedicated annotation program. I work with prokaryotes, and for me the best program is Prokka.
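To make the "start codon to stop codon" idea concrete, here is a minimal sketch in plain Python (not Biopython, and not what Prokka does internally). It assumes ATG as the only start codon, the standard stop codons, and no introns, so it is only a rough illustration for prokaryote-like sequences. The function name `find_orfs` and the `min_codons` parameter are made up for this example.

```python
# Naive open reading frame (ORF) scan over all six reading frames.
# Assumptions: ATG is the only start codon, standard stop codons, no introns.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return a list of (strand, frame, start, end, orf_sequence) tuples.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and the reported sequence includes the stop codon.
    min_codons counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and keep scanning
    return orfs
```

Calling `find_orfs(genome_sequence)` on a contig gives candidate CDSs only; naming them and assigning functions still needs an annotation program as described above.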
If a genome page is available for this organism at NCBI, follow these directions (the genome used here is just a random example).
Find predicted protein entries from the file.
$ zmore GCF_000209225.1_ASM20922v1_protein.faa.gz | grep "predicted"
XP_001623027.1 predicted protein [Nematostella vectensis]
XP_001623028.1 predicted protein [Nematostella vectensis]
XP_001623029.1 predicted protein [Nematostella vectensis]
XP_001623030.1 predicted protein [Nematostella vectensis]
4. Extract the sequences for those IDs using the faSomeRecords utility from Jim Kent at UCSC.
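If you would rather not install the UCSC utility, the grep and extraction steps can be combined in a short Python script. This is only a sketch using the standard library, not a replacement for faSomeRecords: it assumes a gzipped protein FASTA file like the one above, and the function names `read_fasta` and `extract_by_keyword` are made up for this example.

```python
import gzip


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_by_keyword(fasta_gz_path, out_path, keyword="predicted protein"):
    """Write records whose header contains keyword to out_path.

    Returns the list of matching IDs (first word of each header).
    """
    ids = []
    with gzip.open(fasta_gz_path, "rt") as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            if keyword in header:
                dst.write(f">{header}\n{seq}\n")
                ids.append(header.split()[0])
    return ids
```

For the example genome this would be called as `extract_by_keyword("GCF_000209225.1_ASM20922v1_protein.faa.gz", "predicted_proteins.faa")`, where the output file name is arbitrary.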
Maybe this post helps: "Which software is suitable for protein prediction from whole eukaryotic genomes?"
In combination with this website:
https://en.wikipedia.org/wiki/Gene_prediction
What do you suppose a "predicted protein" might be? Given that its name is an exact description of what it is.