Phylogenetic issue: which protein entries to sample from GenBank
10.2 years ago

Dear all,

I would like the community's opinion on a problem I'm facing: how to reconstruct a phylogeny from the protein sequences of a plant gene family. To do this, one needs to retrieve all available protein entries for the family from GenBank.

Unfortunately, as you probably know, many of the protein sequences in GenBank (at the NCBI) are the result of conceptual translations, so they are predicted or hypothetical.

My aim is to infer the correct phylogeny without false positives or false negatives, and without misalignments caused by incorrect predictions.

Which workflow or strategy would you recommend?

Thank you so much,

Luca

alignment sequence gene • 2.7k views

Look at Ensembl Plants for orthologs.

10.2 years ago
cdsouthan ★ 1.9k

You can choose plants whose proteomes are complete (or at least close to it) in Swiss-Prot.


It's not that simple. Even if a proteome is not complete, a particular protein may already have been identified or characterized.


Since many plants do not yet have complete proteomes, this would be somewhat limiting. Finding as wide a range of family members as possible, including from taxa without complete proteomes, is therefore a reasonable thing to do in the first instance.

As a first pass searching UniProtKB/Swiss-Prot using either:

And limiting the result based on the Taxonomy annotations will give a set of possible candidates. This set can then be filtered based on the protein existence annotation. This will give a set of proteins that you can be reasonably sure actually exist in vivo. From there generating a phylogeny should be relatively simple.
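To make the filtering step concrete, here is a minimal Python sketch of it, assuming the candidate entries have already been downloaded and parsed into simple records. The `Entry` class, the `filter_candidates` and `to_fasta` helpers, and the accessions and sequences in the example are all hypothetical and only for illustration; the five protein existence (PE) levels are the ones UniProtKB uses. Keeping levels 1 and 2 (evidence at protein or transcript level) is just one possible cutoff, not a recommendation from the thread.

```python
from dataclasses import dataclass

# UniProtKB protein existence (PE) levels; 1 is the strongest evidence.
PE_LEVELS = {
    1: "Evidence at protein level",
    2: "Evidence at transcript level",
    3: "Inferred from homology",
    4: "Predicted",
    5: "Uncertain",
}


@dataclass
class Entry:
    """Hypothetical, simplified view of one parsed database entry."""
    accession: str
    organism: str
    lineage: tuple   # taxonomy lineage names, e.g. ("Eukaryota", "Viridiplantae")
    pe_level: int    # key into PE_LEVELS
    sequence: str


def filter_candidates(entries, taxon, max_pe_level=2):
    """Keep entries within `taxon` whose PE level is at most `max_pe_level`."""
    return [
        e for e in entries
        if taxon in e.lineage and e.pe_level <= max_pe_level
    ]


def to_fasta(entries):
    """Format the kept entries as FASTA text, ready for an alignment tool."""
    return "".join(
        f">{e.accession} {e.organism} PE={e.pe_level}\n{e.sequence}\n"
        for e in entries
    )


# Made-up records: accessions and sequences are placeholders, not real data.
plant = ("Eukaryota", "Viridiplantae", "Streptophyta")
example_entries = [
    Entry("X00001", "Arabidopsis thaliana", plant, 1, "MKT"),
    Entry("X00002", "Oryza sativa", plant, 2, "MKS"),
    Entry("X00003", "Zea mays", plant, 4, "MRT"),                      # predicted only
    Entry("X00004", "Homo sapiens", ("Eukaryota", "Metazoa"), 1, "MKV"),  # not a plant
]

kept = filter_candidates(example_entries, "Viridiplantae")
fasta = to_fasta(kept)
print(fasta)
```

Raising `max_pe_level` to 3 would also admit entries inferred from homology, which widens taxon sampling at the cost of including sequences with weaker support.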
