Finding Pdb Codes Of Sequences Homologous To An Input Sequences
2
0
Entering edit mode
11.2 years ago
amenabanu • 0

I need to write a script that can do a blastp with an input xml file, then find and display the PDB codes of sequences that are homologous. The definition I'm using for homology is: 1. the HSP in the alignment with the highest e-value is less than .001; and 2. the total alignment length (not counting gaps) is 60% or more of the query length.

I'm still new to programming, and I'm quite stumped as to how to do this.

I'd really appreciate your help. Thanks!

python blast pdb • 3.2k views
ADD COMMENT
0
0
Entering edit mode

Thanks for responding! I am looking through Biopython. Would you be able to point me to something specific?

ADD REPLY
0
Entering edit mode

I just did :) I linked you to the tutorial on the section to parse BLAST XML results. Here are two more links, google search for 'parse blast xml biopython':

http://stackoverflow.com/questions/13835912/parse-only-top-3-hits-from-blast-output-with-ncbixml http://www.biotnet.org/sites/biotnet.org/files/documents/25/biopython_blast.pdf

ADD REPLY
0
Entering edit mode
11.2 years ago
Hamish ★ 3.3k

Due to the PDB chain identifiers being case sensitive, and NCBI BLAST+ not supporting case sensitive identifiers, I would tend to use one of the programs from the FASTA suite for this. Since the PDB sequence database is small, this also means that more rigorous search methods can be used, for example:

  • SSEARCH: for local/local alignment (Smith-Waterman)
  • GGSEARCH: for global/global alignment (Needleman-Wunsch)
  • GLSEARCH: for global/local alignment

These programs and the PDB sequence databases are available in the EMBL-EBI Sequence Similarity Search services. Since these are also available as Web Services and EMBL-EBI provide sample clients for these services, you can download a suitable client from the EMBL-EBI Web Services pages and run the search providing the appropriate parameters, for example:

wget http://www.ebi.ac.uk/Tools/webservices/download_clients/perl/lwp/fasta_lwp.pl
./fasta_lwp.pl --email email@example.org --program ssearch --database pdb --stype protein --expupperlim 0.01 --sequence mysequence.txt

Once you have the results then you can use libraries such as BioPerl, BioPython or BioRuby to parse the FASTA format results, or use the other formats provided with your own parsing/extraction code to implement your coverage filter. Typically I would use the "EBI Application XML" result with some XSLT and/or XPath to extract the required information from the result.

ADD COMMENT

Login before adding your answer.

Traffic: 4097 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6