Question

Download Fasta.Txt Of A Protein From Pdb Using Biojava

0

12.6 years ago

potassiumiodide0990 ▴ 80

Hi. I want to download the fasta.txt file for a protein from the Protein Data Bank using BioJava. I can already download the protein's pdb.gz file with BioJava, but I can't work out how to do the same for fasta.txt. Does BioJava offer a method for this? I'm still new to BioJava. Thank you in advance.

fasta pdb java biojava • 5.0k views

ADD COMMENT • link updated 11.8 years ago by Hamish ★ 3.3k • written 12.6 years ago by potassiumiodide0990 ▴ 80

0

This can easily be done with Biopython or PyMOL scripting.

ADD REPLY • link 12.6 years ago by Pappu ★ 2.1k

0

@Pappu: I need it in Java. The rest of my code is in Java, so I was looking for a Java solution for this too.

ADD REPLY • link 12.6 years ago by potassiumiodide0990 ▴ 80

0

Did you consider Jython?

ADD REPLY • link 12.6 years ago by Pappu ★ 2.1k

2

wget 'http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetText.pl?pdb=2vtp&seq_fasta=1' -O 2vtp.fa

ADD REPLY • link 12.6 years ago by Pappu ★ 2.1k

1

@Pappu: the link you sent above fails with a message saying it cannot be opened because it is a local file.

ADD REPLY • link 12.6 years ago by potassiumiodide0990 ▴ 80

1

It is a command, not a link to open in a browser. Run it in a Linux shell such as bash.

ADD REPLY • link 12.6 years ago by Pappu ★ 2.1k

score 1 · Answer 1 · 2013-10-06

The best way to do this will depend on exactly what you want to fetch...

If you want all the chain sequences described in the structure (both protein and nucleotide), then a simple fetch from one of the Worldwide Protein Data Bank (wwPDB) sites will do the job, e.g. to get the sequences for PDB:10MH:

Sample Java code for fetching data from a web site is easy to find with an Internet search. FWIW, the site I usually use to remind myself of the method specifics is the EMBL-EBI Web Services tutorials, for example the page on using java.net: http://www.ebi.ac.uk/Tools/webservices/tutorials/06_programming/java/rest/java.net
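For a rough idea of what such a fetch looks like with only the standard library, here is a minimal sketch. The class name `FastaFetcher` and the method `download` are made up for this illustration, and error handling is reduced to letting `IOException` propagate. It simply copies whatever the URL returns into a file, so it does the same job as the wget command earlier in the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, not part of BioJava or any EMBL-EBI client.
public class FastaFetcher {

    /**
     * Copies the bytes returned by the URL into the target file,
     * replacing the file if it already exists. Returns the byte count.
     */
    public static long download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FastaFetcher <url> <output-file>");
            return;
        }
        long bytes = download(URI.create(args[0]).toURL(), Paths.get(args[1]));
        System.out.println("Wrote " + bytes + " bytes to " + args[1]);
    }
}
```

To mirror the wget call from the comments, you would pass the quoted PDBsum URL as the first argument and `2vtp.fa` as the second. Whether that URL still responds is not something this sketch checks.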

If you want the sequence for a specific chain from a structure, then the EMBL-EBI's dbfetch and WSDbfetch services are an option. They know how to deal with the common forms of structure and chain identifiers, and with the case sensitivity rules for the chain identifier. For details of the supported PDB identifier formats see: http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/dbfetch.databases#pdb. They also support retrieval of all sequences for a structure and, with the specific database names 'pdbaa' and 'pdbna', retrieval of only the protein or only the nucleotide sequences for a structure.
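As an illustration of how a dbfetch request might be put together in Java, here is a small sketch that only builds the URL string. The class `DbfetchUrl` and its `build` method are hypothetical, and the query parameter names (`db`, `id`, `format`, `style`) are assumed here rather than taken from this thread, so check them against the dbfetch documentation before relying on them. The identifier is passed through as given; see the page linked above for the chain identifier formats the service accepts.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a dbfetch URL but does not contact the service.
public class DbfetchUrl {

    // Base address taken from the dbfetch link quoted in the answer.
    static final String BASE = "http://www.ebi.ac.uk/Tools/dbfetch/dbfetch";

    /**
     * Builds a request URL for FASTA output.
     * Parameter names db/id/format/style are an assumption about the
     * dbfetch query syntax, not something confirmed in this thread.
     */
    public static String build(String db, String id) {
        return BASE + "?db=" + encode(db) + "&id=" + encode(id)
                + "&format=fasta&style=raw";
    }

    // URL-encodes a single query value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 'pdbaa' limits the result to protein sequences; 10MH is the
        // structure used as the example in the answer.
        System.out.println(build("pdbaa", "10MH"));
    }
}
```

The resulting string can then be handed to any plain HTTP fetch, for example `java.net.URL.openStream()`.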

In BioJava you can also extract the ATOM and SEQRES chain sequences from the PDB structure, see: