Download fasta.txt of a protein from PDB using BioJava
11.8 years ago

Hi, I want to download the fasta.txt file for a protein from the Protein Data Bank using BioJava. I can successfully download the protein's pdb.gz file with BioJava, but I have not managed to do the same for fasta.txt. Does BioJava offer a method for this? Please help; I am still new to BioJava. Thank you in advance.

fasta pdb java biojava • 4.5k views

This can easily be done with Biopython or PyMOL scripting.


@Pappu: I need it in Java. The rest of my code is in Java, so I was looking for a Java solution for this too.


Did you consider Jython?


@Pappu: the link you sent me above will not open; the error says it is a local file.


This command works in a Linux shell, i.e. in bash.

11.1 years ago
Hamish ★ 3.3k

The best way to do this will depend on exactly what you want to fetch...

If you want all the chain sequences described in the structure (both protein and nucleotide), then a simple fetch from one of the Worldwide Protein Data Bank (wwPDB) sites will do the job, e.g. to get the sequences for PDB:10MH:

Sample Java code for fetching data from a web site can be found with an Internet search. FWIW the site I usually use to remind myself of the method specifics is part of the EMBL-EBI's Web Services tutorials, for example using java.net http://www.ebi.ac.uk/Tools/webservices/tutorials/06_programming/java/rest/java.net
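As a rough illustration of the java.net approach mentioned above, here is a minimal sketch. It assumes you already know the URL of the FASTA resource you want, so the address is passed in as an argument rather than hard-coded. The class name `FastaFetch` and the helper `parseFasta` are hypothetical names made up for this example; only the `java.net` and `java.io` calls are standard JDK API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of BioJava.
class FastaFetch {

    /** Performs an HTTP GET on the given address and returns the body as text. */
    static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(30_000);    // milliseconds
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + address);
            }
            StringBuilder body = new StringBuilder();
            try (InputStream in = conn.getInputStream();
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    /** Splits FASTA text into header -> sequence, keeping the original order. */
    static Map<String, String> parseFasta(String text) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : text.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    records.put(header, seq.toString());
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line); // sequence lines may be wrapped
            }
        }
        if (header != null) {
            records.put(header, seq.toString());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FastaFetch <fasta-url>");
            return;
        }
        // One entry per chain record found in the downloaded FASTA text.
        parseFasta(fetch(args[0])).forEach(
                (header, sequence) -> System.out.println(header + "\t" + sequence.length()));
    }
}
```

Run it with the FASTA URL of the entry you are interested in as the only argument. To save the file instead of parsing it, write the string returned by `fetch` to disk with `java.nio.file.Files.writeString`.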

If you want the sequence for a specific chain from a structure, then the EMBL-EBI's dbfetch and WSDbfetch services are an option. These know how to deal with common forms of structure and chain identifiers and the case sensitivity rules for the chain identifier. For details of the supported PDB identifier formats see: http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/dbfetch.databases#pdb. They also support retrieval of all sequences for a structure and, when using the specific database names 'pdbaa' or 'pdbna', retrieval of only the protein or only the nucleotide sequences for a structure.
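For the dbfetch route, the request is just a URL with the database name, identifier and format filled in. The sketch below only builds such a URL. The base address and the query parameter names (`db`, `id`, `format`, `style`) are assumptions about dbfetch's URL syntax, and the class name `DbfetchUrl` is hypothetical, so check them against the current dbfetch documentation before relying on this. The database name 'pdbaa' comes from the description above; whether a given identifier resolves there is not something this code verifies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not an official EMBL-EBI client.
class DbfetchUrl {

    // Assumed base address of the dbfetch service.
    static final String BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch";

    /** Builds a dbfetch request URL; parameter names are assumptions (see above). */
    static String build(String db, String id, String format) {
        return BASE
                + "?db=" + encode(db)
                + "&id=" + encode(id)
                + "&format=" + encode(format)
                + "&style=raw"; // assumed: ask for plain text rather than an HTML page
    }

    private static String encode(String value) {
        // URLEncoder.encode(String, Charset) requires Java 10 or later.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Protein sequences only, for the structure used as an example above.
        System.out.println(build("pdbaa", "10MH", "fasta"));
    }
}
```

The resulting string can be passed to any HTTP client, for example `java.net.HttpURLConnection`. For a single chain, use one of the identifier formats listed on the dbfetch databases page as the `id` value.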

In BioJava you can also extract the ATOM and SEQRES chain sequences from the PDB structure, see:
