Getting a gene sequence from NCBI with BioJava
10.1 years ago
Bioaln ▴ 360

Hello. I've been working on sequence parsing lately, and I can't work out how to download a gene sequence from NCBI. My existing code gives me a gene name (for example, TGFB1). What I'm trying to do is use Java code to fetch the sequence for that gene. I've been trying the geneRICH class in BioJava, but it doesn't seem to offer lookup by gene name, only by accession number or GenBank ID.

Thanks for any help.

identifiers biojava • 3.2k views
10.1 years ago

You need to find the sequences associated with this gene (e.g. RefSeq sequences) using the NCBI utilities. See, for example: Get Fasta File With Protein Sequences Given Entrez Gene Ids
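As a rough illustration of that lookup step, here is a minimal sketch that builds an E-utilities esearch URL for a gene symbol and pulls the record IDs out of the returned XML. The class and method names are made up for this example, and the search term (the `[Gene Name]` and `[Organism]` field tags, the `refseq[filter]` restriction, the `nuccore` database and `retmax=20`) is an assumption you should check against the Entrez documentation for your use case.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of BioJava or any NCBI library.
class GeneToSequenceIds {
    static final String EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    /** Builds an esearch URL for RefSeq nucleotide records matching a gene symbol. */
    static String esearchUrl(String geneSymbol, String organism) {
        // Assumed query: adjust the fields and filter to what you actually need.
        String term = geneSymbol + "[Gene Name] AND " + organism
                + "[Organism] AND refseq[filter]";
        return EUTILS + "esearch.fcgi?db=nuccore&retmax=20&term="
                + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    /** Extracts the text of every <Id> element from an esearch XML response. */
    static List<String> parseIds(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve any external DTD to an empty document so parsing
            // does not trigger an extra download.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("Id");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent().trim());
            }
            return ids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse esearch response", e);
        }
    }
}
```

You would fetch `GeneToSequenceIds.esearchUrl("TGFB1", "Homo sapiens")` with any HTTP client and pass the response body to `parseIds`. The IDs that come back can then be handed to efetch to download the sequences themselves.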

Furthermore, BioJava is not really needed to fetch the sequence. You can use xjc (the JAXB binding compiler) to generate Java classes from the NCBI TSeq DTD:

xjc -dtd "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd"
parsing a schema...
compiling a schema...
generated/ObjectFactory.java
generated/TSeq.java
generated/TSeqSeqtype.java
generated/TSeqSet.java

and use those classes to parse the XML returned by an NCBI E-utilities efetch URL.

See http://plindenbaum.blogspot.fr/2006/12/java-16-mustang-jaxb-and.html (old!)
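To show the shape of the efetch step without depending on generated code, here is a minimal sketch that reads the same TSeq XML with the JDK's built-in DOM parser instead of the xjc-generated classes. The class and method names are hypothetical. It assumes efetch is called with `db=nuccore&rettype=fasta&retmode=xml` and that each `TSeq` record carries `TSeq_accver` and `TSeq_sequence` elements. The HTTP helper needs Java 11 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: plain DOM parsing swapped in for the JAXB classes.
class TSeqFetchSketch {

    /** Builds an efetch URL returning TSeq XML for a comma-separated ID list. */
    static String efetchUrl(String ids) {
        return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
                + "?db=nuccore&rettype=fasta&retmode=xml&id=" + ids;
    }

    /** Downloads a URL and returns the response body as a string. */
    static String httpGet(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Maps each record's accession.version to its sequence string. */
    static Map<String, String> parseTSeqSet(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve the external DTD to an empty document so it is not downloaded.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList records = doc.getElementsByTagName("TSeq");
            Map<String, String> sequences = new LinkedHashMap<>();
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                sequences.put(text(record, "TSeq_accver"),
                              text(record, "TSeq_sequence"));
            }
            return sequences;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse TSeq XML", e);
        }
    }

    /** Returns the trimmed text of the first child element with this tag, or "". */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

A call such as `TSeqFetchSketch.parseTSeqSet(TSeqFetchSketch.httpGet(TSeqFetchSketch.efetchUrl(ids)))` would tie the pieces together, where `ids` is whatever ID list you obtained for the gene. The JAXB route gives you typed objects instead of string lookups, at the cost of the code generation step.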

Wow, thanks for the thorough answer. I will look into those possibilities.
