Hi, has anyone worked with ASN.1 files downloaded from NCBI? I understand that ASN.1 is not NCBI-specific and that NCBI offers the data in many other formats. My question is specifically about ASN.1: does anyone know of a Java library that you have used to parse these files? I have googled, and many libraries speak of BER and DER encoding, which are communication-specific formats. All I'm interested in is manipulating, in Java, the ASCII ASN.1 files that one can download from NCBI.
The ASCII form of the NCBI ASN.1 data does not follow any standard and is essentially an invention by NCBI (that's why they also provide some converters). Only the binary form can be processed with generic ASN.1 tools. BER is the standard low-level encoding used in binary ASN.1 and not, contrary to what you wrote, anything communication-specific. The standard approach to parsing binary ASN.1 data is to get the encoding definition file for a specific downloadable item (NCBI provides them for all their ASN.1 data, with many definition parts shared between databases), generate a parser from it, and link that to your application.
I have done that for PubChem ASN.1 compound, substance, and assay data. I have been using the SNACC parser generator to generate C code for linking. Warning: there are some data item sequences where SNACC generates wrong code that tries to read an extra token from the input stream, so you need to postprocess the generated parser source to fix that. The assay and structure readers are components of the generic academic version of the Cactvs Cheminformatics Toolkit (www.xemistry.com/academic). Also note that such a parser is surprisingly large and complex due to the extensive inclusion of definitions from other NCBI branches that have grown over decades. For example, the literature reference definition part included by the PubChem assay and structure data is much more extensive than the actual structure and assay data part, with dozens of different ways to specify even the most exotic type of reference.
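To illustrate what the BER encoding mentioned above looks like at the byte level, here is a minimal Java sketch that decodes a single BER header (identifier octets plus length octets). It is only an illustration and not a replacement for a generated parser: the class name `BerHeader` and the sample bytes are made up for this example, and nothing here is specific to the NCBI definition files.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: decodes one BER tag/length header from a stream. */
public class BerHeader {
    public final int tagClass;        // 0 universal, 1 application, 2 context-specific, 3 private
    public final boolean constructed; // true if the value contains nested elements
    public final int tagNumber;
    public final long length;         // -1 signals the indefinite-length form

    BerHeader(int tagClass, boolean constructed, int tagNumber, long length) {
        this.tagClass = tagClass;
        this.constructed = constructed;
        this.tagNumber = tagNumber;
        this.length = length;
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("truncated BER header");
        }
        return b;
    }

    public static BerHeader read(InputStream in) throws IOException {
        int first = readByte(in);
        int tagClass = (first >> 6) & 0x03;
        boolean constructed = (first & 0x20) != 0;
        int tagNumber = first & 0x1F;
        if (tagNumber == 0x1F) {
            // High-tag-number form: base-128 digits, top bit set on all but the last.
            tagNumber = 0;
            int b;
            do {
                b = readByte(in);
                tagNumber = (tagNumber << 7) | (b & 0x7F);
            } while ((b & 0x80) != 0);
        }
        int lenByte = readByte(in);
        long length;
        if (lenByte < 0x80) {
            length = lenByte;          // short form: the byte is the length
        } else if (lenByte == 0x80) {
            length = -1;               // indefinite form, ended by an end-of-contents marker
        } else {
            int n = lenByte & 0x7F;    // long form: n following bytes hold the length
            if (n > 7) {
                throw new IOException("length field too large for this sketch");
            }
            length = 0;
            for (int i = 0; i < n; i++) {
                length = (length << 8) | readByte(in);
            }
        }
        return new BerHeader(tagClass, constructed, tagNumber, length);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: SEQUENCE (tag 16, constructed) of length 3 holding INTEGER 5.
        byte[] sample = {0x30, 0x03, 0x02, 0x01, 0x05};
        BerHeader h = read(new ByteArrayInputStream(sample));
        System.out.println("class=" + h.tagClass + " constructed=" + h.constructed
                + " tag=" + h.tagNumber + " length=" + h.length);
    }
}
```

A real reader would recurse into constructed values and map each tag to a type from the definition file, which is exactly the part a parser generator automates.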
Yeah, I found these converters after posting the question. However, I would prefer to have a library that models a given file format, since that gives me more control over how I can manipulate it. It is true that I can use the converters to get, say, XML, but it still seems inefficient to me. For the time being I guess I will do it this way.
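For the converter route, a streaming parser keeps memory use flat even on large converted files. The following is a rough Java sketch using the standard StAX API (`javax.xml.stream`) that simply counts element names; the class name `XmlElementCounter` and the element names in the sample string are invented for illustration and are not taken from any NCBI schema.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: counts element names in an XML stream without loading it whole. */
public class XmlElementCounter {

    public static Map<String, Integer> count(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: DTD processing is not needed for this sketch, so it is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        Map<String, Integer> counts = new TreeMap<>();
        try {
            while (reader.hasNext()) {
                // Only start tags matter here; text and end tags are skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample document, standing in for converter output.
        String sample = "<Records><Record><Id>1</Id></Record>"
                + "<Record><Id>2</Id></Record></Records>";
        System.out.println(count(new StringReader(sample)));
    }
}
```

In practice you would replace the counting with handlers for the elements you care about, and pass a `FileReader` or input stream for the converted file instead of the sample string.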
I suppose you googled for Java-based ASN.1 compilers/code generators like http://sourceforge.net/projects/jac-asn1/ . As far as I remember, I played with the NCBI ASN.1 files, but the time required to explore the solutions was not worth it.
Hmm, strangely enough I didn't check this one. Thanks, this is great.