Reader/Writer For ASN.1
13.4 years ago
Lee Katz ★ 3.2k

Hi everyone, is there a tool to make a submission template for tbl2asn, such that it would be easier to submit genomes? I have a list of authors' information which could hopefully be parsed and transferred to ASN format. For example:

author1="George Burdell|G.R.|Georgia Institute of Technology|Bio Lab|Atlanta|GA|United States|310 Ferst Dr NE|nobody@gatech.edu|1-404-385-5555|30332"

The definition of the Submission Template Format is difficult, but they give an example on their documentation page, at the very end:

ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/tbl2asn/DOCUMENTATION/tbl2asn.txt

*Edit* I am putting a bounty down for anyone who can find a framework for this, or if you can make a scalable script.

conversion ncbi genbank • 2.6k views

Upon reflection, I am pretty sure this is what you mean. Have you considered using the NCBI C++ Toolkit? If you had that compiled, then I think it would be fairly straightforward.


Well, I only know a C++ solution, so that isn't much help. I think NCBI should create a solution to this if they don't have one already. I would try writing to the help desk at info@ncbi.nlm.nih.gov. Right now, I think they expect you to manually type these into Sequin and generate the ASN.1 that way... but what if there are more than 20 authors on the paper, which is becoming more and more prevalent? Good luck!


It is not completely clear to me what you want. Do you want a template that already includes your author information? I think that is what you mean. Perhaps you have a very long author list and do not want to hand-type it into a Submit-block in ASN.1 format?


I think I just want to take a human-entered author list, such as the pipe-delimited example above, and turn it into the submission template file. I only understand Perl and PHP, so I hope that this is covered by something other than the NCBI C++ Toolkit, or that there is an already-compiled program.


OK, thanks. I'll put some kind of Biostar karma bounty on this and see if anyone wants to program it in BioPerl or just Perl.


OK, thanks. I think I know now that there is no scripting solution for this. I'll put up a bounty for someone to make it in Perl or BioPerl.


I don't understand the problem. Why can't you "just" split your string and generate the file in the "SUBMISSION TEMPLATE FORMAT" given as an example at the end? (PS: I'm too lazy to read the whole tbl2asn.txt file.)
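The "split your string" step could look something like the sketch below. The field names and their order are assumptions inferred from the single example record in the question; they are not taken from the tbl2asn documentation, and `parse_author` is a hypothetical helper.

```python
# Field order is an ASSUMPTION inferred from the one example record in the question.
AUTHOR_FIELDS = (
    "name", "initials", "affil", "div", "city", "sub",
    "country", "street", "email", "phone", "postal_code",
)


def parse_author(record: str) -> dict:
    """Split one pipe-delimited author record into a dict of named fields."""
    values = [value.strip() for value in record.split("|")]
    if len(values) != len(AUTHOR_FIELDS):
        raise ValueError(
            f"expected {len(AUTHOR_FIELDS)} fields, got {len(values)}: {record!r}"
        )
    return dict(zip(AUTHOR_FIELDS, values))


author = parse_author(
    "George Burdell|G.R.|Georgia Institute of Technology|Bio Lab|Atlanta|GA|"
    "United States|310 Ferst Dr NE|nobody@gatech.edu|1-404-385-5555|30332"
)
```

Rejecting records with the wrong number of fields up front is a deliberate choice here: a hand-typed list of 20 or more authors is likely to contain a stray or missing pipe somewhere.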


The template format is a nested data file which would at least involve recursion, and I'm not a trained computer scientist. I've always been bad at recursion, but I feel like this could be a piece of cake to someone who has been classically trained.
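For what it's worth, the recursion needed to write a nested structure can be quite small. The sketch below renders nested Python dicts, lists, and strings as brace-delimited text that only approximates the look of the template; it does not handle CHOICE or ENUMERATED values and has not been checked against tbl2asn. `render` is a hypothetical helper name.

```python
def render(value, indent=0):
    """Recursively render nested dicts/lists/strings as brace-delimited text."""
    pad = "  " * indent
    if isinstance(value, dict):
        # One "key value" line per entry; the value is rendered one level deeper
        items = [f"{pad}  {key} {render(val, indent + 1)}" for key, val in value.items()]
        return "{\n" + " ,\n".join(items) + "\n" + pad + "}"
    if isinstance(value, list):
        # List elements carry no key, only their rendered value
        items = [f"{pad}  {render(val, indent + 1)}" for val in value]
        return "{\n" + " ,\n".join(items) + "\n" + pad + "}"
    # Leaf: emit a quoted string, doubling any embedded double quotes
    return '"' + str(value).replace('"', '""') + '"'


print(render({"affil": {"city": "Atlanta", "country": "United States"}}))
```

The function calls itself only when it meets a dict or a list, so the nesting depth of the output simply follows the nesting depth of the input data.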

13.3 years ago
Falstaff ▴ 30

This isn't an answer (yet), as it is an attempt to provide some background information and refine the input. The ASN.1 Specification for a Cit-sub looks like this:

Cit-sub ::= SEQUENCE {               -- citation for a direct submission
    authors Auth-list ,              -- not necessarily authors of the paper
    imp Imprint OPTIONAL ,           -- this only used to get date.. will go
    medium ENUMERATED {              -- medium of submission
        paper   (1) ,
        tape    (2) ,
        floppy  (3) ,
        email   (4) ,
        other   (255) } OPTIONAL ,
    date Date OPTIONAL ,              -- replaces imp, will become required
    descr VisibleString OPTIONAL }    -- description of changes for public view

Then the 'authors' field is this:

-- Authorship Group
Auth-list ::= SEQUENCE {
    names CHOICE {
        std SEQUENCE OF Author ,          -- full citations
        ml SEQUENCE OF VisibleString ,    -- MEDLINE, semi-structured
        str SEQUENCE OF VisibleString } , -- free for all
    affil Affil OPTIONAL }                -- author affiliation

Thus, you are allowed multiple authors per affiliation, and then multiple affiliations (Auth-list objects) per Cit-sub.

So, would it be possible to re-organize the input data, with grouping by affiliations, like so:

Affiliation1="DATA | DATA | DATA | DATA"
AUTHOR1="INFO|INFO|INFO"
AUTHOR2="INFO|INFO|INFO"
AUTHOR3="INFO|INFO|INFO"
Affiliation2="DATA | DATA | DATA | DATA"
AUTHOR4="INFO|INFO|INFO"
AUTHOR5="INFO|INFO|INFO"
AUTHOR6="INFO|INFO|INFO"
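Reading input grouped this way could be sketched as follows. It assumes each line is a key starting with `Affiliation` or `AUTHOR` (case-insensitive, with an optional number), an equals sign, and a double-quoted, pipe-delimited value, and that every author line belongs to the most recent affiliation line. `parse_grouped` and the output layout are hypothetical.

```python
import re

# ASSUMED line shape: Affiliation<N>="a | b | c" or AUTHOR<N>="x|y|z"
LINE_RE = re.compile(r'^\s*(affiliation|author)\d*\s*=\s*"(.*)"\s*$', re.IGNORECASE)


def parse_grouped(text):
    """Group author records under the affiliation line that precedes them."""
    groups = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {line!r}")
        kind = match.group(1).lower()
        fields = [field.strip() for field in match.group(2).split("|")]
        if kind == "affiliation":
            groups.append({"affil": fields, "authors": []})
        elif not groups:
            raise ValueError("author line before any affiliation line")
        else:
            groups[-1]["authors"].append(fields)
    return groups


sample = 'Affiliation1="Georgia Tech | Atlanta"\nAUTHOR1="Burdell|G.R."\nAUTHOR2="Doe|J."'
groups = parse_grouped(sample)
```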

Additionally, will you be providing input for who the 'contact' person is, perhaps with the first line having a CONTACT tag? Or do you just want to hand-edit that part?

Finally, would you reward your bounty for a Linux- or Windows-compiled binary, so that the C++ library may be used, or must it be portable?


Multiple authors can have multiple affiliations and vice versa. I like your method of basically reducing the information to authors and affiliations. The corresponding author (i.e. the point of contact) can either be set explicitly or, in the absence of that setting, default to the first and last authors. The correct answer to this CS challenge will be some kind of reusable framework, module, class, or other reusable code. Optimally, it would fit into BioPerl and be something that a bioinformatician with only a few years of programming experience can handle. Thank you!!
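The contact rule described above (explicit setting if given, otherwise first and last authors) might be sketched like this. `pick_contacts` is a hypothetical helper, and authors are represented as plain strings purely for illustration.

```python
def pick_contacts(authors, explicit=None):
    """Return the contact authors: the explicit ones, else first and last."""
    if explicit:
        return list(explicit)
    if not authors:
        return []
    if len(authors) == 1:
        # A single author is both first and last; list them once
        return [authors[0]]
    return [authors[0], authors[-1]]


contacts = pick_contacts(["Burdell", "Doe", "Roe"])
```

With the three names above, the default would select the first and third; passing `explicit=["Doe"]` would override that.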
