This isn't an answer (yet); it is an attempt to provide some background information and refine the input. The ASN.1 specification for a Cit-sub looks like this:
Cit-sub ::= SEQUENCE { -- citation for a direct submission
authors Auth-list , -- not necessarily authors of the paper
imp Imprint OPTIONAL , -- this only used to get date.. will go
medium ENUMERATED { -- medium of submission
paper (1) ,
tape (2) ,
floppy (3) ,
email (4) ,
other (255) } OPTIONAL ,
date Date OPTIONAL , -- replaces imp, will become required
descr VisibleString OPTIONAL } -- description of changes for public view
Then the 'authors' field is this:
-- Authorship Group
Auth-list ::= SEQUENCE {
names CHOICE {
std SEQUENCE OF Author , -- full citations
ml SEQUENCE OF VisibleString , -- MEDLINE, semi-structured
str SEQUENCE OF VisibleString } , -- free for all
affil Affil OPTIONAL } -- author affiliation
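Thus, an Auth-list allows multiple authors per affiliation. Note, though, that Cit-sub as specified above has a single 'authors' field, so covering multiple affiliations would take one Auth-list object per affiliation group.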
So, would it be possible to re-organize the input data, with grouping by affiliations, like so:
Affiliation1="DATA | DATA | DATA | DATA"
AUTHOR1="INFO|INFO|INFO"
AUTHOR2="INFO|INFO|INFO"
AUTHOR3="INFO|INFO|INFO"
Affiliation2="DATA | DATA | DATA | DATA"
AUTHOR4="INFO|INFO|INFO"
AUTHOR5="INFO|INFO|INFO"
AUTHOR6="INFO|INFO|INFO"
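For what it's worth, a grouped layout like this is simple to read back in. Below is a minimal Python sketch, not a tested converter. It assumes each line has the form KEY="field|field|...", that any key starting with "affil" opens a new group, and that every other key is an author belonging to the most recent affiliation. The function name `parse_grouped` and the dictionary keys are made up for illustration.

```python
import re
from typing import Dict, List

# KEY, optional trailing digits, '=', then a double-quoted value.
LINE_RE = re.compile(r'^\s*(\w+?)\d*\s*=\s*"(.*)"\s*$')


def parse_grouped(text: str) -> List[Dict[str, list]]:
    """Group author lines under the affiliation line that precedes them."""
    groups: List[Dict[str, list]] = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip blank or unrecognized lines
        key = match.group(1).lower()
        # Split the pipe-delimited value and trim stray spaces.
        fields = [part.strip() for part in match.group(2).split("|")]
        if key.startswith("affil"):
            groups.append({"affil": fields, "authors": []})
        elif groups:
            groups[-1]["authors"].append(fields)
        else:
            raise ValueError("author line found before any affiliation line")
    return groups


if __name__ == "__main__":
    sample = (
        'Affiliation1="DATA | DATA | DATA | DATA"\n'
        'AUTHOR1="INFO|INFO|INFO"\n'
        'AUTHOR2="INFO|INFO|INFO"\n'
    )
    for group in parse_grouped(sample):
        print(group["affil"], len(group["authors"]), "author(s)")
```

What each pipe-separated field means (last name, initials, city, and so on) is left open here, since that depends on the input you settle on.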
Additionally, will you be providing input for who the 'contact' person is, perhaps with the first line having a CONTACT tag? Or do you just want to hand-edit that part?
Finally, would you award your bounty for a Linux- or Windows-compiled binary, so that the C++ library may be used, or must it be portable?
Upon reflection, I am pretty sure this is what you mean. Have you considered using the NCBI C++ Toolkit? If you had that compiled, I think it would be fairly straightforward.
Well, I only know a C++ solution, so that isn't much help. I think NCBI should create a solution to this if they don't have one already. I would try writing to the help desk at info@ncbi.nlm.nih.gov. Right now, I think they expect you to manually type these into Sequin and generate the ASN.1 that way ... but what if there are more than 20 authors on the paper, which is becoming more and more common? Good luck!
It is not completely clear to me what you want: do you want a template that already includes your author information? I think that is what you mean. Perhaps you have a very long author list and do not want to hand-type it into a Submit-block in ASN.1 format?
I think I just want to take a human-entered author list, such as the pipe-delimited example above, and turn it into the submission template file. I only understand Perl and PHP, so I hope that this is covered by something other than the NCBI C++ Toolkit, or that there is an already-compiled program.
OK, thanks. I'll put some kind of Biostar karma bounty on this and see if anyone wants to program it in BioPerl or just Perl.
OK, thanks. I think I know now that there is no scripting solution for this. I'll put up a bounty for someone to write it in Perl or BioPerl.
I don't understand the problem. Why can't you "just" split your string and generate the file in the "SUBMISSION TEMPLATE FORMAT" given as an example at the end? (PS: I'm too lazy to read the whole tbl2asn.txt file.)
The template format is a nested data file which would at least involve recursion, and I'm not a trained computer scientist. I've always been bad at recursion, but I feel like this could be a piece of cake to someone who has been classically trained.
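In case it helps, the recursion involved can be quite small. Here is a hedged Python sketch of a function that prints nested data in an ASN.1-like text layout: a string becomes a quoted value, a dict becomes a braced block of "name value" pairs, and a list becomes a braced block of bare values. The function name `render` is made up, the output has not been checked against tbl2asn, and the `affil str` alternative is an assumption because the Affil type is not quoted in this thread.

```python
def render(value, depth=0):
    """Render nested str/dict/list data as indented ASN.1-style text."""
    pad = "  " * depth          # indentation of the closing brace
    inner = "  " * (depth + 1)  # indentation of the block's contents
    if isinstance(value, str):
        # Assumption: an embedded double quote is escaped by doubling it.
        return '"' + value.replace('"', '""') + '"'
    if isinstance(value, dict):
        # The recursive call handles whatever is nested under each name.
        parts = [f"{inner}{name} {render(child, depth + 1)}"
                 for name, child in value.items()]
    elif isinstance(value, list):
        parts = [f"{inner}{render(child, depth + 1)}" for child in value]
    else:
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    return "{\n" + ",\n".join(parts) + "\n" + pad + "}"


if __name__ == "__main__":
    # 'names str' follows the free-text alternative of Auth-list quoted above;
    # 'affil str' is an assumed alternative, used here only as a placeholder.
    auth_list = {
        "names str": ["INFO", "INFO", "INFO"],
        "affil str": "DATA",
    }
    print("authors " + render(auth_list))
```

The function never needs to know how deep the data goes: each level only formats its own braces and hands the children back to the same function.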