Question

Help creating .sqn file using tbl2asn to submit multiple sequences to Genbank

11.0 years ago

jolespin ▴ 150

I think this post is relevant to many bioinformaticians who submit to GenBank using tbl2asn.

I've been following the guidelines here

I've successfully installed tbl2asn on my Mac and have been using it through the terminal.

The directions say to create 3 files: template.sbt, table.tbl, and fasta.fsa

My fasta format headers look like:

>TCONS_00001810 [organism=Mus musculus] [strain=C57BL/6J] [chromosome=1] olfactory receptor 1415 (Olfr1415) mRNA, complete cds

The corresponding data in the table file looks like:

>Feature TCONS_00001810
1    3422    mRNA
1    186    5'UTR
187    1122    CDS
1123    3422    3'UTR
1    176    exon
177    2079    exon
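One thing that can be checked before running tbl2asn is whether the table and the FASTA file agree with each other. The sketch below assumes that the columns of a feature table are separated by tabs and that the ID on each >Feature line should match the first word of a FASTA header. The function names are made up for illustration and are not part of tbl2asn.

```shell
# Print every ">Feature" ID in the table ($2) that has no FASTA record
# with the same ID in the sequence file ($1). Returns 1 if any are missing.
unmatched_table_ids() {
    awk '
        FILENAME == ARGV[1] { if (/^>/) seen[substr($1, 2)] = 1; next }
        /^>Feature/ && !($2 in seen) { print $2; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1" "$2"
}

# Print table lines that contain no tab at all (line number, then the line).
# Lines starting with ">" and blank lines are skipped.
untabbed_lines() {
    awk '!/^>/ && NF > 0 && index($0, "\t") == 0 { print FNR ": " $0 }' "$1"
}
```

With the file names from this post, the calls would be `unmatched_table_ids fasta.fsa table.tbl` and `untabbed_lines table.tbl`. No output from either means nothing suspicious was found by these two checks; it does not mean the table is valid.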

The template file isn't a text file so I can't provide an example . . .

In terminal, I've opened up tbl2asn and I know it's working because when I do the command:

tbl2asn -

it gives me all of the different commands that I can use.

When I run the command below, it runs and creates a file called errorsummary.val in the directory that holds template.sbt, table.tbl, and fasta.fsa. However, this file is empty (zero bytes). It should create a .sqn file that combines the three input files I've described above.

tbl2asn -t template.sbt -p . -j "[organism=Mus musculus] [strain=C57BL:6J]" -V vb -a s

The documentation explains -t, -p, -j, -V, and -a

-p specifies the path for the table and sequence files [required]
-t specifies the template file (including the path) [required]
-j allows the addition of source qualifiers that will be the same for each submission
Example: -j "[organism=Saccharomyces cerevisiae] [strain=S288C]"
-V is a verification command when used in conjunction with v (strongly suggested), which will tell the computer to run a validation step to ensure that there are no errors in your submission.

This validation step will generate a report (with suffix .val) for each .fsa file and place it in the same directory that houses the data files and tables used in the submission.

If you add a b command (optional) following the v command, the computer will generate a GenBank flat file (.gbf) of your submission and deposit it in the same directory that houses the data files and tables used in the submission. Note that .gbf files are not suitable for submission. They are only to view the file in GenBank flatfile format.

The -a command used in conjunction with the s command instructs tbl2asn to read multiple FASTA components in one file as a set of unrelated sequences. This creates a single file of multiple submissions.
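Since the documentation quoted above names three kinds of output (.sqn, .val, and .gbf), a quick way to see what a run actually left behind is to list them with an empty/non-empty flag. This is only a sketch; `summarize_outputs` is a hypothetical helper, and it says nothing about why a file is missing.

```shell
# For each expected output extension, list the matching files in a
# directory and say whether each one is empty (zero bytes) or not.
summarize_outputs() {
    local dir="${1:-.}" ext f any
    for ext in sqn val gbf; do
        any=0
        for f in "$dir"/*."$ext"; do
            [ -e "$f" ] || continue      # glob matched nothing
            any=1
            if [ -s "$f" ]; then
                echo "$ext: $(basename "$f") (non-empty)"
            else
                echo "$ext: $(basename "$f") (empty)"
            fi
        done
        if [ "$any" -eq 0 ]; then
            echo "$ext: none"
        fi
    done
}
```

Running `summarize_outputs .` in the working directory after tbl2asn finishes shows at a glance whether a .sqn file exists.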

Why does the program run, not give any errors, create the errorsummary.val and not create the .sqn file?

How can I get this to work? I feel like I'm very close.

I've already established a working directory which is where all of those files are located.

I've tried putting the directory location after -p, and removing -j with its modifiers as well as the optional flags. I still can't get it to work.

Please help. This should be useful for anyone submitting multiple sequences at a time to GenBank.

submit batch tbl2asn sequence genbank • 11k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by jolespin ▴ 150

What are the contents of the errorsummary.val file? Is it empty, or does it give any information about what's going on?

ADD REPLY • link 11.0 years ago by cts ★ 1.7k

It's completely empty. Do you have any idea how to submit this to GenBank? What other reasons could there be for why it isn't working?

ADD REPLY • link 11.0 years ago by jolespin ▴ 150

hello,

I am trying to prepare files for a TSA submission. I prepared the .fsa and .sbt files. I was wondering if you know how to prepare the .tbl file that contains the annotation?

Thanks in advance

Federico

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.5 years ago by federico.gaiti ▴ 70

Also, I'm getting an empty .val file too.

ADD REPLY • link 10.5 years ago by federico.gaiti ▴ 70

Did you ever resolve this issue? I'm having the same problem with a whole genome dataset, and I can't figure out what's going wrong.

ADD REPLY • link 10.0 years ago by sorrywm ▴ 10

Ram · Answer 1 · 2014-05-09

Hi, I am also having a problem creating an sqn file for GenBank submission.

I am attempting to submit a full genome to GenBank. I am using tbl2asn to generate an sqn file for submission from the Velvet contigs.fa, but I am running into two difficulties.

The command I am using looks like this:

./mac.tbl2asn -i 1383.fsa -t 1383.sbt -j "[organism=Cronobacter sakazakii 1] [strain=1] [host=unknown] [country=UK] [collection_date=1950] [isolation-source=milk powder] [note=multilocus sequence type 4] [gcode=11]" -M n -Z discrep -a r10k

Running this on the raw Velvet output generates no output whatsoever.

Believing this might be related to the FASTA headers, I wrote a script to replace them with ">contig0001" etc. (and to filter out scaffolds <200 nt). The sqn file is now generated but looks like this:

inst {
repr delta ,
mol dna ,
length 525416 ,
ext
 delta {
   literal {
     length 355416,
     seq-data
       ncbi2na 'CA5E5949BD0DE2538D01DF43F02FE39D020F01F6C383C6E
F477E21F8C39FFDB7E61B808CD2E558B951123EDE303EF0224B986697925E7B662BA6CCF19FD77
F48B42773F89FF77D215867982E3DBC996ED8E8A64F32A25E2223A426B0CE0000E1859D97FE16F
197BFF566FF8E978A4BDE429CF49152D259FFD67DF7BC5AFC5AC64524666F5C5EA69A69A5E4CDF
79FCD1514CC2099D337338232D199E2349395A79AFC692D1277A6019771659F5A3AE68430C8C5B
5236196635949852D4D4F6B4ABF407ADF4B0125462631787AF2F1491099B254D97D27464193630
6536301D01EB63F4E04C16E3613C32E5366B6200325A87E3D94522C75E230BCE5972CD93BF57F8

and so on...
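For readers who want to reproduce the header-renaming and length-filtering step mentioned above, here is a minimal sketch. It is not the script used in this post: the function name is hypothetical, the numbering counts only the records that are kept, and each sequence is written on a single line, all of which are assumptions.

```shell
# Read FASTA on stdin, write FASTA on stdout.
# Records are renamed contig0001, contig0002, ... in input order, and
# sequences shorter than the given minimum length (default 200) are dropped.
rename_and_filter() {
    local min_len="${1:-200}"
    awk -v min="$min_len" '
        # Write out the record collected so far if it is long enough.
        function emit() {
            if (seq != "" && length(seq) >= min) {
                n++
                printf(">contig%04d\n%s\n", n, seq)
            }
            seq = ""
        }
        /^>/ { emit(); next }                       # new header: finish previous record
        { gsub(/[ \t\r]/, ""); seq = seq $0 }       # join wrapped sequence lines
        END { emit() }
    '
}
```

A typical call would be `rename_and_filter 200 < contigs.fa > 1383.fsa`, using the file names from this post.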

Any ideas?

score 0 · Answer 2 · 2016-11-11

8.5 years ago

lizardburns • 0

This was probably down to the way the input files were named.

tbl2asn wants the file extension to be .fsa and won't find .fasta files.
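A small pre-flight check along these lines can show what tbl2asn would have to work with in a directory. The sketch assumes, in addition to the .fsa requirement above, that a feature table is picked up only when it has the same basename as its .fsa file (for example fasta.fsa with fasta.tbl); that pairing rule is an assumption here and worth confirming in the NCBI documentation. `check_inputs` is a hypothetical helper name.

```shell
# List each .fsa file in a directory and whether a .tbl with the same
# basename sits next to it. Also flag FASTA-looking files whose extension
# is not .fsa. Returns 1 if the directory holds no .fsa file at all.
check_inputs() {
    local dir="${1:-.}" fsa base other found=0
    for fsa in "$dir"/*.fsa; do
        [ -e "$fsa" ] || continue        # glob matched nothing
        found=1
        base="${fsa%.fsa}"
        if [ -e "$base.tbl" ]; then
            echo "ok: $(basename "$fsa") + $(basename "$base.tbl")"
        else
            echo "no table: $(basename "$fsa")"
        fi
    done
    for other in "$dir"/*.fasta "$dir"/*.fa "$dir"/*.fna; do
        if [ -e "$other" ]; then
            echo "ignored (extension is not .fsa): $(basename "$other")"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no .fsa files in $dir"
        return 1
    fi
}
```

Calling `check_inputs .` in the working directory prints one line per sequence file, so a missing or mismatched name is easy to spot before running tbl2asn.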
