This may be a simple question, but so far I have been unable to find an answer in the NCBI documentation, from their support, or here on BioStar.
Use case: I currently have 30+ draft genomes/large-plasmids (scaffolded contigs) and want to submit these to NCBI to get accession numbers for publication.
Approach: The new guidelines make it obligatory to register a BioProject first, then create a BioSample entry for each sample/genome, and subsequently link a WGS submission to each one. Clear. I created a BioProject and batch-uploaded the BioSample collection using their template. No problem: the BioProject and BioSamples were successfully registered.
Problem: Using their online WGS submission pipeline, I have to supply FASTA files of all contigs (not a problem) and an AGP v2 file describing the scaffolding (not a problem, I generated those). BUT I seem to have to upload each of these files manually for every single BioSample, one by one. That alone is not so bad, but I also need to fill in six pages of selection boxes and extensive author/publication information for EACH BioSample, one at a time. This makes me sad and is error-prone.
Question: How can I batch-upload these samples and get them properly linked to my BioSamples? The data linked to each WGS submission are more or less the same (publication, type, authors, etc.). In the future we plan to do 100+ of these.
Update 2014-Nov-28: I didn't get a real answer from NCBI. For my plasmid sequences they have now pushed me to use the BankIt tool: build each scaffold by inserting 100 Ns between the contigs, update the FASTA headers to match the BioSample numbers, and upload the FASTA. We will see how it goes from there. No solution as of yet for my bacterial genome scaffolds.
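For anyone wanting to script the 100-N workaround described above, here is a minimal Python sketch. It assumes the contigs of each scaffold are already in the correct order and orientation, and that the BioSample accession alone is used as the FASTA header. The function names and the example accession are made up for illustration; the exact header format BankIt expects is not something I have confirmed.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def join_contigs(contigs, gap_length=100):
    """Concatenate contig sequences with a run of Ns between neighbours."""
    return ("N" * gap_length).join(contigs)


def scaffold_to_fasta(biosample_id, contigs, gap_length=100, width=70):
    """Build one FASTA record whose header is the (hypothetical) BioSample ID."""
    seq = join_contigs(contigs, gap_length)
    lines = [">" + biosample_id]
    # Wrap the sequence at a fixed line width.
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy input; in practice read each sample's contig FASTA from disk.
    contig_text = ">contig1\nACGTACGT\n>contig2\nTTGACC\n"
    contigs = [seq for _, seq in parse_fasta(contig_text)]
    # "SAMN00000000" is a placeholder, not a real accession.
    print(scaffold_to_fasta("SAMN00000000", contigs), end="")
```

Looping this over a table of BioSample IDs and contig files would give one scaffold FASTA per sample, though the upload itself would still go through whatever interface NCBI provides.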
Hi ALchEmiXt, did you find a solution for batch submission of WGS genomes? I now have more than 400 genomes to submit. The BioProject and BioSample IDs are ready, but I am facing the same situation as you. Many thanks. Best,
Actually, I don't have a batch solution yet. At the time we solved it by going through all the steps by hand, in batches. At least, the PhD student involved did. I have to revisit this, since we will be facing another batch of 1000+ genomes this year. I heard the EBI tools might be more flexible, but I haven't checked them out yet. How did you solve it?