Dear all,
Which database is commonly used for submitting bacterial genomes? I have a genome as a single FASTA file of ~150 sequences (Illumina MiSeq). Some of these sequences are shorter than 200 nucleotides, and most of those are homopolymeric DNA stretches. When I try to submit to NCBI, I cannot complete the process because sequences shorter than 200 nucleotides are not allowed. If I simply delete the short sequences, am I not distorting the data? As you can probably tell, this is my first time submitting a genome, and the NCBI manual does not say how to handle short sequences. Could anyone tell me how I should proceed, and why?
Thank you very much!
Note that submitting low-quality data to public databases makes life harder in perpetuity for everyone using those databases. Please put a lot of effort into curating the data yourself prior to submission to ensure that the genomes are pure (uncontaminated), represent the correct species, and are as complete and contiguous as possible. NCBI has some automated checks to prevent low-quality submissions from degrading the databases, as you can see, but they are not foolproof. I suggest you study the matter a bit more before submitting anything.
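Part of that curation is seeing exactly what a length filter would remove before you remove it. Below is a minimal Python sketch (standard library only). It assumes a plain, uncompressed multi-FASTA and uses 200 nt as the cutoff only because that is the limit the submission reported above. It writes the contigs that pass to a new file and lists the ones that fail, together with the fraction of each dropped contig made up by its most common base, so you can check whether they really are homopolymer stretches. The function names and file paths are hypothetical, and whether to drop a given contig is still your call.

```python
"""Sketch: split a draft-assembly FASTA into contigs kept (>= min_len) and
dropped (< min_len), and summarize what is dropped.

Assumptions (not from the original post): plain uncompressed multi-FASTA
input; 200 nt cutoff taken from the submission error; hypothetical names."""

import sys
from collections import Counter
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence is joined and uppercased."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def dominant_base_fraction(seq: str) -> float:
    """Fraction of seq made up by its most common base (near 1.0 = homopolymer)."""
    if not seq:
        return 0.0
    return Counter(seq).most_common(1)[0][1] / len(seq)


def split_by_length(records, min_len: int = 200):
    """Partition records into (kept, dropped) by sequence length."""
    kept, dropped = [], []
    for header, seq in records:
        (kept if len(seq) >= min_len else dropped).append((header, seq))
    return kept, dropped


def summarize(kept, dropped) -> str:
    """One-line report of how much sequence the filter removes."""
    lost = sum(len(s) for _, s in dropped)
    total = sum(len(s) for _, s in kept) + lost
    pct = 100 * lost / total if total else 0.0
    return (f"kept {len(kept)} contigs, dropped {len(dropped)} "
            f"({lost} of {total} bp, {pct:.2f}%)")


def main(path_in: str, path_out: str, min_len: int = 200) -> None:
    with open(path_in) as fh:
        kept, dropped = split_by_length(read_fasta(fh), min_len)
    with open(path_out, "w") as out:
        for header, seq in kept:
            out.write(f">{header}\n")
            for i in range(0, len(seq), 80):  # wrap at 80 columns
                out.write(seq[i:i + 80] + "\n")
    print(summarize(kept, dropped))
    for header, seq in dropped:
        frac = dominant_base_fraction(seq)
        print(f"  {header}\t{len(seq)} bp\tdominant base {frac:.0%}")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: filter_contigs.py in.fasta out.fasta", file=sys.stderr)
```

For example, `python filter_contigs.py assembly.fasta assembly.min200.fasta` would print the summary line and one line per dropped contig. If the dropped contigs add up to only a tiny fraction of the assembly and are almost entirely a single base, that is worth stating in your submission notes rather than silently deleting them.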