Is it better to annotate contigs or scaffolds?
10.0 years ago
mgalactus ▴ 780

Hi,

I'm annotating some bacterial genomes and was wondering whether it makes more sense to annotate the contigs and then scaffold them, or whether it is fine to annotate the scaffolds directly. I plan to submit these genomes to NCBI, so the result should comply with their standards as well.

Thanks

bacteria contigs scaffolds annotation • 6.4k views
10.0 years ago
dago ★ 2.8k

I would say that annotating scaffolds makes much more sense. One scaffold can be made up of many contigs, and at the end of a contig you may find a CDS or a gene cluster broken in the middle. This can produce incorrect annotation or give you only partial information on the gene order in the genome. In the scaffolds, this bias is likely to be reduced.

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  -> scaffold
xxxxxx                     xxxxxxxxxxxxx 
          xxxxxxxxxxxx                          xxxxxxxxxxxxx -> contigs
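
To make the point concrete, here is a minimal Python sketch (not from any particular pipeline) that finds the contigs inside a scaffold and checks whether a feature annotated on the scaffold would be cut if the contigs were annotated separately. The function names and the `min_gap` threshold of 10 Ns are my own assumptions for illustration; coordinates are 0-based and end-exclusive.

```python
import re

def find_contigs(scaffold, min_gap=10):
    """Return (start, end) coordinates of the contigs in a scaffold.

    Runs of at least `min_gap` Ns are treated as gaps between contigs.
    The threshold is an assumption; shorter N runs are left inside a contig.
    """
    contigs = []
    pos = 0
    for gap in re.finditer(r"[Nn]{%d,}" % min_gap, scaffold):
        if gap.start() > pos:
            contigs.append((pos, gap.start()))
        pos = gap.end()
    if pos < len(scaffold):
        contigs.append((pos, len(scaffold)))
    return contigs

def crosses_gap(feature_start, feature_end, contigs):
    """True if the feature is not fully contained in a single contig."""
    return not any(s <= feature_start and feature_end <= e for s, e in contigs)

# Toy scaffold: two 9 bp contigs joined by a 100 N gap.
scaffold = "ATGAAACCC" + "N" * 100 + "GGGTTTTAA"
contigs = find_contigs(scaffold)
print(contigs)                       # [(0, 9), (109, 118)]
print(crosses_gap(0, 118, contigs))  # True: this feature spans the gap
print(crosses_gap(0, 9, contigs))    # False: fits inside the first contig
```

A feature for which `crosses_gap` returns True is one that would show up as two partial pieces, or be missed, in a contig-level annotation.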

Thanks for the reply. You are probably right, but do you know whether NCBI wants the scaffolding information to be provided in some way?

Take a look here

10.0 years ago
HG ★ 1.2k

Here is an email response I received from NCBI some time ago:

We do accept gapped submissions if N's represent gaps between ordered and oriented contiguous sequences. If you are using estimated gap sizes, then the number of N's should exactly match the estimated gap size. If you are unsure of the gap size, you should add 100 N's in the sequence file.

For more information on preparing a gapped submission please see http://www.ncbi.nlm.nih.gov/genbank/wgs_gapped

Please note we offer two submission pathways (Complete and WGS):

1. The genome assembly could be submitted as a complete genome if it falls into either of these cases:

a. You have sequenced the complete circular genome and there are no gaps
b. You know the order and orientation of the contigs and were able to assemble your sequences, with Ns between the contigs, into a single scaffold representing the circular genome with no extra unplaced contigs.

Genomes in the complete category should be submitted as .sqn files, with or without annotation, using GenomesMacroSend (http://www.ncbi.nlm.nih.gov/projects/GenomeSubmit/genome_submit.cgi) as described in http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit.html.

2. If the genome assembly is in multiple pieces that you were unable to assemble into a complete chromosome, then submit the contigs to our Whole Genome Shotgun (WGS) database using the WGS submission portal (https://submit.ncbi.nlm.nih.gov/subs/wgs/). See the WGS page, http://www.ncbi.nlm.nih.gov/Genbank/wgs.submit.html for details.

Please contact us at genomes@ncbi.nlm.nih.gov if you have additional questions.
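
The gap rule in the email (exactly as many Ns as the estimated gap size, or 100 Ns when the size is unknown) could be sketched in Python roughly like this. It is only an illustration: `build_scaffold` and `UNKNOWN_GAP` are names I made up, and the contigs are assumed to be already in the right order and orientation.

```python
UNKNOWN_GAP = 100  # per the email: use 100 Ns when the gap size is unknown

def build_scaffold(contigs, gap_sizes):
    """Join ordered, oriented contigs into one scaffold sequence.

    contigs   : list of sequence strings, already ordered and oriented
    gap_sizes : one entry per junction (len(contigs) - 1 entries);
                an int for an estimated gap size, None if unknown
    """
    if not contigs:
        raise ValueError("need at least one contig")
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size per junction")
    parts = [contigs[0]]
    for seq, gap in zip(contigs[1:], gap_sizes):
        n = UNKNOWN_GAP if gap is None else gap
        parts.append("N" * n)  # the N count matches the estimate exactly
        parts.append(seq)
    return "".join(parts)

# Three contigs: first gap estimated at 3 bp, second gap of unknown size.
scaffold = build_scaffold(["AACC", "GGTT", "ACGT"], [3, None])
print(len(scaffold))  # 4 + 3 + 4 + 100 + 4 = 115
```

Whether the result then goes in as a complete genome or through WGS depends on the two cases described in the email; the sketch only covers building the gapped sequence.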
