I have a new bacterial strain for which no reference genome is available at NCBI. The strain was sequenced on an Illumina platform, yielding paired-end reads.
The paired-end reads were assembled de novo using CLC Genomics Workbench. Once the assembly was complete, the following results were obtained.
Name              Consensus length  Total read count  Single reads  Reads in pairs  Average coverage
contig_1_mapping  30575             24185             749           23436           116.3326574
contig_2_mapping  4087              2703              111           2592            97.23562515
contig_3_mapping  5764              3835              187           3648            97.76040944
contig_4_mapping  7302              3488              152           3336            69.38989318
contig_5_mapping  26530             16489             553           15936           91.15295891
contig_6_mapping  5576              3883              119           3764            102.2159254
contig_7_mapping  42110             33820             1166          32654           117.8527666
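As a quick sanity check on the mapping table, the rows can be summarised in a few lines of Python. This is only a sketch: the `rows` list is typed in by hand from the seven rows shown above, the helper name `summarize` is made up for illustration, and the length-weighted mean coverage is just one reasonable way to combine the per-contig averages, not something CLC reports.

```python
# Rows copied by hand from the mapping table above:
# (name, consensus length, total reads, single reads, reads in pairs, average coverage)
rows = [
    ("contig_1_mapping", 30575, 24185, 749, 23436, 116.3326574),
    ("contig_2_mapping", 4087, 2703, 111, 2592, 97.23562515),
    ("contig_3_mapping", 5764, 3835, 187, 3648, 97.76040944),
    ("contig_4_mapping", 7302, 3488, 152, 3336, 69.38989318),
    ("contig_5_mapping", 26530, 16489, 553, 15936, 91.15295891),
    ("contig_6_mapping", 5576, 3883, 119, 3764, 102.2159254),
    ("contig_7_mapping", 42110, 33820, 1166, 32654, 117.8527666),
]


def summarize(rows):
    """Hypothetical helper: totals and length-weighted mean coverage."""
    total_length = sum(r[1] for r in rows)
    total_reads = sum(r[2] for r in rows)
    paired_reads = sum(r[4] for r in rows)
    # Weight each contig's average coverage by its consensus length.
    weighted_coverage = sum(r[1] * r[5] for r in rows) / total_length
    return {
        "total_length": total_length,
        "total_reads": total_reads,
        "paired_reads": paired_reads,
        "paired_fraction": paired_reads / total_reads,
        "weighted_coverage": weighted_coverage,
    }


# Each row should satisfy: single reads + reads in pairs == total read count.
for name, length, total, single, paired, coverage in rows:
    assert single + paired == total, name

summary = summarize(rows)
for key, value in summary.items():
    print(key, value)
```

Contigs whose coverage sits far from the weighted mean (contig_4 is the lowest here, at about 69x) may be worth a closer look, although coverage alone does not say why.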
Contig measurements were also reported: N75, N50, N25, Minimum, Maximum, Average, Count and Total.
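For reference, Nx-style measurements can be recomputed from the contig lengths alone. The sketch below uses the common definition (sort contigs longest first and report the length at which the running total reaches the given fraction of the assembly); I am assuming CLC uses the same definition, and the `lengths` list contains only the seven consensus lengths shown in this post, so the values may differ from the report if the assembly has more contigs. The function name `nx` is made up for illustration.

```python
def nx(lengths, fraction):
    """Length of the contig at which the running total of lengths
    (largest first) reaches `fraction` of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("no contig lengths given")


# Consensus lengths of the seven contigs listed in this post (assumed complete).
lengths = [30575, 4087, 5764, 7302, 26530, 5576, 42110]

print("N25:", nx(lengths, 0.25))
print("N50:", nx(lengths, 0.50))
print("N75:", nx(lengths, 0.75))
```

With only these seven contigs the sketch gives N25 = 42110, N50 = 30575 and N75 = 26530.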
The following contig details were also listed:
Name      Modified                   Description               Size   Linear
contig_1  Mon May 15:35:29 IST 2017  Average coverage: 116.33  30575  Linear
contig_2  Mon May 15:35:29 IST 2017  Average coverage: 97.24   4087   Linear
contig_3  Mon May 15:35:29 IST 2017  Average coverage: 97.76   5764   Linear
contig_4  Mon May 15:35:29 IST 2017  Average coverage: 69.39   7302   Linear
contig_5  Mon May 15:35:29 IST 2017  Average coverage: 91.15   26530  Linear
contig_6  Mon May 15:35:29 IST 2017  Average coverage: 102.22  5576   Linear
How do I now analyze and annotate the contigs obtained from the de novo assembly of this strain? Are there any NCBI tools for annotating these contigs?
What do you mean by 'analyse'? What do you want to know? The two annotators h.mon suggested are probably the best currently available.