Hi all, I am new to analysing NGS sequencing results. I sent some clinical bacterial isolates (they are likely to carry plasmids) to be sequenced with Illumina. I should say up front that I am completely new to the terminology, the methodology and everything else. I received files named "reads" and files named "contigs" from the company. If I understood correctly, the contig files are the assembled reads, right? And I shouldn't need to assemble anything on my own if they have already been assembled, right? Please correct me if I am wrong. What if I wanted the "finished version" of a genome (I mean the chromosome and the plasmids as separate sequences, ready to be deposited)? Should I assemble all the contigs together? And how do you do that?
Also, can several contigs contain the same sequence? Could that be a result of the overlapping approach used in the sequencing?
I also noticed that if I run BLAST with the sequence of a known protein as a query against a file containing all the contigs, the known sequence matches several contigs. Most often, the copies of that sequence in the different contigs differ by a few nucleotides, which gives a different percent identity with the known sequence for each contig. Why does this happen? If several contigs contain the same sequence, shouldn't it be identical in all of them? Is this due to the sequencing methodology?
Sorry for all the questions. I am completely new to the terminology, methodology, etc., and I have no one to ask at the moment.
Thank you so much again.
Silvia
If you want help here, I think it is better to ask one question at a time rather than so many at once. Anyway, you can take a look at these: "Beginner's guide to comparative bacterial genome analysis using next-generation sequence data" and "Completing bacterial genome assemblies: strategy and performance comparisons".
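For the BLAST part of the question, it can help to list the hits contig by contig so you can see where each match lies and how its identity differs. Below is a minimal sketch, assuming you ran BLAST+ with tabular output (`-outfmt 6`) and its default twelve columns. The function name `hits_per_contig`, the query name `known_gene`, the contig names and the example rows are all made up for illustration and are not real results.

```python
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hits_per_contig(lines):
    """Group tabular BLAST hits by subject (contig) name.

    Returns a dict mapping contig name to a list of
    (percent identity, alignment length, subject start, subject end).
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = dict(zip(COLUMNS, line.split("\t")))
        hits[fields["sseqid"]].append(
            (float(fields["pident"]), int(fields["length"]),
             int(fields["sstart"]), int(fields["send"]))
        )
    return dict(hits)


# Hypothetical example rows (invented values, only to show the format).
example = [
    "known_gene\tcontig_3\t100.000\t861\t0\t0\t1\t861\t1200\t2060\t0.0\t1591",
    "known_gene\tcontig_17\t98.500\t400\t6\t0\t1\t400\t1\t400\t0.0\t700",
]

for contig, contig_hits in hits_per_contig(example).items():
    for pident, length, sstart, send in contig_hits:
        print(f"{contig}\t{pident:.1f}% identity over {length} positions "
              f"({sstart}-{send})")
```

To use it on your own results, pass an open file instead of the example list, for instance `hits_per_contig(open("hits.tsv"))`, where `hits.tsv` is a placeholder for whatever file name you gave to BLAST's output option.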