Hi, I should say up front that I am completely new to sequencing and bioinformatics in general. I have next-generation sequencing results for bacterial genomes that I sent off for sequencing. The bacteria are antibiotic resistant and should carry their resistance genes on plasmids.
The first thing that is not clear to me is the concept of coverage. Can you clarify this concept for me? I found this definition:
"Coverage (read depth or depth) is the average number of reads representing a given nucleotide in the reconstructed sequence". I can't figure it out. Is this the number of times the same nucleotide appears in the reads? Can you give me some practical examples and explain it in simple words?
I am trying to visualize the data with Artemis. Could you please help me interpret it? When I open a gbk file, I see a list of lines at the bottom. I suppose those are contigs? The first line is labelled "source" and the other lines are labelled "CDS". What does the term "source" refer to? Is it a single contig? Why is it not labelled "CDS" like the others? If I understood correctly, all these CDS entries are different pieces of the sequence, aren't they?
My bacterium should carry resistance genes (beta-lactamases) on its plasmids, as shown by PCR. I tried opening "CDS genes and product" in Artemis, which gives (in a text file) all the genes encoding the proteins that were found. However, I can't find any CDS for the beta-lactamases I am looking for. Why is that? How can I retrieve their sequences? I am sure they are in the sequence of the bacterium. Please, HELP!
That's a lot of questions. Try to break your posts down into smaller pieces; that makes it easier to give answers and keeps the information logically structured. You could then also give each post a more informative title.
Your questions are rather diverse. Don't get me wrong, but if you don't understand the concept of coverage you probably shouldn't be trying to identify antibiotic resistance genes yet. Take it one step at a time!
A gbk file is a GenBank file; you can probably find the format specification online.
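To make the "source" versus "CDS" question a bit more concrete: in the GenBank format each record has a feature table, where "source" is the feature that describes the whole sequence of that record and each "CDS" is an annotated coding sequence (a gene) on it, not a separate contig. Below is a rough sketch, using only the Python standard library, that counts the feature keys in a GenBank text. The record in it is a made-up toy example, and the function name count_feature_keys is hypothetical; for real work a proper parser is a safer choice than this simplified one.

```python
import re
from collections import Counter

# In the GenBank flat-file layout, a feature key starts after five spaces;
# qualifier lines (e.g. /product=...) are indented further and are skipped.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+\S")

def count_feature_keys(genbank_text):
    """Count feature keys (source, CDS, ...) in GenBank-formatted text."""
    counts = Counter()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
            continue
        if in_features:
            match = FEATURE_LINE.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts

# Hypothetical toy record: one contig with two annotated coding sequences.
toy_record = "\n".join([
    "LOCUS       contig_1                  60 bp    DNA     linear   BCT",
    "FEATURES             Location/Qualifiers",
    "     source          1..60",
    "                     /organism=\"Example bacterium\"",
    "     CDS             1..30",
    "                     /product=\"hypothetical protein\"",
    "     CDS             complement(31..60)",
    "                     /product=\"hypothetical protein\"",
    "ORIGIN",
    "        1 atgaaaacgc",
    "//",
])

feature_counts = count_feature_keys(toy_record)
print(dict(feature_counts))
```

In this toy record there is one "source" feature (the whole 60 bp sequence) and two "CDS" features lying on it, which is the same pattern Artemis shows in its feature list.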
Do you have someone in your institution who can put you on the right track? We don't mind helping, but you need some support!
I'll try to explain the concept of coverage, although the definition seems pretty clear (to me). The coverage at a position is the number of reads that align to that position (nucleotide) in the reference genome. It tells you how many times this position has been "seen" by the sequencer and therefore (indirectly) how much evidence there is for the presence of a given nucleotide at that position. The single coverage figure usually quoted for a whole genome is the average of this number over all positions. The higher the coverage, the more confident you can be about a variant call at that position (although very, very high coverage is also suspicious, because it can point to certain mapping artifacts).
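As a practical example, here is a minimal sketch in Python of how per-position coverage could be counted. It assumes the reads have already been aligned and that each alignment is given simply as a 0-based start position and a read length; the function name per_base_coverage and the toy numbers are made up for illustration. Real alignments (with gaps, clipping and so on) are more complicated than this.

```python
def per_base_coverage(ref_length, alignments):
    """Return a list with the number of reads overlapping each position.

    alignments: list of (start, read_length) tuples, start is 0-based.
    """
    depth = [0] * ref_length
    for start, length in alignments:
        # Each read adds 1 to every reference position it spans.
        for pos in range(start, min(start + length, ref_length)):
            depth[pos] += 1
    return depth

# Hypothetical toy data: a 10 bp reference and four reads of 5 bp each.
alignments = [(0, 5), (2, 5), (3, 5), (5, 5)]

depth = per_base_coverage(10, alignments)
mean_depth = sum(depth) / len(depth)

print(depth)       # [1, 1, 2, 3, 3, 3, 3, 2, 1, 1]
print(mean_depth)  # 2.0
```

Here the positions in the middle are each covered by three reads, while the ends are covered by only one. The average over all ten positions is 2, so this toy genome would be described as sequenced at "2x coverage".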