Okay.
When you sequence, you make a library by cutting your genomic DNA into small pieces (say 500 bp) and then attaching adapters to both ends of each piece (say 25 bp each).
Together this is your fragment, and its length is 25 + 500 + 25 = 550 bp.
The insert is the sequence between your adapters, so the insert length (insert size) is 500 bp.
Now, for paired-end sequencing, you take your fragment and sequence some 'n' bases (say 100 bp) starting from each end; this is your read length (100 bp). Why only 100 bp, and not the whole fragment, you ask? Mainly because sequencing errors accumulate with every cycle (for example, phasing in Illumina's sequencing-by-synthesis chemistry), so base quality drops as reads get longer. Newer long-read technologies are overcoming this and can reach read lengths of several kb (look at Oxford Nanopore's MinION or PacBio).
So, coming back: you cover 100 bp from the left and right ends of your insert (the reads start right after the adapters, where the sequencing primers bind). The uncovered middle part, ~300 bp (500 - 2 × 100), is the inner mate distance.
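The library arithmetic above can be sketched in a few lines. This is a minimal illustration using the example numbers from the text (500 bp insert, 25 bp adapters, 100 bp reads); the function names are hypothetical. It assumes reads begin at the first base of the insert, so the inner mate distance is measured on the insert, not on the whole fragment.

```python
# Sketch of paired-end library geometry, using the example values from the text.
# Assumption: each read starts at the first insert base (primers bind in the adapters).

def fragment_length(insert_size: int, adapter_len: int) -> int:
    """Total fragment length: adapter + insert + adapter."""
    return adapter_len + insert_size + adapter_len


def inner_mate_distance(insert_size: int, read_len: int) -> int:
    """Unsequenced gap between the two reads of a pair.

    A negative value means the two reads overlap in the middle of the insert.
    """
    return insert_size - 2 * read_len


insert = 500   # bp of genomic DNA between the adapters
adapter = 25   # bp per adapter (illustrative value)
read = 100     # bp sequenced from each end

print(fragment_length(insert, adapter))   # 550
print(inner_mate_distance(insert, read))  # 300
```

With a short insert (say 150 bp) and 100 bp reads, the result is negative: the two reads overlap by 50 bp.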
Coverage and depth are often used interchangeably, but they mean different things. Coverage (breadth of coverage) is the percentage of your target sequence that is covered by sequencing reads. Say your genome is 1000 bp long: you sequence it, align the reads, open the alignment in Tablet/IGV, and see a 100 bp region with no reads at all. That part was not covered by the sequencer for some reason, so your coverage would be 90% (genome covered [900] / genome size [1000]).
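The 90% example can be reproduced from a per-position depth list. This is a hypothetical sketch: the `min_depth` parameter is my own addition, since some analyses only count a base as covered once it has a few reads.

```python
def breadth_of_coverage(depths: list[int], min_depth: int = 1) -> float:
    """Fraction of positions covered by at least min_depth reads."""
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# A 1000 bp genome where one 100 bp stretch has no reads at all.
depths = [5] * 450 + [0] * 100 + [5] * 450
print(f"{breadth_of_coverage(depths):.0%}")  # 90%
```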
Depth is the number of reads that align to a given position.
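To make "reads at a given position" concrete, here is a minimal sketch that computes depth at every position from toy read alignments given as half-open (start, end) intervals. The reads are made up for illustration; on real data, tools such as `samtools depth` compute this from a BAM file.

```python
def depth_per_position(reads: list[tuple[int, int]], genome_size: int) -> list[int]:
    """Count reads overlapping each position.

    reads: (start, end) pairs, 0-based, end exclusive (half-open intervals).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_size + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depths, running = [], 0
    for i in range(genome_size):
        running += diff[i]
        depths.append(running)
    return depths


reads = [(0, 4), (2, 6), (3, 5)]  # hypothetical toy alignments
print(depth_per_position(reads, 8))  # [1, 1, 2, 3, 2, 1, 0, 0]
```

The difference-array approach touches each read once instead of once per base it covers, which matters when there are millions of reads.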
Articles frequently report coverage in terms of 'X'. If they say 50X, that means each base in the target sequence is covered by 50 reads on average. You can calculate this X coverage with a simple formula:
X coverage = (number of mapped reads × read length) / genome size
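The formula is just total aligned bases divided by genome size. A minimal sketch, assuming every mapped read aligns over its full length (soft-clipped or partially aligned reads would make the true value somewhat lower):

```python
def mean_coverage(mapped_reads: int, read_len: int, genome_size: int) -> float:
    """Average depth ('X coverage') = total aligned bases / genome size."""
    return mapped_reads * read_len / genome_size


# 500 reads of 100 bp on a 1000 bp genome:
print(mean_coverage(500, 100, 1000))  # 50.0, i.e. 50X
```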
Sincere apologies if anything is wrong here.
Sources: numerous Biostars and SEQanswers posts, this blog.
Read the spec: http://samtools.github.io/hts-specs/SAMv1.pdf
Dear Pierre,
Thank you so much. I found the website linked below your email, but I still don't understand exactly what depth of coverage, insert size, fragment length, properly and improperly paired reads, and so on mean in the BAM files I view in the Tablet software. I have to find SNPs in my sequenced data.
I would be very grateful if you could help me.
Best regards, Eli
Search and you'll find: What Is The Sequencing 'Depth'?
In addition to reading the SAM spec, also google terms like "coverage depth" (aka "depth of coverage"), "insert length", and so on.
But I do not understand what "properly paired (1/2)" or "(2/2)" and "improperly paired (1/2)" or "(2/2)" mean.
Google more (though the 1/2 and 2/2 have no meaning without context).
Yes. In the Tablet viewer, my Illumina-sequenced data show descriptions like these, for example:
From: 2.261 to 2.400
Length: 140 U140 (140 mismatches)
Cigar: 140M
Improperly paired (1/2), insert size 163
Read direction is FORWARD
What does this mean? Sometimes it says "properly paired (2/2)" instead. Could you please give a graphical example to simplify this description, like your other helpful explanations?
Ah, in that context, (1/2) just means read #1 from the pair (you have paired-end reads). There are any number of reasons a read could be flagged as improperly paired. Perhaps the observed insert size is much shorter (or longer) than expected. Perhaps the other read has the wrong orientation relative to read #1. It's tough to say without knowing what read #2 is.
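Labels such as "Improperly paired (1/2)" and "Read direction is FORWARD" appear to correspond to bits in the SAM FLAG field (column 2 of each alignment record). The sketch below decodes a FLAG value. The bit values come from the SAM spec; the `decode_flag` helper and the label text are my own.

```python
# FLAG bit values from the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair (1/2)",
    0x80: "second in pair (2/2)",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(99))  # ['paired', 'proper pair', 'mate reverse strand', 'first in pair (1/2)']
print(decode_flag(65))  # ['paired', 'first in pair (1/2)']
```

FLAG 65 describes a read like the one in the example: read 1 of a pair, on the forward strand, with the "proper pair" bit not set. The aligner sets that bit according to its own criteria, so the FLAG alone does not say *why* the pair is improper.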
So it depends on the read conditions?
But I thought it was related to mate-pair versus paired-end reads, and I also can't tell the difference between those. And what about the CIGAR string, for example in the output above?
The only way to tell the difference between mate-pair and paired-end reads is to look at read orientation and insert size. If two reads point toward each other and have a shortish insert size, then they aligned as paired-end. If they point away from each other and have a very large insert size, then they aligned as mate-pairs. Obviously, telling an aligner that you have one type of data and feeding it the other will result in a lot of "improperly paired" reads. Keep in mind that, at least in the past, mate-pair libraries still contained a lot of paired-end reads (no clue if that's still the case).
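The orientation-plus-insert-size rule above can be written as a small classifier. This is a hypothetical sketch: the `pair_layout` function and the 1000 bp cutoff are illustrative assumptions, not what any particular aligner uses; real tools estimate the expected insert-size distribution from the data.

```python
def pair_layout(left_strand: str, right_strand: str, insert_size: int,
                max_pe_insert: int = 1000) -> str:
    """Guess the library layout of an aligned pair.

    left_strand / right_strand: '+' or '-' for the leftmost and rightmost read.
    max_pe_insert: illustrative cutoff between "shortish" and "very large" inserts.
    """
    if left_strand == "+" and right_strand == "-":
        orientation = "inward"   # ->  <-  reads point toward each other
    elif left_strand == "-" and right_strand == "+":
        orientation = "outward"  # <-  ->  reads point away from each other
    else:
        return "same-strand pair: neither layout"
    if orientation == "inward" and insert_size <= max_pe_insert:
        return "looks like paired-end"
    if orientation == "outward" and insert_size > max_pe_insert:
        return "looks like mate-pair"
    return f"{orientation}, insert {insert_size} bp: ambiguous"


print(pair_layout("+", "-", 300))   # looks like paired-end
print(pair_layout("-", "+", 5000))  # looks like mate-pair
```

An aligner told to expect one layout would flag pairs of the other layout as improperly paired, which is the failure mode described above.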
For the CIGAR string, just read the SAM spec. It's described in there. You really should be able to find most of this yourself via Google.
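A CIGAR string is a run-length list of alignment operations. `140M` means 140 aligned bases. Per the SAM spec, `M` covers both matches and mismatches; `=` and `X` distinguish them. The sketch below parses a CIGAR string and computes where an alignment ends on the reference. The helper names are my own. I am also assuming that Tablet's "From: 2.261 to 2.400" uses a dot as a thousands separator, meaning positions 2,261 to 2,400.

```python
import re

# CIGAR operations that consume reference bases, per the SAM spec.
CONSUMES_REF = set("MDN=X")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def reference_end(pos: int, cigar: str) -> int:
    """1-based inclusive reference end of an alignment starting at pos."""
    ref_len = sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)
    return pos + ref_len - 1


print(parse_cigar("140M"))          # [(140, 'M')]
print(reference_end(2261, "140M"))  # 2400
print(parse_cigar("5S30M2I10M"))    # [(5, 'S'), (30, 'M'), (2, 'I'), (10, 'M')]
```

Soft clips (`S`) and insertions (`I`) consume read bases but not reference bases. That is why a 47-base read with CIGAR `5S30M2I10M` spans only 40 reference positions.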
Yes, that's true, I can find them, but I just wanted to confirm with your explanation, since your descriptions are accurate.
Thanks again,