There are a number of Biostars posts on how to calculate coverage and read depth, and what they mean. I'm still confused.
This is how I currently understand things:
Depth (or Read Depth) at a BP coordinate
The number of "hits" on that coordinate resulting from alignment. In other words, the number of aligned reads that land on that coordinate. The height of the bar over that position in a genome browser.
Coverage
The sum of all the depths across a particular coordinate range such as a gene or a peak.
Total Read Depth
The sum of all the depths across a particular coordinate range such as a gene or a peak. This is NOT exactly the same as
[read length] × [num reads, i.e. fastq lines ÷ 4] ÷ [length of reference in bp]
because not all reads in the fastq will get aligned or completely aligned.
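To make the gap concrete, here is a minimal sketch comparing the nominal figure from the formula with one that counts only aligned bases. Note that dividing by the reference length gives a per-base average rather than a sum. The function names and the toy numbers are made up for illustration; they do not come from any particular tool.

```python
def expected_mean_depth(read_length, num_reads, reference_length):
    """Nominal mean depth if every base of every read aligned."""
    return read_length * num_reads / reference_length


def aligned_mean_depth(aligned_bases_per_read, reference_length):
    """Mean depth counting only the bases that actually aligned."""
    return sum(aligned_bases_per_read) / reference_length


# Hypothetical toy data: 4 reads of 50 bp against a 100 bp reference.
nominal = expected_mean_depth(read_length=50, num_reads=4, reference_length=100)

# Assume one read did not align at all (0 bases) and one aligned only partially (30 bases).
observed = aligned_mean_depth([50, 50, 30, 0], reference_length=100)

print(nominal, observed)
```

With these made-up numbers the nominal value is higher than the observed one, which is the discrepancy described above.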
Total Coverage
Same as Total Read Depth
Average Read Depth
[Total Read Depth] ÷ [number of base pairs (coordinates) in aligned region or reference]
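Under the definitions above, the three quantities could be computed like this. It is only a sketch: alignments are assumed to be 0-based, half-open (start, end) intervals, and `per_base_depth` is a hypothetical helper, not a function from samtools or any other library.

```python
def per_base_depth(alignments, region_start, region_end):
    """Depth at each coordinate in [region_start, region_end).

    alignments: iterable of (start, end) intervals, 0-based, half-open (assumed).
    """
    depth = [0] * (region_end - region_start)
    for start, end in alignments:
        # Clip each alignment to the region before counting.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


# Toy alignments on a 10 bp region (made-up data).
alignments = [(0, 5), (2, 8), (4, 10)]

depth = per_base_depth(alignments, 0, 10)   # depth at each bp coordinate
total_depth = sum(depth)                    # "Total Read Depth" as defined above
average_depth = total_depth / len(depth)    # "Average Read Depth" as defined above

print(depth, total_depth, average_depth)
```

For whole alignments that fall inside the region, the total is simply the sum of the aligned lengths (here 5 + 6 + 6).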
Example:
Say I want to calculate the differential binding between ChIP-seq samples. This is roughly
[coverage under peak in treatment 1] ÷ [coverage under peak in treatment 2]
However, I should really normalize by the average read depth of each treatment. So
[coverage under peak in treatment 1 ÷ average read depth in treatment 1 bam]
÷
[coverage under peak in treatment 2 ÷ average read depth in treatment 2 bam]
Is this correct?
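Written out as arithmetic, the normalization I have in mind looks like the sketch below. The function names and numbers are invented for illustration, and this only restates the formula above; it is not meant as an established differential binding method.

```python
def normalized_signal(coverage_under_peak, average_read_depth):
    """Coverage under a peak scaled by the sample's average read depth."""
    return coverage_under_peak / average_read_depth


def binding_ratio(cov1, avg_depth1, cov2, avg_depth2):
    """Ratio of depth-normalized peak coverage, treatment 1 over treatment 2."""
    return normalized_signal(cov1, avg_depth1) / normalized_signal(cov2, avg_depth2)


# Made-up numbers: treatment 1 was sequenced more deeply than treatment 2.
raw_ratio = 600 / 200
norm_ratio = binding_ratio(cov1=600, avg_depth1=3.0, cov2=200, avg_depth2=2.0)

print(raw_ratio, norm_ratio)
```

In this toy case the raw ratio overstates the difference, because part of it comes from treatment 1 simply having more reads.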
There are no widely accepted definitions. When you see "depth" or "coverage" mentioned in a paper, you almost always need to figure out its exact meaning from the context.
NOOOOO! That is exactly what I did not want to hear :(
I typically use the words like this:
The depth of an experiment is the total number of reads obtained from the sequencer. More stringently, you can say that the effective depth is the number of reads that remain after all filtering, e.g. removing duplicate and low-quality reads and, for paired-end sequencing, keeping only proper pairs. Sequencing deeper therefore means sequencing your samples on a larger flow cell or on a platform that can produce more reads, e.g. NextSeq instead of MiSeq.
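As a rough sketch of the difference between depth and effective depth in this sense: the `Read` record, its fields, and the mapping-quality cutoff of 20 are assumptions chosen for illustration, not a fixed standard.

```python
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical minimal read record; a real BAM record carries far more.
    mapq: int
    is_duplicate: bool
    is_proper_pair: bool


def effective_depth(reads, min_mapq=20):
    """Count reads that survive duplicate, quality, and proper-pair filtering.

    min_mapq=20 is an arbitrary example threshold.
    """
    return sum(
        1
        for r in reads
        if not r.is_duplicate and r.mapq >= min_mapq and r.is_proper_pair
    )


# Made-up reads: one duplicate, one low-quality, one improper pair, two clean.
reads = [
    Read(mapq=60, is_duplicate=False, is_proper_pair=True),
    Read(mapq=60, is_duplicate=True, is_proper_pair=True),
    Read(mapq=5, is_duplicate=False, is_proper_pair=True),
    Read(mapq=40, is_duplicate=False, is_proper_pair=False),
    Read(mapq=30, is_duplicate=False, is_proper_pair=True),
]

raw_depth = len(reads)          # depth: everything the sequencer produced
kept = effective_depth(reads)   # effective depth: what remains after filtering

print(raw_depth, kept)
```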
The coverage is the number of reads that covers a certain region or nucleotide.
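In this usage, coverage is a count of reads rather than a sum of per-base depths. A minimal sketch, assuming alignments are 0-based, half-open (start, end) intervals and using a hypothetical helper name:

```python
def reads_covering(alignments, region_start, region_end):
    """Number of alignments that overlap [region_start, region_end) by at least 1 bp."""
    return sum(
        1
        for start, end in alignments
        if start < region_end and end > region_start
    )


# Toy alignments (made-up data).
alignments = [(0, 5), (2, 8), (4, 10)]

coverage_at_base_4 = reads_covering(alignments, 4, 5)   # a single nucleotide
coverage_of_region = reads_covering(alignments, 6, 9)   # a 3 bp region

print(coverage_at_base_4, coverage_of_region)
```

For a single nucleotide this equals the per-base depth; for a longer region each overlapping read is counted once, however many bases of the region it spans.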
Still, as Heng pointed out, there is no real uniform definition, so whenever you lead a discussion or give a talk, reserve one slide to explain the vocabulary to make sure that everyone is on the same page and can follow properly.
This one should probably be changed to be a forum post rather than a question.