As far as I know, DNA sequencing coverage is simply calculated as (read count x read length / genome size). This means, for example, that an experiment might be described as "40x coverage".
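To make that concrete, here is a quick back-of-the-envelope calculation in Python (the numbers are illustrative, not from a real experiment):

```python
# Illustrative coverage calculation; all numbers are made up.
read_count = 1.28e9   # total number of reads
read_length = 100     # read length in bp
genome_size = 3.2e9   # genome size in bp (roughly human)

coverage = read_count * read_length / genome_size
print(f"{coverage:.0f}x coverage")  # prints "40x coverage"
```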
For RNA-seq, my initial instinct would have been to calculate a "coverage" figure in much the same way, but dividing by the number of annotated bases rather than the full genome length. When I think about it, however, this figure seems fairly meaningless for RNA-seq. From what I understand, DNA sequencing coverage is fairly uniform across the genome (telomeres and repeat regions excepted), so the simple coverage figure is informative - you can be reasonably sure that a typical base in a "40x coverage" genome will be covered by about 40 reads.
For RNA-seq, however, surely the variation in transcript levels makes this figure meaningless? A coverage of "10x" would tell you next to nothing about the number of reads supporting a typical base.
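To illustrate what I mean: a single mean figure hides the distribution, so one might instead summarize per-base depth directly. Here is a rough sketch with pysam, assuming an indexed, coordinate-sorted BAM (the file name and region are placeholders):

```python
import statistics
import pysam

# Placeholder file name and region; requires an indexed, sorted BAM.
bam = pysam.AlignmentFile("sample.bam", "rb")

# Per-base depth over the region. Note: pileup() only yields columns
# where at least one read aligns, so uncovered bases are not included.
depths = [col.nsegments for col in
          bam.pileup("chr1", 1_000_000, 1_010_000, truncate=True)]

print("median depth:", statistics.median(depths))
print("quartiles:", statistics.quantiles(depths, n=4))
```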
So, my question is - is there any other metric that people use to convey the strength/coverage of an RNA-seq data set? Would raw read count still be the best available? Has anybody used something like the read count over one or more housekeeping genes to compare coverage?
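To sketch the housekeeping-gene idea (purely illustrative - the contig and coordinates below are placeholders, not real gene coordinates):

```python
import pysam

bam = pysam.AlignmentFile("sample.bam", "rb")

# Count reads overlapping a housekeeping gene's locus; substitute the
# real contig and coordinates from your annotation.
n_reads = bam.count("chr12", 6_534_000, 6_538_000)
print("reads over housekeeping gene:", n_reads)
```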
Thanks.
No, I am talking about DESeq, but there is also a package called DEGseq, which sometimes causes confusion. As far as I understand, DEGseq uses an MA-plot-based method and the samr algorithm. Here is the DEGseq application note for completeness: http://www.ncbi.nlm.nih.gov/pubmed/19855105
Michael, thank you for sharing your slides. When I saw "DEseq" in your answer, I thought it was a typo, but after seeing "DEseq" again in your slides I became convinced that you had made a tiny mistake. The package name should be "DEGseq".