8.7 years ago by colin.kern ★ 1.1k
When people talk about the general quality of their ChIP-seq library, I often see them state something like "7.7% of the genome is enriched with H3K4me3". What is the standard way to calculate this value? Is it simply the genome coverage of all alignments of the treatment reads, or are they also considering the input/control in this value?
There is no way to know what someone did without seeing a log of the script/software/parameters run. "Standard methods" to do things change from lab to lab, year to year, project to project.
However, the method you proposed is pretty good. I'd call that coverage (not unique to ChIP), but some people like to call that depth. I think of depth as the mean number of reads that pile up over a base, but there's no formal definition of either, so that's caught me off-guard before. Do you compare against the input? Do you only include regions of the genome with more than X reads? Do you include contigs or just chromosomes? Do you include chrY for females? Do you exclude your blacklisted regions, repeats, and low-complexity regions? Does it make any practical difference? (See the sketch below for one way the basic calculation might look.)
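For concreteness, here is a minimal sketch of one common interpretation: the fraction of the genome covered by called peaks. The peak BED file and the chromosome-sizes file names are assumptions, not anything from the original question.

```python
# Sketch: percent of genome covered by merged peak intervals.
# "H3K4me3_peaks.bed" and "chrom.sizes" are hypothetical file names.
from collections import defaultdict

def merged_coverage(bed_path):
    """Total length of merged intervals from a BED file."""
    intervals = defaultdict(list)
    with open(bed_path) as fh:
        for line in fh:
            if line.startswith(("track", "#")) or not line.strip():
                continue
            chrom, start, end = line.split()[:3]
            intervals[chrom].append((int(start), int(end)))
    covered = 0
    for chrom, ivs in intervals.items():
        ivs.sort()
        cur_start, cur_end = ivs[0]
        for s, e in ivs[1:]:
            if s <= cur_end:                  # overlapping/touching: merge
                cur_end = max(cur_end, e)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = s, e
        covered += cur_end - cur_start
    return covered

# chrom.sizes: two columns, chromosome name and length
genome_size = sum(int(line.split()[1]) for line in open("chrom.sizes"))
pct = 100.0 * merged_coverage("H3K4me3_peaks.bed") / genome_size
print(f"{pct:.1f}% of the genome is covered by peaks")
```

Whether you feed in raw alignment coverage, input-subtracted peaks, or blacklist-filtered peaks changes the number, which is exactly why logging the exact inputs matters.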
You might be interested in these guys: Signal Distribution Charts
Which is the same sort of idea, except it's a distribution, not an average. But the answer is probably: do it however you like, just stay consistent (and please log it).
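Along those lines, a rough sketch of looking at the depth distribution rather than a single number. It assumes per-base depths were already dumped to a three-column file with `samtools depth -a treatment.bam`; the file name is hypothetical.

```python
# Sketch: histogram of per-base read depth from `samtools depth -a` output
# (columns: chromosome, position, depth). "treatment.depth.txt" is assumed.
from collections import Counter

hist = Counter()
with open("treatment.depth.txt") as fh:
    for line in fh:
        chrom, pos, depth = line.split()
        hist[int(depth)] += 1

total = sum(hist.values())
for depth in sorted(hist):
    print(f"depth {depth}: {hist[depth] / total:.4%} of bases")
```

From a table like that you can report whatever cutoff you decide on (e.g. "percent of bases with depth >= X") instead of a single mean, as long as you state the cutoff.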
You may want to check out http://www.ngs-qc.org/ . They have very nice ways to test antibody quality based on ChIP-seq data.