Hello, I am working on CUT&Tag data that includes a spike-in for normalization. The spike-in is the Ampr gene from the plasmid pBluescript(+). I have four samples, Wt1, Wt2, Ko1, and Ko2, all generated with the same antibody. So far, for each sample I have:
Aligned my reads to mm10 (Bowtie2)
Aligned my reads to the Ampr gene from pBluescript (Bowtie2).
I want to normalize these four samples using scaling factors calculated from the spike-in data. I was wondering how to go about doing that. From my research, I found that normalization factors are calculated as:

normalization factor = lowest_sample (spike-in) / sample_of_interest (spike-in) (https://www.biostars.org/p/247172/),

where the lowest sample is the sample with the lowest spike-in count and sample_of_interest is the spike-in count of each sample.
In the hypothetical example below, where each value is a count of uniquely aligned PE reads from Bowtie2, does scaling method A make sense? Or should I use method B, or neither?
| Sample | mm10 | Spike-in | Scaling factor *A* | Scaling factor *B* |
|--------|------|----------|--------------------|--------------------|
| Wt1    | 70   | 5        | 5/5 = 1.00         | 70/5 = 14.0        |
| Wt2    | 80   | 7        | 5/7 ≈ 0.71         | 80/7 ≈ 11.4        |
| Ko1    | 30   | 6        | 5/6 ≈ 0.83         | 30/6 = 5.0         |
| Ko2    | 40   | 6        | 5/6 ≈ 0.83         | 40/6 ≈ 6.7         |
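Method A can be sketched in a few lines of Python; the counts below are the hypothetical values from the table, and `lowest` is the spike-in count of the sample with the fewest spike-in reads:

```python
# Hypothetical spike-in counts (uniquely aligned PE reads) from the table above.
spike_in = {"Wt1": 5, "Wt2": 7, "Ko1": 6, "Ko2": 6}

# Method A: scale each sample by (lowest spike-in count / its own spike-in count),
# so the sample with the fewest spike-in reads gets a factor of 1.
lowest = min(spike_in.values())
scale_a = {sample: lowest / count for sample, count in spike_in.items()}

for sample, factor in scale_a.items():
    print(f"{sample}\t{factor:.2f}")
```

Multiplying each sample's coverage by its factor then down-scales samples with more spike-in reads.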
I would greatly appreciate advice on whether my current idea for normalization is correct or not. If not, could you point me in the right direction?
Is there a way to use deeptools to do this?
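On the deeptools question: `bamCoverage` accepts a `--scaleFactor` option, so one approach is to compute the factor per sample (as above) and pass it when generating bigWigs. A minimal sketch, with hypothetical file names:

```python
import shlex

def bamcoverage_cmd(bam, bigwig, scale_factor):
    """Build a bamCoverage command that applies a precomputed spike-in factor."""
    return (
        f"bamCoverage -b {shlex.quote(bam)} -o {shlex.quote(bigwig)} "
        f"--scaleFactor {scale_factor:.4f}"
    )

# e.g. for Wt2 with the method A factor 5/7:
cmd = bamcoverage_cmd("Wt2.mm10.bam", "Wt2.spikenorm.bw", 5 / 7)
print(cmd)
# run it with: subprocess.run(shlex.split(cmd), check=True)
```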
Any help will be greatly appreciated.
This is more a comment than a question, but I never really understood why spike-ins in routine experiments would be meaningful. You add a constant amount of spike-in to each library, but if the signal-to-noise ratio differs between libraries (very common in ChIP/CUT&Tag applications), this is essentially a normalization per library size and therefore unreliable. I would just call peaks on the samples, build a count matrix over the merged peaks, and then use DESeq2 or edgeR to get proper size factors.
But maybe I simply do not understand the idea of spike-ins.
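For reference, the DESeq2 size factors mentioned above come from the median-of-ratios method; here is a minimal Python sketch on a hypothetical peak-count matrix (merged peaks as rows, samples as columns):

```python
import math
from statistics import median

# Hypothetical count matrix over merged peaks for the four samples.
samples = ["Wt1", "Wt2", "Ko1", "Ko2"]
counts = [
    [70, 80, 30, 40],
    [20, 25, 10, 12],
    [55, 60, 22, 30],
]

def size_factors(matrix):
    """Median-of-ratios size factors (as in DESeq2); assumes all counts are nonzero."""
    # Geometric mean of each peak's counts across samples.
    geomeans = [math.exp(sum(math.log(c) for c in row) / len(row)) for row in matrix]
    # A sample's size factor is the median of its count/geomean ratios over peaks.
    return [
        median(matrix[i][j] / geomeans[i] for i in range(len(matrix)))
        for j in range(len(matrix[0]))
    ]

factors = size_factors(counts)
print(dict(zip(samples, (round(f, 3) for f in factors))))
```

Dividing each sample's counts by its size factor then corrects for depth without assuming equal signal-to-noise across libraries.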