I have several human-cell samples that were sequenced under different conditions. All samples received the same 10% spike-in of mouse chromatin. I need to normalize the coverage of each sample based on its number of mouse reads after sequencing.
I posted a similar but more complicated question here a few days ago, but I want to ask a simpler version that may actually get a response. As a basic example, let's say I have 4 samples:
Sample | Mouse reads
-------------------------------------
1 | 1.02 million
2 | 0.78 million
3 | 1.01 million
4 | 0.60 million
1) What are your ideas to scale each sample to the same level of mouse reads?
2) Does it make sense to first scale to RPM of human reads before scaling to mouse, or only scale based on mouse reads?
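For question 1, here is a minimal sketch of one common convention, offered as an assumption rather than the only valid approach: scale every sample down to the sample with the fewest mouse reads, so each factor is (smallest mouse count) / (that sample's mouse count). The function name `spikein_scale_factors` is made up for illustration, and the numbers are the ones from the table above. It does not settle question 2.

```python
# Mouse (spike-in) read counts in millions, copied from the table in the question.
mouse_reads = {"1": 1.02, "2": 0.78, "3": 1.01, "4": 0.60}


def spikein_scale_factors(reads):
    """Return per-sample factors that bring every sample to the smallest spike-in count.

    Assumption: the sample with the fewest mouse reads is used as the reference,
    so all factors are <= 1. Any other fixed reference would work the same way.
    """
    reference = min(reads.values())
    return {sample: reference / n for sample, n in reads.items()}


scale_factors = spikein_scale_factors(mouse_reads)
for sample, factor in scale_factors.items():
    print(f"sample {sample}: scale factor {factor:.3f}")
```

Multiplying each sample's human coverage by its factor puts all samples on the same mouse-read footing. If you are generating coverage tracks with deepTools, a factor like this could be passed to bamCoverage via --scaleFactor.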
Thanks, seems like a simple solution. Will try when I get a chance.
I need to use the BED-file mode so I can find the coverage of specific genomic regions. Would I need to use multiBamSummary --outRawCounts in order to load it into R? Or do you think I can use the .npz output?
Yes, definitely use the --outRawCounts option. You can load the .npz file (it is a NumPy archive), but you need special packages installed and it's generally more hassle than it's worth.
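To show why the raw-counts text file is easy to work with, here is a rough Python sketch that parses such a table and applies per-sample scale factors. It assumes the file is tab-delimited with a header like #'chr' 'start' 'end' followed by one quoted column per BAM file; check your own output before relying on that. The file names, counts, and function names below are made up for illustration. The same table should read into R with an ordinary tab-delimited reader.

```python
import csv
import io


def read_raw_counts(handle):
    """Parse a tab-delimited counts table: chr, start, end, then one column per BAM file."""
    reader = csv.reader(handle, delimiter="\t")
    # Assumed header style: a leading '#' and single quotes around each column name.
    header = [name.lstrip("#").strip("'") for name in next(reader)]
    samples = header[3:]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        region = (fields[0], int(fields[1]), int(fields[2]))
        counts = [float(x) for x in fields[3:]]
        rows.append((region, dict(zip(samples, counts))))
    return samples, rows


def scale_counts(rows, factors):
    """Multiply each sample's count in every region by that sample's scale factor."""
    return [
        (region, {s: c * factors[s] for s, c in counts.items()})
        for region, counts in rows
    ]


# Tiny made-up table standing in for a real --outRawCounts file.
example = io.StringIO(
    "#'chr'\t'start'\t'end'\t'sample1.bam'\t'sample4.bam'\n"
    "chr1\t1000\t2000\t120.0\t60.0\n"
    "chr2\t500\t1500\t34.0\t20.0\n"
)
samples, rows = read_raw_counts(example)

# Hypothetical spike-in factors keyed by the (made-up) BAM file names.
factors = {"sample1.bam": 0.60 / 1.02, "sample4.bam": 1.0}
scaled = scale_counts(rows, factors)

for region, counts in scaled:
    print(region, counts)
```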