I am running ChIPQC to assess some quality metrics of my obtained data. I am running into a bit of a discrepancy where QCmetrics() is reporting fewer reads within the bam file than are actually present. I am struggling to determine how ChIPQC is coming to the number that it is. If anyone has input regarding how this number is determined and why I am see the difference I am that would be extremely helpful.
P23M_VS_P23F_QC <- ChIPQC(P23M_VS_P23F, chromosomes = NULL)
QCmetrics(P23M_VS_P23F_QC)
Reads Map% Filt% Dup% ReadL FragL RelCC SSD RiP%
01_P23_Male_N1 36946798 100 8.75 0 74 206 2.100 0.883 18.0
02_P23_Male_N2 34428132 100 8.27 0 74 195 2.570 0.893 21.2
03_P23_Male_N3 34459791 100 7.85 0 74 203 2.650 0.848 19.6
04_P23_Female_N1 30773299 100 9.39 0 74 233 1.930 0.935 16.0
05_P23_Female_N2 32148841 100 8.44 0 74 203 2.680 0.864 18.0
06_P23_Female_N3 37475581 100 9.25 0 74 206 1.980 1.000 17.5
Input 33294824 100 9.01 0 74 155 0.713 1.650 NA
ChIPQC states that 01_P23_Male_N1 contains 36946798 reads in total. This is 2026282 fewer reads than reported by SAMtools if counting the read number. Many of these reads are unmapped. ChIPQC is suggesting all of the reads it is listing are mapped. As such, I would not expect these numbers to be the same, but still cannot determine where the 36946798 is coming from exactly.
samtools view -c "01_P23-Male-N1.bam"
38973080
Also tried the below knowing that ChIPQC drops reads for further analysis that do not have a Qscore of 15 or greater, but this was not super helpful.
samtools view -c -q 15 "01_P23-Male-N1.bam"
33716499
If anyone knows exactly how Reads is determined in this case or could suggest somewhere that I may have went wrong, I would greatly appreciate the help! Thanks!
Hi Istvan,
Thank you for all the recommendations and the provided explanation. I have combed through the ChIPQC vignettes and cannot find a clear explanation beyond "total number of reads in the bam file for each sample" (As you have hinted at above).
Running the suggested code results in the following outputs, which unfortunately still do not match 36946798 read count. This is probably not the biggest obstacle needing to be understood, but it is definitely a little disconcerting.
try this, if you used
bwa
for mapping it will filter out only the multi mapped reads:Mapping was performed using Bowtie2 if that adds anything to the determination of what SAMtools function to use.
The addition of the -q 0 yields the same result as the first two commands.