ChIPQC reporting fewer reads in bam than are actually present
1
0
Entering edit mode
3.7 years ago
gkunz ▴ 30

I am running ChIPQC to assess some quality metrics of my obtained data. I am running into a bit of a discrepancy where QCmetrics() is reporting fewer reads within the bam file than are actually present. I am struggling to determine how ChIPQC is coming to the number that it is. If anyone has input regarding how this number is determined and why I am see the difference I am that would be extremely helpful.

P23M_VS_P23F_QC <- ChIPQC(P23M_VS_P23F, chromosomes = NULL)

QCmetrics(P23M_VS_P23F_QC)
                    Reads Map% Filt% Dup% ReadL FragL RelCC   SSD RiP%
01_P23_Male_N1   36946798  100  8.75    0    74   206 2.100 0.883 18.0
02_P23_Male_N2   34428132  100  8.27    0    74   195 2.570 0.893 21.2
03_P23_Male_N3   34459791  100  7.85    0    74   203 2.650 0.848 19.6
04_P23_Female_N1 30773299  100  9.39    0    74   233 1.930 0.935 16.0
05_P23_Female_N2 32148841  100  8.44    0    74   203 2.680 0.864 18.0
06_P23_Female_N3 37475581  100  9.25    0    74   206 1.980 1.000 17.5
Input            33294824  100  9.01    0    74   155 0.713 1.650   NA

ChIPQC states that 01_P23_Male_N1 contains 36946798 reads in total. This is 2026282 fewer reads than reported by SAMtools if counting the read number. Many of these reads are unmapped. ChIPQC is suggesting all of the reads it is listing are mapped. As such, I would not expect these numbers to be the same, but still cannot determine where the 36946798 is coming from exactly.

samtools view -c "01_P23-Male-N1.bam"
38973080

Also tried the below knowing that ChIPQC drops reads for further analysis that do not have a Qscore of 15 or greater, but this was not super helpful.

samtools view -c -q 15 "01_P23-Male-N1.bam"
33716499

If anyone knows exactly how Reads is determined in this case or could suggest somewhere that I may have went wrong, I would greatly appreciate the help! Thanks!

R ChIPseq ChIPQC • 1.3k views
ADD COMMENT
1
Entering edit mode
3.7 years ago

Confusions over how various tools choose to count alignments are very common. The rationales are usually insufficiently documented and one is left to wonder of what tacit assumptions are present.

It usually comes down to how primary, secondary, and multi-mapped reads are counted.

# Remove unmapped reads, count mapped reads only.
samtools -F 4 -c data.bam

# Remove unmapped,secondary and supplementary alignments
samtools view -F 2308 -c data.bam

# What is the meaning of the flag
samtools flags 2308
0x904   2308    UNMAP,SECONDARY,SUPPLEMENTARY

# Apply additional filters
samtools view -q 15 -F 2308 -c databam

See what these numbers give you,

ADD COMMENT
0
Entering edit mode

Hi Istvan,

Thank you for all the recommendations and the provided explanation. I have combed through the ChIPQC vignettes and cannot find a clear explanation beyond "total number of reads in the bam file for each sample" (As you have hinted at above).

Running the suggested code results in the following outputs, which unfortunately still do not match 36946798 read count. This is probably not the biggest obstacle needing to be understood, but it is definitely a little disconcerting.

# Remove unmapped reads, count mapped reads only.
samtools -F 4 -c 01_P23-Male-N1.bam
36951573

# Remove unmapped,secondary and supplementary alignments
samtools view -F 2308 -c 01_P23-Male-N1.bam
36951573

# Apply additional filters
samtools view -q 15 -F 2308 -c 01_P23-Male-N1.bam
33716499
ADD REPLY
0
Entering edit mode

try this, if you used bwa for mapping it will filter out only the multi mapped reads:

samtools -F 4 -q 0 -c 01_P23-Male-N1.bam
ADD REPLY
0
Entering edit mode

Mapping was performed using Bowtie2 if that adds anything to the determination of what SAMtools function to use.

samtools view -F 4 -q 0 -c 01_P23-Male-N1.bam
36951573

The addition of the -q 0 yields the same result as the first two commands.

ADD REPLY

Login before adding your answer.

Traffic: 2032 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6