MACS2 "total fragments in treatment" gives less number of fragments than expected
0
0
Entering edit mode
5.1 years ago

Hi all!

  • HTK4ME1-1_S2_sorted
    • 134765384 + 0 in total (QC-passed reads + QC-failed reads)
    • 0 + 0 secondary
    • 0 + 0 supplementary
    • 0 + 0 duplicates
    • 134765384 + 0 mapped (100.00%:N/A)
    • 134765384 + 0 paired in sequencing
    • 67386044 + 0 read1
    • 67379340 + 0 read2
    • 34355482 + 0 properly paired (99.70%:N/A)
    • 134765384 + 0 with itself and mate mapped
    • 0 + 0 singletons (0.00%:N/A)
    • 38967 + 0 with mate mapped to a different chr
    • 21345 + 0 with mate mapped to a different chr (mapQ>=5)

The above mentioned is the output of samtools flagstat. I have 134765384 - 38967= 134,726,417 properly mapped reads. I called the broad peak call in BAMPE mode and got .xls file. When I look at the .xls file I see the following

  • fragment size is determined as 209 bps
    • total fragments in treatment: 67177741
    • fragments after filtering in treatment: 959918
    • maximum duplicate fragments in treatment = 1
    • Redundant rate in treatment: 0.99
    • total fragments in control: 82366728
    • fragments after filtering in control: 12656817
    • maximum duplicate fragments in control = 1
    • Redundant rate in control: 0.85

The Total fragments in the treatment are 67177741, however, my no.s of mapping reads are 134,726,417. My queries are:

  1. I wish to reconfirm if fragments are considering both forward and reverse read since 67177741 x 2= 134,355,482‬ which is nearly equal to 134,726,417 reads. I, however, don't know what happened to rest 3,70,935 reads

  2. The fragments after filtering are way very less and about 66,217,823‬ fragments (66,217,823‬ x 2= 132,435,646 reads ) are getting filtered out. What could be the possible reason? I am not ready to accept that 132 million reads are duplicates. How can I be sure?

ChIP-Seq next-gen genome • 1.5k views
ADD COMMENT
1
Entering edit mode

What was surprising to me is:

maximum duplicate fragments in treatment = 1
Redundant rate in treatment: 0.99

which are very high. My guess is that you have lots of PCR duplicates in your library.
Therefore, the majority of your fragments are filtered out, I think. But lets wait what others say before jumping to conclusions.

ADD REPLY
0
Entering edit mode

I used picard to mark duplicates to further substantiate my suspicion and confirmed that maximum of the reads (above 90%) were duplicates. My further strategy is to separate the duplicates and unique reads and visualise in IGV which region are these actually.

ADD REPLY
1
Entering edit mode

Samtools flagstat reports individual reads, not read pairs. The fragments in Macs2 output is referring to pairs of reads, so that's why you're getting half the number. The 370,935 difference may be Macs2 filtering read pair alignments that are considered "properly paired" in samtools but not by Macs2. Perhaps the pair is aligned on the same chromosome but at a very large distance from each other. I looked at the Macs2 manual but couldn't find any information about what filtering it might do with paired-end reads.

ADD REPLY

Login before adding your answer.

Traffic: 2131 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6