How many reads will be removed after markduplicates in general?
2.3 years ago
zau saa ▴ 150

Hello! I'm working with bulk DNA-seq data.

First, I aligned the reads to the human genome with bwa-mem.

Before running Picard MarkDuplicates with only the required options (I, O, M), the mean depth over the total region was 34.

However, after MarkDuplicates, the mean depth over the total region is only 18! Moreover, the BAM file is only 1G smaller after MarkDuplicates. Why has the depth decreased so much? The depth is calculated with mosdepth (mosdepth -t 4 -n).

MarkDuplicates • 1.5k views

"Why has the depth decreased so much?"

Run samtools flagstat to count the number of marked and total reads.
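
As a rough sanity check on the numbers in the question, one can work out what share of the coverage mosdepth stopped counting. This is a minimal Python sketch, assuming that mean depth scales linearly with the number of counted bases and that the whole drop comes from reads flagged as duplicates; the function name is made up for illustration. The duplicate count reported by samtools flagstat is the real figure to compare against.

```python
def implied_excluded_fraction(depth_before: float, depth_after: float) -> float:
    """Fraction of coverage no longer counted, assuming depth scales with counted bases."""
    if depth_before <= 0:
        raise ValueError("depth_before must be positive")
    return 1.0 - depth_after / depth_before


# Mean depths reported in the question: 34 before, 18 after MarkDuplicates.
fraction = implied_excluded_fraction(34, 18)
print(f"{fraction:.1%} of the coverage is no longer counted")
```

With the depths from the question this comes out at roughly 47%, so under these assumptions flagstat should show a duplicate rate in that neighbourhood.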


Did you simply mark the duplicates or actually remove them? The BAM file could have shrunk if reads simply got rearranged.


or the compression level has changed ...


I realize that I only identified the duplicates.


If I forget to remove duplicates, will it influence the SNPs called by GATK HaplotypeCaller?


"will it influence the SNPs called by GATK HaplotypeCaller?"

No. https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller

These Read Filters are automatically applied to the data by the Engine before processing by HaplotypeCaller.

NotDuplicateReadFilter
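
To make the flag logic concrete: unless it is asked to remove duplicates, MarkDuplicates keeps the reads and sets the duplicate bit (0x400, decimal 1024) in the SAM FLAG field, and each downstream tool decides whether to skip such reads. The sketch below is a conceptual Python illustration, not GATK's or mosdepth's actual code, and the function names are hypothetical. The value 1796 is mosdepth's documented default for its -F/--flag exclude option; it includes the duplicate bit, which would explain why marked reads stop contributing to depth even though they are still in the BAM.

```python
# SAM FLAG bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400

# mosdepth's default exclude mask (-F 1796) is the union of the four bits above.
MOSDEPTH_DEFAULT_EXCLUDE = UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE


def is_duplicate(flag: int) -> bool:
    """True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE)


def is_counted(flag: int, exclude_mask: int = MOSDEPTH_DEFAULT_EXCLUDE) -> bool:
    """True if a read with this FLAG survives an exclude-mask filter."""
    return (flag & exclude_mask) == 0


# Flag 99: paired, proper pair, mate reverse, first in pair.
# Flag 1123 is the same read once the duplicate bit is set (99 + 1024).
for flag in (99, 1123):
    print(flag, is_duplicate(flag), is_counted(flag))
```

If this reading is right, running mosdepth with -F 772 (1796 minus 1024) should count the marked duplicates again and bring the mean depth back close to 34.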


Thanks for your answer!


Returning to the original question, do you think it's strange that the depth of bulk DNA-seq data decreases so much after MarkDuplicates?
