2.4 years ago
zau saa
Hello! I'm working with bulk DNA-seq data.
First, I aligned it to the human genome with bwa-mem.
Before running Picard MarkDuplicates with only the required options (I, O, M), the mean depth over the whole region was 34.
However, after MarkDuplicates the mean depth over the whole region is only 18! Moreover, the BAM file after MarkDuplicates is only 1G smaller. Why has the depth decreased so much? The depth was calculated with mosdepth (mosdepth -t 4 -n).
Run samtools flagstat to count the number of marked/total reads. Did you simply mark or actually remove the duplicates? The BAM file could have shrunk if the reads simply got rearranged.
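A minimal sketch of that check, assuming the usual flagstat text layout (a line ending in "in total (...)" and a line ending in "duplicates"). The helper name dup_percent and the file name marked.bam are placeholders, not anything from the original post:

```shell
# Hypothetical helper: read `samtools flagstat` text on stdin and print
# the percentage of reads flagged as duplicates.
dup_percent() {
    awk '
        $4 == "in" && $5 == "total" { total = $1 }   # "N + 0 in total (...)"
        $4 == "duplicates"          { dups = $1 }    # "N + 0 duplicates"
        END { if (total > 0) printf "%.1f\n", dups / total * 100 }
    '
}

# Usage (marked.bam is a placeholder for the MarkDuplicates output):
# samtools flagstat marked.bam | dup_percent
```

If the duplicates count is non-zero but the total read count matches the input BAM, the reads were only marked, not removed.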
or the compression level has changed ...
I realize that I only marked the duplicates and did not remove them.
If I forget to remove the duplicates, will that influence the SNPs called by GATK HaplotypeCaller?
No. https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller
Thanks for your answer!
Returning to the original question, do you think it's strange that the depth of bulk DNA-seq data decreases so much after MarkDuplicates?
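One possible explanation, offered as a sketch rather than a diagnosis: as far as I know, mosdepth's -F option defaults to 1796, a mask that includes the SAM duplicate bit (1024). Under that assumption, reads that are only marked as duplicates are skipped when depth is computed, so the drop from 34 to 18 would simply reflect the duplication rate. The helper name and file names below are placeholders:

```shell
# Hypothetical helper: percent of mean depth lost between two measurements.
depth_loss_percent() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Assumption: 1796 is the default -F mask in mosdepth; 1024 is the duplicate flag.
echo "duplicate bit set in default mask: $(( 1796 & 1024 ))"
echo "mask with the duplicate bit cleared: $(( 1796 & ~1024 ))"
echo "depth lost after marking: $(depth_loss_percent 34 18)%"

# To count marked duplicates as well, pass the cleared mask explicitly
# (output prefix and BAM name are placeholders):
# mosdepth -t 4 -n -F 772 sample.with_dups sample.marked.bam
```

If the duplicate percentage reported by samtools flagstat is close to the depth loss computed here, the decrease is accounted for by the marked duplicates alone.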