MarkDuplicates output file in GATK pipeline
5.5 years ago
gprashant17

I ran GATK's MarkDuplicates on the BAM file I obtained after alignment, and it produced another file, marked_duplicates.bam. Should I proceed with marked_duplicates.bam for downstream analysis (calling variants into a VCF), or is it just a file containing the duplicates? In the latter case, is it possible to obtain a BAM file with all the duplicates removed?

gatk rna-seq alignment bam sequencing

5.5 years ago

> So should I proceed with this marked_duplicates.bam file for analysis (converting to VCF),

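Yes. To confirm, run samtools flagstat on both files: the total number of reads should be the same, but marked_duplicates.bam should report a non-zero number of duplicates.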
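To see what that flagstat comparison is counting, here is a minimal Python sketch (Python, the function name count_duplicates and the toy records are our own illustration, not part of GATK or samtools). It assumes uncompressed SAM text, e.g. what samtools view -h prints, and only approximates flagstat's "in total" and "duplicates" lines without its QC-passed/QC-failed split. A read is counted as a duplicate when bit 0x400 of its FLAG column is set.

```python
# Minimal sketch: count alignment records and duplicate-flagged records in
# SAM text (e.g. the output of `samtools view -h file.bam`), approximating
# the "in total" and "duplicates" lines of samtools flagstat.
DUPLICATE_FLAG = 0x400  # SAM spec bit 0x400: PCR or optical duplicate


def count_duplicates(sam_lines):
    """Return (total_records, duplicate_records) for an iterable of SAM lines."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        # Skip header lines (@HD, @SQ, @PG, ...) and blank lines.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        total += 1
        if flag & DUPLICATE_FLAG:
            duplicates += 1
    return total, duplicates


# Hypothetical toy records: read pair r2 is a duplicate of pair r1
# (flags 1123 = 99 + 1024 and 1171 = 147 + 1024).
marked = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tTTGA\tIIII",
    "r2\t1171\tchr1\t200\t60\t4M\t=\t100\t-104\tTTGA\tIIII",
]

print(count_duplicates(marked))  # (4, 2)
```

All four records are still present after marking; only the FLAG values differ, which is why the total stays the same while the duplicate count goes up.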

> is it possible to obtain a BAM file with all the duplicates removed?

From the MarkDuplicates manual (GATK 4.0.4.0): https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.4.0/picard_sam_markduplicates_MarkDuplicates.php#--REMOVE_DUPLICATES

--REMOVE_DUPLICATES (no short name)
If true do not write duplicates to the output file instead of writing them with appropriate flags set.
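With GATK 4 this should just mean adding --REMOVE_DUPLICATES true to the MarkDuplicates command. For a file that has already been marked, samtools view -b -F 1024 marked_duplicates.bam > dedup.bam drops the flagged reads afterwards. The Python sketch below (the function drop_duplicates and the sample records are hypothetical illustrations, not GATK code) shows what that filter does record by record. Note that it only removes reads that already carry the 0x400 bit; deciding which reads are duplicates is MarkDuplicates' job.

```python
# Minimal sketch: drop records whose FLAG has the 0x400 duplicate bit,
# keeping header lines. This mimics the *effect* of removing already-marked
# duplicates; it does not decide which reads are duplicates.
DUPLICATE_FLAG = 0x400


def drop_duplicates(sam_lines):
    """Yield SAM header lines and every alignment record not flagged as a duplicate."""
    for line in sam_lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            yield line  # keep @HD/@SQ/@PG header lines unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if not (flag & DUPLICATE_FLAG):
            yield line


# Hypothetical toy input: r2 is flagged as a duplicate (1123 = 99 + 1024).
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
]

for kept in drop_duplicates(records):
    print(kept.split("\t")[0])  # prints "@HD", then "r1"
```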

So if I did not use --REMOVE_DUPLICATES, the duplicate reads are still present in marked_duplicates.bam, just flagged as duplicates, right?

> So if I did not use --REMOVE_DUPLICATES, the duplicate reads are still present in marked_duplicates.bam, just flagged as duplicates, right?

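Yes. Each of those reads has the 0x400 (decimal 1024) "PCR or optical duplicate" bit set in its SAM FLAG field, so downstream tools can skip them. GATK's HaplotypeCaller, for instance, filters out duplicate-flagged reads by default.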
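To check what a given FLAG value means, the bits can be decoded directly. This is a minimal Python sketch; the bit names follow the ones samtools uses, and decode_flag is a hypothetical helper, not a library function. From the command line, samtools flags 1123 should give the same decoding.

```python
# Minimal sketch: decode a SAM FLAG value into the bit names samtools uses.
SAM_FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",  # set by MarkDuplicates; the read itself stays in the file
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(decode_flag(99))    # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1']
print(decode_flag(1123))  # ['PAIRED', 'PROPER_PAIR', 'MREVERSE', 'READ1', 'DUP']
```

A flag of 1123 is the same first-in-pair read as flag 99, with only the DUP bit added.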
