Marking duplicates in coordinate-sorted BAM files

3.2 years ago

priya.bmg ▴ 60

Hello,

The documentation (https://gatk.broadinstitute.org/hc/en-us/articles/360037224932?page=1#comment_4406762304155) says it is ideal to submit queryname-sorted BAM files. Will it be computationally intensive to submit coordinate-sorted BAM files instead?

First, I sorted the unmapped and mapped BAM files by queryname, merged them, and then sorted the merged files by coordinate. Can these coordinate-sorted merged BAM files be used to mark duplicates with MarkDuplicatesSpark? Also, can I subsequently run SetNmMdAndUqTags before running BQSR? Please advise.

Thanks

Spark duplicates Mark • 740 views

updated 3.2 years ago by benformatics 4.0k • written 3.2 years ago by priya.bmg ▴ 60

From your link:

This can result in the tool being up to 2x slower processing under some circumstances.

That is what it says there, so the extra cost is probably negligible unless you need your results yesterday.
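For what it's worth, here is a rough Python sketch of the order of steps described in the question: read the sort order from the BAM header text, then build the MarkDuplicatesSpark and SetNmMdAndUqTags command lines. The file names, the `prefix` argument, and both helper functions are hypothetical, and only the basic `-I`, `-O`, and `-R` arguments are used. It only prints the commands and does not run GATK, so treat it as an illustration rather than a tested pipeline.

```python
import shlex


def sort_order_from_header(header_text):
    """Return the SO value from the @HD line of a SAM/BAM header, or None."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def build_commands(in_bam, reference, prefix):
    """Build argument lists for the two steps (output names are assumptions)."""
    marked = f"{prefix}.markdup.bam"
    fixed = f"{prefix}.fixed.bam"
    mark_duplicates = ["gatk", "MarkDuplicatesSpark", "-I", in_bam, "-O", marked]
    set_tags = ["gatk", "SetNmMdAndUqTags", "-R", reference, "-I", marked, "-O", fixed]
    return [mark_duplicates, set_tags]


# Hypothetical header text, e.g. as printed by `samtools view -H merged.bam`.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422"

if sort_order_from_header(header) == "coordinate":
    # Per the quoted documentation, coordinate-sorted input is accepted
    # but may be up to 2x slower than queryname-sorted input.
    print("Input is coordinate-sorted; expect some slowdown.")

for command in build_commands("merged.bam", "reference.fasta", "sample"):
    print(shlex.join(command))
```

The printed commands can be pasted into a shell or passed to `subprocess.run`; BQSR would follow on the second command's output.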

3.2 years ago by benformatics 4.0k
