Question

unsorting a bam file

0

2.6 years ago

wiscoyogi ▴ 40

I have some data with a high rate of PCR duplicates that I need to deduplicate with Picard MarkDuplicates. However, the downstream program that consumes Picard's output requires an unsorted BAM file.

I tried running Picard with ASSUME_SORT_ORDER=unsorted, but got the following error:

    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    picard.PicardException: This program requires input that are either coordinate or query sorted (according to the header, or at least ASSUME_SORT_ORDER and the content.) Found ASSUME_SORT_ORDER=unsorted and header sortorder=unsorted

So, either: (1) is there a way to unsort a BAM file produced by Picard, or (2) is there a way to run Picard on an unsorted BAM file? The documentation suggests the latter is possible, but this error message implies otherwise.

star picard • 1.4k views

ADD COMMENT • link updated 2.6 years ago by swbarnes2 14k • written 2.6 years ago by wiscoyogi ▴ 40

score 2 · Answer 1 · 2022-05-09

2

2.6 years ago

swbarnes2 14k

You don't want to unsort so much as you want to sort by qname instead of by coordinate.
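As a rough sketch of what sorting by qname looks like in practice: `samtools sort` takes a `-n` flag that orders reads by query name rather than by coordinate. The helper below only assembles and prints the command; the function name and the file names are made up for illustration, and it assumes `samtools` is on your PATH.

```python
import shlex


def name_sort_command(in_bam, out_bam):
    """Build a `samtools sort -n` command that sorts reads by query name."""
    return [
        "samtools", "sort",
        "-n",            # sort by read (query) name instead of coordinate
        "-o", out_bam,   # write the sorted BAM to this path
        in_bam,
    ]


# Hypothetical file names, for illustration only.
cmd = name_sort_command("aligned.bam", "aligned.namesorted.bam")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`.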

ADD COMMENT • link 2.6 years ago by swbarnes2 14k

0

Can you elaborate on this?

ADD REPLY • link 2.6 years ago by wiscoyogi ▴ 40

0

As best Picard can tell, your BAM file is unsorted, but to mark duplicates the input needs to be a sorted BAM file, either by genomic coordinate or by query name (as the error message itself states). See the documentation for details: https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates

For sorting a BAM file, see samtools sort: http://www.htslib.org/doc/samtools-sort.html
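If it helps to see what sort order a BAM header declares before handing the file to Picard, the `SO` tag on the `@HD` header line records it (typical values are `unsorted`, `queryname`, and `coordinate`). Here is a minimal sketch that reads that tag from header text such as the output of `samtools view -H`; the function name and the example header are illustrative, not taken from any particular file.

```python
def declared_sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


# Made-up header text, standing in for the output of `samtools view -H some.bam`.
example_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
print(declared_sort_order(example_header))
```

Note that the tag only states what the header claims; it does not prove the records are actually in that order.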

ADD REPLY • link 2.6 years ago by jv ★ 1.8k

0

My problem is that the program I'm feeding the de-duplicated files to requires an unsorted BAM file.

Is there a way to unsort a BAM file?

ADD REPLY • link 2.6 years ago by wiscoyogi ▴ 40

0

There is no such thing. A BAM file can be sorted by mapping coordinate, or it can be sorted by query name. The documentation linked above says that MarkDuplicates accepts either query-name-sorted or coordinate-sorted BAMs as input, though what it does with unmapped reads differs between the two.
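A possible way to put this together, sketched under some assumptions: feed MarkDuplicates a query-name-sorted BAM and tell it so with `ASSUME_SORT_ORDER=queryname`. The output should then remain grouped by read name rather than by coordinate, which may be what the downstream tool actually wants when it asks for "unsorted" input. The snippet only builds the command line; the function name and file names are hypothetical, and it assumes a `picard` wrapper script is on your PATH (otherwise substitute `java -jar picard.jar`).

```python
import shlex


def mark_duplicates_command(in_bam, out_bam, metrics_file, remove=False):
    """Build a Picard MarkDuplicates command for query-name-sorted input."""
    return [
        "picard", "MarkDuplicates",
        "I=" + in_bam,                  # query-name-sorted input BAM
        "O=" + out_bam,                 # BAM with duplicates marked (or removed)
        "M=" + metrics_file,            # duplication metrics report
        "ASSUME_SORT_ORDER=queryname",  # treat input as sorted by read name
        "REMOVE_DUPLICATES=" + ("true" if remove else "false"),
    ]


# Hypothetical file names, for illustration only.
cmd = mark_duplicates_command(
    "aligned.namesorted.bam", "dedup.bam", "dup_metrics.txt", remove=True
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

With `remove=False` duplicates are only flagged, so whether to drop them depends on what the downstream program expects.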

ADD REPLY • link 2.6 years ago by swbarnes2 14k