2.5 years ago
wiscoyogi
I have some data with a high rate of PCR duplicates that I need to deduplicate with Picard MarkDuplicates. However, the BAM that Picard outputs needs to be unsorted, because the downstream program that consumes it requires unsorted input.
I tried running Picard with ASSUME_SORT_ORDER=unsorted, but got the following error:
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
picard.PicardException: This program requires input that are either coordinate or query sorted (according to the header, or at least ASSUME_SORT_ORDER and the content.) Found ASSUME_SORT_ORDER=unsorted and header sortorder=unsorted
So, either (1) is there a way to unsort a BAM file produced by Picard, or (2) is there a way to run Picard on an unsorted BAM file? The documentation suggests the latter is possible, but this error message implies otherwise.
Can you elaborate more on this?
As best Picard Tools can tell, your BAM file is unsorted, but to mark duplicates your input needs to be a sorted BAM file, either by genomic coordinate or, as the error message itself says, by query name. See the documentation for details: https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates
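To see what sort order a BAM header actually declares (the "header sortorder" in the error above), one option is to dump the header with `samtools view -H in.bam` and read the `SO:` field of the `@HD` line. Below is a minimal Python sketch of that lookup. The function name `header_sort_order` and the sample header text are made up for illustration; it only inspects header text you pass in and does not read BAM files itself.

```python
def header_sort_order(header_text):
    """Return the SO value from the @HD header line, or 'unknown' if absent.

    header_text is the plain-text SAM header, e.g. the output of
    `samtools view -H in.bam` (hypothetical file name).
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    # The SAM specification treats a missing SO tag as 'unknown'.
    return "unknown"


# Illustrative header text, not taken from a real file.
example_header = "@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\n"
print(header_sort_order(example_header))  # prints "unsorted"
```

A value of `unsorted` or `unknown` here matches the situation in the error message; `coordinate` or `queryname` is what MarkDuplicates is asking for.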
For sorting a BAM file, see samtools sort: http://www.htslib.org/doc/samtools-sort.html
My problem is that the program I'm feeding the de-duplicated files to requires an unsorted BAM file.
Is there a way to unsort a BAM file?
There is no such thing. A BAM can be sorted by mapping coordinates, or it can be sorted by query name. The documentation linked to above says that MarkDuplicates accepts both query-name-sorted and coordinate-sorted BAMs as input, though what it does with unmapped reads differs between the two.
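One way to act on this is to sort by query name rather than by coordinate, then tell MarkDuplicates about it. This is only a sketch under assumptions: it assumes the downstream program asks for "unsorted" input because it wants reads not ordered by position (worth checking in that program's docs), that a `picard` wrapper script is on your PATH (otherwise substitute `java -jar picard.jar`), and that the file names are placeholders. The helper `dedup_commands` is hypothetical and only builds the command lines so you can inspect them before running anything.

```python
import shlex


def dedup_commands(in_bam, sorted_bam, out_bam, metrics):
    """Build (but do not run) a name-sort followed by MarkDuplicates."""
    # samtools sort -n orders reads by query name instead of coordinate.
    sort_cmd = ["samtools", "sort", "-n", "-o", sorted_bam, in_bam]
    # Legacy Picard KEY=VALUE syntax; ASSUME_SORT_ORDER is the option
    # mentioned in the question, here set to queryname.
    markdup_cmd = [
        "picard", "MarkDuplicates",
        "I=" + sorted_bam,
        "O=" + out_bam,
        "M=" + metrics,
        "ASSUME_SORT_ORDER=queryname",
    ]
    return [sort_cmd, markdup_cmd]


# Placeholder file names for illustration only.
for cmd in dedup_commands("in.bam", "in.qname.bam", "dedup.bam", "dedup_metrics.txt"):
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell once they look right. Name ordering conventions can differ between tools, so if Picard complains about the samtools ordering, Picard's own SortSam with SORT_ORDER=queryname is an alternative for the first step.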