Hello,
I have a sorted.bam from targeted sequencing of a set of genes that was sent to me. When I ran Picard AddOrReplaceReadGroups or Picard MarkDuplicates, I noticed that the chromosomes were not read in sorted order.
INFO 2023-07-24 10:39:16 MarkDuplicates Read 79,000,000 records. Elapsed time: 00:11:03s. Time for last 1,000,000: 3s. Last read position: chr19:15,379,656
INFO 2023-07-24 10:39:16 MarkDuplicates Tracking 31672 as yet unmatched pairs. 6635 records in RAM.
INFO 2023-07-24 10:39:18 MarkDuplicates Read 77,000,000 records. Elapsed time: 00:11:05s. Time for last 1,000,000: 3s. Last read position: chr18:60,891,077
INFO 2023-07-24 10:39:18 MarkDuplicates Tracking 36160 as yet unmatched pairs. 2514 records in RAM.
INFO 2023-07-24 10:39:19 MarkDuplicates Read 76,000,000 records. Elapsed time: 00:11:06s. Time for last 1,000,000: 3s. Last read position: chr17:81,080,684
INFO 2023-07-24 10:39:19 MarkDuplicates Tracking 37472 as yet unmatched pairs. 315 records in RAM.
INFO 2023-07-24 10:39:19 MarkDuplicates Read 73,000,000 records. Elapsed time: 00:11:06s. Time for last 1,000,000: 5s. Last read position: chr17:39,883,273
INFO 2023-07-24 10:39:19 MarkDuplicates Tracking 41912 as yet unmatched pairs. 5780 records in RAM.
INFO 2023-07-24 10:39:23 MarkDuplicates Read 74,000,000 records. Elapsed time: 00:11:10s. Time for last 1,000,000: 3s. Last read position: chr17:41,310,705
INFO 2023-07-24 10:39:23 MarkDuplicates Tracking 41672 as yet unmatched pairs. 5390 records in RAM.
INFO 2023-07-24 10:39:26 MarkDuplicates Read 77,000,000 records. Elapsed time: 00:11:13s. Time for last 1,000,000: 6s. Last read position: chr18:60,891,077
INFO 2023-07-24 10:39:26 MarkDuplicates Tracking 36160 as yet unmatched pairs. 2514 records in RAM.
INFO 2023-07-24 10:39:27 MarkDuplicates Read 78,000,000 records. Elapsed time: 00:11:14s. Time for last 1,000,000: 8s. Last read position: chr19:10,065,639
INFO 2023-07-24 10:39:27 MarkDuplicates Tracking 32057 as yet unmatched pairs. 7191 records in RAM.
INFO 2023-07-24 10:39:27 MarkDuplicates Read 80,000,000 records. Elapsed time: 00:11:14s. Time for
As you can see, it goes from reading chr19 to chr18 and chr17, and then back to chr19. This is only a small part of the output; it happens several more times. Is this expected?
Thank you!
If you are using multiple threads, the output may be buffered in a temporary location before finally being written out in the correct order. Don't worry: if the file is not sorted, the programs will complain.
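One way to sanity-check this from the log itself is to pair each "Read N records" count with its "Last read position" and see whether the positions increase once the lines are ordered by record count. The snippet below is only a rough sketch: the function names are made up for illustration, and the contig order chr17 < chr18 < chr19 is an assumption (the real order comes from the @SQ lines of the BAM header).

```python
import re

# Progress lines copied from the log in the question (timestamps shortened).
LOG = """\
INFO 10:39:16 MarkDuplicates Read 79,000,000 records. Elapsed time: 00:11:03s. Time for last 1,000,000: 3s. Last read position: chr19:15,379,656
INFO 10:39:18 MarkDuplicates Read 77,000,000 records. Elapsed time: 00:11:05s. Time for last 1,000,000: 3s. Last read position: chr18:60,891,077
INFO 10:39:19 MarkDuplicates Read 76,000,000 records. Elapsed time: 00:11:06s. Time for last 1,000,000: 3s. Last read position: chr17:81,080,684
INFO 10:39:19 MarkDuplicates Read 73,000,000 records. Elapsed time: 00:11:06s. Time for last 1,000,000: 5s. Last read position: chr17:39,883,273
INFO 10:39:23 MarkDuplicates Read 74,000,000 records. Elapsed time: 00:11:10s. Time for last 1,000,000: 3s. Last read position: chr17:41,310,705
INFO 10:39:26 MarkDuplicates Read 77,000,000 records. Elapsed time: 00:11:13s. Time for last 1,000,000: 6s. Last read position: chr18:60,891,077
INFO 10:39:27 MarkDuplicates Read 78,000,000 records. Elapsed time: 00:11:14s. Time for last 1,000,000: 8s. Last read position: chr19:10,065,639
""".splitlines()

# Assumed contig order; in practice take it from the @SQ header lines.
CONTIG_ORDER = ["chr17", "chr18", "chr19"]

PROGRESS = re.compile(
    r"Read\s+([\d,]+)\s+records\..*?Last read position:\s*(\S+):([\d,]+)"
)


def parse_progress(lines):
    """Return (records_read, contig, position) for every progress line."""
    entries = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            count = int(match.group(1).replace(",", ""))
            position = int(match.group(3).replace(",", ""))
            entries.append((count, match.group(2), position))
    return entries


def positions_follow_record_count(entries, contig_order):
    """True if (contig, position) never decreases as the record count grows."""
    rank = {name: i for i, name in enumerate(contig_order)}
    ordered = sorted(set(entries))  # sort by record count, drop repeated lines
    keys = [(rank[contig], pos) for _, contig, pos in ordered]
    return all(a <= b for a, b in zip(keys, keys[1:]))


entries = parse_progress(LOG)
counts_as_printed = [count for count, _, _ in entries]

print("record counts in printed order:", counts_as_printed)
print("positions consistent with record count:",
      positions_follow_record_count(entries, CONTIG_ORDER))
```

For the lines above, the record counts themselves are printed out of order (79M, 77M, 76M, 73M, ...), and each count always maps to a later position than the smaller counts. That suggests the log lines are interleaved rather than the reads being unsorted. To confirm the file itself, the declared sort order is stored in the SO field of the @HD header line.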
You are right! Thank you :)