Should sorted BAM files have ordered chromosomes when processed?
16 months ago

Hello,

I have a sorted.bam, sent to me, that comes from targeted sequencing of a set of genes. When I ran Picard AddOrReplaceReadGroups or Picard MarkDuplicates, I noticed that the chromosomes in the log were not in order.

INFO    2023-07-24 10:39:16     MarkDuplicates  Read    79,000,000 records.  Elapsed time: 00:11:03s.  Time for last 1,000,000:    3s.  Last read position: chr19:15,379,656
INFO    2023-07-24 10:39:16     MarkDuplicates  Tracking 31672 as yet unmatched pairs. 6635 records in RAM.
INFO    2023-07-24 10:39:18     MarkDuplicates  Read    77,000,000 records.  Elapsed time: 00:11:05s.  Time for last 1,000,000:    3s.  Last read position: chr18:60,891,077
INFO    2023-07-24 10:39:18     MarkDuplicates  Tracking 36160 as yet unmatched pairs. 2514 records in RAM.
INFO    2023-07-24 10:39:19     MarkDuplicates  Read    76,000,000 records.  Elapsed time: 00:11:06s.  Time for last 1,000,000:    3s.  Last read position: chr17:81,080,684
INFO    2023-07-24 10:39:19     MarkDuplicates  Tracking 37472 as yet unmatched pairs. 315 records in RAM.
INFO    2023-07-24 10:39:19     MarkDuplicates  Read    73,000,000 records.  Elapsed time: 00:11:06s.  Time for last 1,000,000:    5s.  Last read position: chr17:39,883,273
INFO    2023-07-24 10:39:19     MarkDuplicates  Tracking 41912 as yet unmatched pairs. 5780 records in RAM.
INFO    2023-07-24 10:39:23     MarkDuplicates  Read    74,000,000 records.  Elapsed time: 00:11:10s.  Time for last 1,000,000:    3s.  Last read position: chr17:41,310,705
INFO    2023-07-24 10:39:23     MarkDuplicates  Tracking 41672 as yet unmatched pairs. 5390 records in RAM.
INFO    2023-07-24 10:39:26     MarkDuplicates  Read    77,000,000 records.  Elapsed time: 00:11:13s.  Time for last 1,000,000:    6s.  Last read position: chr18:60,891,077
INFO    2023-07-24 10:39:26     MarkDuplicates  Tracking 36160 as yet unmatched pairs. 2514 records in RAM.
INFO    2023-07-24 10:39:27     MarkDuplicates  Read    78,000,000 records.  Elapsed time: 00:11:14s.  Time for last 1,000,000:    8s.  Last read position: chr19:10,065,639
INFO    2023-07-24 10:39:27     MarkDuplicates  Tracking 32057 as yet unmatched pairs. 7191 records in RAM.
INFO    2023-07-24 10:39:27     MarkDuplicates  Read    80,000,000 records.  Elapsed time: 00:11:14s.  Time for

As you can see, it goes from reading chr19 to chr18 and chr17, and then back to chr19. This is only a small part of the output; the same thing happens several more times. Is this expected?

Thank you!

BAM GATK MarkDuplicates • 719 views

If you are using multiple threads, the output may be buffered in a temporary location before finally being written out in the correct order. Don't worry: if the file is not sorted, programs that need sorted input will complain.
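One quick way to see that the log itself is interleaved is to look at the "Read N records" counters: in the excerpt from the question they jump around too (79M, 77M, 76M, 73M, ...), which is consistent with progress lines from more than one worker or run being mixed together rather than with a mis-sorted BAM. Below is a minimal sketch, assuming the log format shown in the question with each progress message on a single line; `parse_progress` and `counters_monotonic` are hypothetical helpers, not part of Picard.

```python
import re

# Matches progress lines in the format shown in the question's log.
# Assumes each progress message sits on one line.
PROGRESS = re.compile(
    r"Read\s+([\d,]+)\s+records\..*?Last read position:\s+([^\s:]+):([\d,]+)"
)

def parse_progress(log_text):
    """Return a list of (records_read, chrom, pos), one per progress line."""
    entries = []
    for m in PROGRESS.finditer(log_text):
        records = int(m.group(1).replace(",", ""))
        pos = int(m.group(3).replace(",", ""))
        entries.append((records, m.group(2), pos))
    return entries

def counters_monotonic(entries):
    """True if the 'records read' counter never decreases from line to line."""
    counts = [e[0] for e in entries]
    return all(a <= b for a, b in zip(counts, counts[1:]))

# Two lines taken from the log in the question (whitespace condensed).
sample = (
    "INFO 2023-07-24 10:39:16 MarkDuplicates Read 79,000,000 records. "
    "Elapsed time: 00:11:03s. Time for last 1,000,000: 3s. "
    "Last read position: chr19:15,379,656\n"
    "INFO 2023-07-24 10:39:18 MarkDuplicates Read 77,000,000 records. "
    "Elapsed time: 00:11:05s. Time for last 1,000,000: 3s. "
    "Last read position: chr18:60,891,077\n"
)

entries = parse_progress(sample)
print(entries)
print(counters_monotonic(entries))
```

If the counters go backwards, the order of the log lines says little about the order of reads in the file.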


you are right! thank you :)

16 months ago
ATpoint 85k

If you run a BAM through something like samtools sort and it finishes without error, it's fine. Tool logs can be confusing, and what they show depends on how the tool iterates over the file. It will print an error if something is wrong, so don't worry.
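For intuition about what "coordinate sorted" means: reads are ordered first by where their reference sequence appears in the header's @SQ list, then by start position. That @SQ order does not have to be chr1, chr2, chr3, and so on. Here is a rough sketch on plain tuples with no BAM parsing; `is_coordinate_sorted` and the sample data are made up for illustration, and unmapped reads are ignored.

```python
def is_coordinate_sorted(reads, sq_order):
    """Check (chrom, pos) pairs against the contig order given in sq_order.

    reads:    list of (chrom, pos) tuples for mapped reads
    sq_order: contig names in the order of the header's @SQ lines
    """
    rank = {name: i for i, name in enumerate(sq_order)}
    keys = [(rank[chrom], pos) for chrom, pos in reads]
    return all(a <= b for a, b in zip(keys, keys[1:]))

# Hypothetical header order and read positions.
sq_order = ["chr17", "chr18", "chr19"]
ok = [("chr17", 100), ("chr17", 250), ("chr18", 5), ("chr19", 40)]
bad = [("chr19", 40), ("chr18", 5), ("chr17", 100)]

print(is_coordinate_sorted(ok, sq_order))
print(is_coordinate_sorted(bad, sq_order))
```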


thank you! The data source assured me the file was sorted. Are you suggesting I should sort the bam file again or just ignore this?


Use `samtools view -H your.bam | head -n 1`. That prints the @HD line, for example @HD VN:1.0 SO:coordinate if the file is declared coordinate-sorted; otherwise the SO field shows a different value such as unsorted.
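If you want to do the same check in a script, here is a small sketch that pulls the SO value out of header text. It assumes you already have the output of `samtools view -H` as a string (getting it, for example through a subprocess call, is not shown); `sort_order_from_header` is a hypothetical helper and the header below is a made-up example. Keep in mind that the SO tag only records what the file claims about itself.

```python
def sort_order_from_header(header_text):
    """Return the SO value from the @HD line, or None if there is none."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None

# Hypothetical header text, as printed by `samtools view -H`.
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr17\tLN:83257441\n"
print(sort_order_from_header(header))
```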
