How to know if samtools sort exited normally [left temporary files behind]
4
0
Entering edit mode
9.2 years ago

GoodDay Guys!

It seldom happens to me that samtools sort leaves the temporary files behind.

samtools sort : This command will also create temporary filesout.prefix.%d.bam as needed when the entire alignment data cannot fit into memory (as controlled via the -m option)

Ex. file.001.bam file.002.bam .......... file.00X.bam

A normal course is these split up files are merged later on to a single BAM file and deleted. I ran samtools sort on a sam file (22G) and it generated an output of 2.9G

samtools view -bSh file.sam | samtools sort - file.sam.sorted

But, now I am not sure if the samtools sort finished normally or not (because of these files left behind). How can I confirm that, rather than just running it again.

I think it's not right to just concatenate all the temporary files, (which I did) outputted a BAM file of 6.7G. There is the discrepancy.

Thanks

ngs ChIP-Seq next-gen samtools • 5.5k views
ADD COMMENT
3
Entering edit mode
9.2 years ago

There have been a few errors related to samtools sort not propagating an error return value properly. One of these was fixed in PR #467, but I vaguely recall others.

In general, you would need to echo "${PIPESTATUS[0]} ${PIPESTATUS[1]}" to check the return status, but that assumes that it's returning a proper return value. The other method would be to "samtools view -c foo.bam" and see if the counts of entries jive. If the merged file is truncated then you can be certain that the temp files are not and can "samtools merge" them.

ADD COMMENT
0
Entering edit mode

Hi Devon,
PIPESTATUS would not help in this case, as I ran the command on a computenode on cluster. Now, I can't enter same node and run as the PIPESTATUS as it would have be over-written with other fresh pipeouts. But its a good thing to keep in mind.

The merged file is truncated and counting fails, so I am merging them.
Thanks

ADD REPLY
0
Entering edit mode

I didn't know the PIPESTATUS variable, thanks for mentioning it! It's been bugging me that echo "$?" returns the exit code of the last command in the pipe!

ADD REPLY
2
Entering edit mode
9.2 years ago

I'd say samtools sort didn't complete, since temp files are not deleted. Try to index the sorted file, if the sorted file is not complete samtools index will complain. You could also count the number of reads in the sorted file and check that this count matches the read count in the input sam.

ADD COMMENT
0
Entering edit mode

Exactly, thanks!

ADD REPLY
1
Entering edit mode
9.2 years ago
Alternative ▴ 290

Does the final file contains the same read count as the initial non-sorted file ? You can check that (samtools view in.bam | wc -l or samtools idxstats if the file is indexed). Both files (non-sorted and sorted) should have the same size.

Also, I would check if the header of the sorted file indicates (Should be like @HD VN:1.3 SO:coordinate)

ADD COMMENT
0
Entering edit mode

Header is present, with a warning, that makes more sense. Thanks!

samtools view -h file.sorted.bam | head
[bam_header_read] EOF marker is absent. The input is probably truncated.
@HD    VN:1.0    SO:coordinate
@SQ    SN:chr1    LN:195471971
@SQ    SN:chr2    LN:182113224
@SQ    SN:chr3    LN:160039680
@SQ    SN:chr4    LN:156508116
@SQ    SN:chr5    LN:151834684
@SQ    SN:chr6    LN:149736546
@SQ    SN:chr7    LN:145441459
@SQ    SN:chr8    LN:129401213
@SQ    SN:chr9    LN:124595110
ADD REPLY
1
Entering edit mode
9.2 years ago

You could try samtools idxstats bamfile.bam to count the number of reads in the bam file :

echo $(($(samtools idxstats bafile.bam | cut -f 3| tr "\n" "+" | xargs -I{} echo {} 0) ))

Then
wc -l samfile.sam
to count the number of lines in the sam file which is the number of reads mapped + the header.

PS : from my experience, samtools always merge its intermediary files. Isn't there an error report somewhere in your terminal log ?

ADD COMMENT
0
Entering edit mode

No error in log, the file is incomplete (truncated). I am merging again.
Thanks

ADD REPLY

Login before adding your answer.

Traffic: 1930 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6