Fast way to use Samtools to sort a BAM file by queryname similar to `picard SortSam SORT_ORDER=queryname`?
Asked 2.4 years ago by kalavattam

When sorting by queryname with Samtools (samtools sort -n), Samtools does a "natural" sort: runs of digits within the name are compared numerically, so for Illumina-style names each colon-delimited numeric subfield is effectively sorted as a number. Picard (picard SortSam SORT_ORDER=queryname), on the other hand, does not split the name into subfields. It treats the whole queryname as a single string and sorts it in ASCII (lexicographic) order (for example, as described in this comment and its sub-comments).

I would like to sort my BAM files the way picard SortSam SORT_ORDER=queryname does, but Picard SortSam is considerably slower than samtools sort -n: samtools sort -n can be parallelized (with -@), while picard SortSam SORT_ORDER=queryname cannot. Is there a fast alternative to picard SortSam SORT_ORDER=queryname for this task?

Tags: samtools, bam, picard
Answered 2.4 years ago

I don't think there is any software that does this "fast". You could fork samtools and change the function that compares the read names, here:

https://github.com/samtools/samtools/blob/develop/bam_sort.c#L1796

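    if (g_is_by_qname) {
        // natural sort: runs of digits in the read names are compared numerically
        int t = strnum_cmp(bam_get_qname(a.bam_record), bam_get_qname(b.bam_record));
        if (t != 0) return t;
        // same name: order by the READ1/READ2 flag bits (0x40/0x80)
        return (int) (a.bam_record->core.flag&0xc0) - (int) (b.bam_record->core.flag&0xc0);
    }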

strnum_cmp is implemented here https://github.com/samtools/samtools/blob/401e254877f3d57660fb848e27c23f4439297da8/bam_sort.c#L107
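A minimal sketch, assuming the only change needed in such a fork is the name comparison itself: the strnum_cmp call could be swapped for a plain byte-wise comparison such as strcmp. The standalone C below just contrasts the two orders. natural_qname_cmp is a simplified, hypothetical re-implementation of the idea behind strnum_cmp (not the samtools source), picard_like_qname_cmp assumes Picard compares ASCII read names as whole strings, and the helpers cmp_natural, cmp_picard_like and demo_orders are likewise illustrative names. Neither comparator reproduces the flag-based tie-breaking that both tools apply when names are equal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Picard-like order (assumption): compare the whole name byte by byte.
   For plain ASCII names this is ordinary lexicographic order. */
static int picard_like_qname_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Simplified "natural" order in the spirit of samtools' strnum_cmp:
   runs of digits are compared by numeric value, other bytes as-is.
   Illustrative only; NOT the samtools implementation. */
static int natural_qname_cmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    while (*pa && *pb) {
        if (isdigit(*pa) && isdigit(*pb)) {
            while (*pa == '0') ++pa;           /* skip leading zeros */
            while (*pb == '0') ++pb;
            const unsigned char *sa = pa, *sb = pb;
            while (isdigit(*pa)) ++pa;         /* find end of each digit run */
            while (isdigit(*pb)) ++pb;
            size_t la = (size_t)(pa - sa), lb = (size_t)(pb - sb);
            if (la != lb) return la < lb ? -1 : 1;  /* longer run = bigger number */
            int c = memcmp(sa, sb, la);              /* equal length: digit by digit */
            if (c != 0) return c < 0 ? -1 : 1;
        } else {
            if (*pa != *pb) return (int)*pa - (int)*pb;
            ++pa;
            ++pb;
        }
    }
    return (int)*pa - (int)*pb;
}

/* qsort adapters for an array of C strings. */
static int cmp_natural(const void *x, const void *y)
{
    return natural_qname_cmp(*(const char *const *)x, *(const char *const *)y);
}

static int cmp_picard_like(const void *x, const void *y)
{
    return picard_like_qname_cmp(*(const char *const *)x, *(const char *const *)y);
}

/* Usage sketch: the two orders disagree on Illumina-style names. */
static void demo_orders(void)
{
    const char *names[] = {"r:1:10", "r:1:9", "r:1:100"};

    qsort(names, 3, sizeof names[0], cmp_natural);      /* 9 < 10 < 100 */
    assert(strcmp(names[0], "r:1:9") == 0 && strcmp(names[2], "r:1:100") == 0);

    qsort(names, 3, sizeof names[0], cmp_picard_like);  /* '1' < '9' in ASCII */
    assert(strcmp(names[0], "r:1:10") == 0 && strcmp(names[2], "r:1:9") == 0);
}
```

In a fork, the analogous change would be to call strcmp instead of strnum_cmp in the g_is_by_qname branch quoted above. Whether the resulting tie-breaking on flags then matches Picard's queryname order exactly is not checked by this sketch.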

