filter reads based on mapped length
1
0
Entering edit mode
6 months ago
marco.barr ▴ 160

Hi everyone, I’m trying to filter reads based on their mapped length, for example from 1000 bp onwards. Could you suggest something? Would these commands work?

samtools view -h /home/bar/input.bam | awk 'BEGIN {OFS="\t"} !/^@/ {len=0; n=split($6, CIGAR, /[MIDNSHP]/); for(i=1; i<n; i++) {if(index($6, CIGAR[i] "M")) {len+=CIGAR[i]}}} {if(len>=1000 || /^@/) print $0}' | samtools view -b > /home/bar/output.bam

Thank you very much for your support.

samtools reads BAM • 708 views
ADD COMMENT
0
Entering edit mode

Can you, in plain english, say what the filtering should do, and on which fields of the SAM/BAM you want to apply it.

ADD REPLY
0
Entering edit mode

I think they're trying to apply it to the CIGAR string (counting the M's, representing the alignment match).

In theory, that should work but they should verify the command on a few BAM records (I don't have time to debug the awk code).

I guess they're dealing with long reads because they're talking about >1000bp alignments. Also, not sure why one would want to filter by mapped length -- that's typically the job of the alignment tool.

ADD REPLY
2
Entering edit mode
6 months ago
jkbonfield ★ 1.3k

Use the filtering query language. samtools view -e 'rlen >= 1000' input.bam should do it.

Although that's the number of reference bases aligned against. It's not quite the same as the number of query bases, but maybe close enough. The difference is basically counting "D" cigars vs "I" cigars.

ADD COMMENT

Login before adding your answer.

Traffic: 2062 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6