Entering edit mode
5 months ago
marco.barr
▴
150
Hi everyone, I’m trying to filter reads based on their mapped length, for example from 1000 bp onwards. Could you suggest something? Would these commands work?
samtools view -h /home/bar/input.bam | awk 'BEGIN {OFS="\t"} !/^@/ {len=0; n=split($6, CIGAR, /[MIDNSHP]/); for(i=1; i<n; i++) {if(index($6, CIGAR[i] "M")) {len+=CIGAR[i]}}} {if(len>=1000 || /^@/) print $0}' | samtools view -b > /home/bar/output.bam
Thank you very much for your support.
Can you, in plain english, say what the filtering should do, and on which fields of the SAM/BAM you want to apply it.
I think they're trying to apply it to the CIGAR string (counting the M's, representing the alignment match).
In theory, that should work but they should verify the command on a few BAM records (I don't have time to debug the awk code).
I guess they're dealing with long reads because they're talking about >1000bp alignments. Also, not sure why one would want to filter by mapped length -- that's typically the job of the alignment tool.