Filter bam file based on cigar string?
17 months ago
bioinf_sci ▴ 20

How can I filter a BAM file with samtools view so that it retains only reads whose CIGAR string contains an M operation of length <= 35?

samtools bam • 1.4k views
17 months ago
jkbonfield ★ 1.3k

A bit messy, but it can be done with regexp matching:

samtools view -e 'cigar =~ "(^|[^0-9])([0-9]|[12][0-9]|3[0-5])M"' -o out.bam in.bam

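The regexp matches the start of the string or a non-digit, followed by one of three alternatives and then M: a single digit (0-9), a two-digit number from 10 to 29 ([12][0-9]), or a number from 30 to 35 (3[0-5]). The leading start-or-non-digit anchor stops the pattern from matching the tail of a longer number, such as the 35M inside 135M.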
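To check the pattern before running it on a whole BAM file, it can be tried against a few CIGAR strings in Python. This is only a sketch: the function name `has_short_match` and the test strings are made up for illustration, and it assumes Python's `re` handles this particular pattern the same way the samtools regexp engine does.

```python
import re

# Same pattern as in the samtools expression above.
SHORT_M = re.compile(r"(^|[^0-9])([0-9]|[12][0-9]|3[0-5])M")


def has_short_match(cigar: str) -> bool:
    """Return True if any M operation in the CIGAR string has length 0-35."""
    return SHORT_M.search(cigar) is not None


if __name__ == "__main__":
    # Illustrative CIGAR strings, not taken from real data.
    for cigar in ["35M", "36M", "100M", "5S30M", "50M2I20M", "135M", "*"]:
        print(cigar, has_short_match(cigar))
```

Note that a read matches as soon as one of its M operations is short enough, so 50M2I20M is retained because of the 20M, even though it also has a 50M.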


Hi, if I want to discard reads where the cigar string contains M < 100, how should I write this instead? Thank you~

17 months ago

using samjdk: https://jvarkit.readthedocs.io/en/latest/SamJdk/

I cannot post code here! There is a bug in Biostars...

so here is a gist:

java -jar src/jvarkit-git/dist/jvarkit.jar samjdk -e 'return !record.getReadUnmappedFlag() && record.getCigar().getCigarElements().stream().filter(CE->CE.getOperator().equals(CigarOperator.M)).anyMatch(CE->CE.getLength()<=35);' in.bam

Posting code here in case gist goes away.
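The same idea (split the CIGAR into operations, then test the length of each M) can be sketched in plain Python for checking individual CIGAR strings. This is not part of samjdk or samtools: `any_m_at_most` and `CIGAR_OP` are hypothetical names, and the threshold is passed as a parameter so that other cutoffs can be tried.

```python
import re

# One CIGAR operation: a length followed by an operator from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def any_m_at_most(cigar: str, max_len: int) -> bool:
    """Return True if the CIGAR has at least one M operation of length <= max_len.

    An unmapped read's CIGAR ("*") contains no operations, so it returns False.
    """
    return any(
        op == "M" and int(length) <= max_len
        for length, op in CIGAR_OP.findall(cigar)
    )


if __name__ == "__main__":
    # Illustrative CIGAR strings, not taken from real data.
    for cigar in ["35M", "36M", "5S30M", "50M2I20M", "*"]:
        print(cigar, any_m_at_most(cigar, 35))
```

For the follow-up question above (discarding reads with an M shorter than 100), one reading would be to keep a read only when `any_m_at_most(cigar, 99)` is False. That reading also keeps reads with no M operation at all, which may or may not be wanted.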
