Filtering sam/bam by using CIGAR deletion sites
3
0
Entering edit mode
5.8 years ago
ag1194 • 0

Hi,

I have a CLIP-Seq data and after mapping, I want to filter my sam files for reads containing deletion by using CIGAR & MAPQ >10. Is there a way in samtools to subset my sam file containing reads MAPQ>10 & with deletion?

Thank you very much!

alignment next-gen • 4.8k views
ADD COMMENT
0
Entering edit mode

Thank you very much!!! I'll try and see samjdk first and also pysam to see the results.

ADD REPLY
3
Entering edit mode
5.8 years ago
Geparada ★ 1.5k

I would do something like:

samtools view file.bam | awk '($6 ~ /D/) && ($5>10)'

($6 ~ /D/) means "deletion in CIGAR" and $5>10 means MAPQ>10

I really like awk oneliners for doing quick filters like this.

ADD COMMENT
0
Entering edit mode

With samtools view & awk tools, I cannot keep the header in my output bam. Is there a way to have filtered bam with header with awk tool? Thanks!!!

ADD REPLY
1
Entering edit mode
samtools view -h file.bam | awk '($0 ~ /^@/) || (($6 ~ /D/) && ($5>10))'
ADD REPLY
0
Entering edit mode

Hey when I try to pipe this to a new bam like so

samtools view -h file.bam | awk '($0 ~ /^@/) || (($6 ~ /D/) && ($5>10))' > new.bam

I can't view the new.bam. Any ideas?

ADD REPLY
0
Entering edit mode

The result will be sam, not bam. To save it as sam, you need to pipe it to samtools view -b

ADD REPLY
0
Entering edit mode
5.8 years ago

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samjdk.jar -e 'return !record.getReadUnmappedFlag() && record.getMappingQuality()>10 && record.getCigar().getCigarElements().stream().map(C->C.getOperator()).anyMatch(O->O.equals(CigarOperator.D) || O.equals(CigarOperator.N)); ' input.bam
ADD COMMENT
0
Entering edit mode
5.8 years ago
Vitis ★ 2.6k

Pysam is a good tool to manipulate SAM/BAM with various attributes like CIGAR and mapping quality.

https://pysam.readthedocs.io/en/latest/

ADD COMMENT
0
Entering edit mode
import pysam
inbam = pysam.AlignmentFile("my_bamfile.bam")
outbam = pysam.AlignmentFile("filtered_bamfile.bam", template=inbam)
for read in inbam.fetch(until_eof=True):
    if read.mapping_quality > 30 and 'D' in read.cigarstring:
        outbam.write(read)
ADD REPLY

Login before adding your answer.

Traffic: 2548 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6