Question

Filtering out the spliced reads from bam file.

1

8.4 years ago

EVR ▴ 610

Hi,

I have a BAM file and would like to filter out the spliced reads from it. How can I do that directly on the BAM file? Kindly guide me. Thanks in advance.

RNA-Seq BAM spliced-reads • 6.9k views

ADD COMMENT • link updated 22 months ago by barslmn ★ 2.3k • written 8.4 years ago by EVR ▴ 610

0

Write a script (likely with pysam) to do it.

ADD REPLY • link 8.4 years ago by Devon Ryan 104k

1

What is pysam? I already tried with following code

 samtools view  scaffoldxxx.bam | awk '!($6 ~ /N/)' > skipped_scaffold_xxx.bam

But somehow the output is not in the bam format. Should I export it to bed format and later convert back to BAM format?

ADD REPLY • link 8.4 years ago by EVR ▴ 610

0

You'll need to include the header and pipe the output of that to samtools, since it's SAM format. I'm sure you can google "pysam".

ADD REPLY • link 8.4 years ago by Devon Ryan 104k

0

The output you get is not in BAM format for two reasons: 1) samtools view does not output the header by default, and 2) it writes SAM, not BAM, by default.

ADD REPLY • link 8.4 years ago by dariober 15k

0

You can do the filtering directly in samtools with a filter expression (the -e option, available since samtools 1.12).

 samtools view -b -e 'cigar !~ "N"' -o skipped_scaffold_xxx.bam scaffoldxxx.bam

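Or, if you want to use awk, you need to print the header lines as well and pipe the result back into samtools (the trailing - tells samtools to read from standard input):

 samtools view -h scaffoldxxx.bam | awk '/^@/ {print; next} $6 !~ /N/' | samtools view -b -o skipped_scaffold_xxx.bam -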

ADD REPLY • link 22 months ago by barslmn ★ 2.3k
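
For anyone who prefers Python to awk, the same header-preserving filter can be sketched on plain SAM text. This is only an illustration of the logic described above (keep every line starting with @, drop records whose CIGAR column contains N); the function name filter_unspliced and the example records are made up for the demonstration.

```python
import re

# Matches a CIGAR string containing at least one N (skipped region) operation.
SPLICE_RE = re.compile(r"\d+N")


def filter_unspliced(lines):
    """Yield header lines and alignment lines whose CIGAR has no N operation."""
    for line in lines:
        if line.startswith("@"):
            # Header lines are kept so the output can be converted back to BAM.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 (index 5) of a SAM record is the CIGAR string.
        if len(fields) > 5 and not SPLICE_RE.search(fields[5]):
            yield line


# Hypothetical SAM lines: one header, one unspliced read, one spliced read.
example = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\n",
    "r2\t0\tchr1\t100\t60\t20M500N30M\t*\t0\t0\t*\t*\n",
]
kept = list(filter_unspliced(example))
```

In a real pipeline the generator would be fed sys.stdin and written out with sys.stdout.writelines, sitting between samtools view -h and samtools view -b exactly where the awk command sits.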

0

Also look at this previous thread for some ideas:

ADD REPLY • link 8.4 years ago by Alastair Kerr 5.3k

score 1 · Answer 1 · 2022-12-14

I needed the inverse: selecting only spliced reads. Using Python, save this as select_spliced_reads.py and run it as python select_spliced_reads.py -b input.bam -o spliced.bam

import pysam
from argparse import ArgumentParser


def main():
    args = get_args()
    get_spliced_reads(args)


def get_args():
    parser = ArgumentParser(description="Select spliced reads (CIGAR N operations) from a BAM file")
    parser.add_argument("-b", "--bam", help="input bam", required=True)
    parser.add_argument("-o", "--output", help="output bam", required=True)
    parser.add_argument("-i", "--introns", help="minimum number of introns", default=1, type=int)
    return parser.parse_args()


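def get_spliced_reads(args):
    # Context managers close both files even if an error occurs.
    with pysam.AlignmentFile(args.bam, "rb") as bam_file, \
            pysam.AlignmentFile(args.output, "wb", template=bam_file) as out_file:
        # until_eof=True reads the file sequentially, so no index is required.
        for read in bam_file.fetch(until_eof=True):
            # cigartuples is None for reads without an alignment (e.g. unmapped).
            cigar = read.cigartuples or []
            # CIGAR operation 3 is N (skipped region, i.e. an intron).
            if sum(operation == 3 for (operation, _) in cigar) >= args.introns:
                out_file.write(read)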


if __name__ == "__main__":
    main()
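
For the original question (dropping spliced reads rather than selecting them), the test in the script above simply needs to be inverted. The following is a minimal sketch under that assumption; the names is_spliced and drop_spliced_reads are illustrative and not part of pysam, and pysam is imported inside the I/O function only so that the CIGAR check can be tried without it installed.

```python
N_OP = 3  # CIGAR operation code for N (skipped region) in the SAM/BAM format


def is_spliced(cigartuples):
    """Return True if a pysam-style list of (operation, length) pairs contains N."""
    # cigartuples is None when a read has no alignment; treat that as unspliced.
    return any(op == N_OP for op, _ in (cigartuples or []))


def drop_spliced_reads(in_bam, out_bam):
    """Copy in_bam to out_bam, skipping every read with an N in its CIGAR."""
    import pysam  # imported here so is_spliced() works without pysam installed

    with pysam.AlignmentFile(in_bam, "rb") as src, \
            pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
        # until_eof=True iterates sequentially and does not need an index.
        for read in src.fetch(until_eof=True):
            if not is_spliced(read.cigartuples):
                dst.write(read)
```

A call such as drop_spliced_reads("input.bam", "unspliced.bam") would then write only the reads without an N operation; unmapped reads are kept, which may or may not be what you want.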

score 0 · Answer 2 · 2016-06-21

Using picard FilterSamReads http://broadinstitute.github.io/picard/command-line-overview.html#Overview and a javascript filter (not tested)

java -jar picard.jar FilterSamReads \
       I=input.bam \
       O=output.bam \
       FILTER=includeJavascript \
       JAVASCRIPT_FILE=filter.js

with filter.js

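function accept(r) {
  // Keep unmapped reads and reads without a CIGAR.
  if (r.getReadUnmappedFlag()) return true;
  var i, c = r.getCigar();
  if (c == null) return true;
  // Reject any read whose CIGAR contains an N (skipped region) operator.
  for (i = 0; i < c.numCigarElements(); ++i) {
    if (c.getCigarElement(i).getOperator().name().equals("N")) return false;
  }
  return true;
}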

accept(record);

score 0 · Answer 3 · 2016-06-21

0

8.4 years ago

Brian Bushnell 20k

Using the BBMap package:

reformat.sh in=mapped.bam out=filtered.bam maxdellen=50

You can set "maxdellen" to whatever length deletion event you consider the minimum to signify splicing, which depends on the organism.

ADD COMMENT • link 8.4 years ago by Brian Bushnell 20k

1

Apparently, the maxdellen parameter doesn't work anymore

ADD REPLY • link 6.3 years ago by caggtaagtat ★ 1.9k

0

It is now known as dellenfilter. Very convenient way to get rid of spliced reads.

ADD REPLY • link 3.5 years ago by predeus ★ 2.0k
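
The idea of a length threshold, as opposed to rejecting every read with an N, can be illustrated with a small CIGAR parser. This is not BBMap's implementation, only a sketch of the concept; the names longest_gap and passes_gap_filter are hypothetical, the default of 50 just mirrors the example value above, and counting both D and N operations as gaps is an assumption.

```python
import re

# One CIGAR element: a length followed by an operation letter.
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_gap(cigar, ops="DN"):
    """Return the length of the longest D or N operation in a CIGAR string (0 if none)."""
    lengths = (int(n) for n, op in CIGAR_OP_RE.findall(cigar) if op in ops)
    return max(lengths, default=0)


def passes_gap_filter(cigar, max_gap=50):
    """Return True if no deletion or skipped region in the CIGAR exceeds max_gap."""
    return longest_gap(cigar) <= max_gap
```

With this, a read with CIGAR 20M3D27M passes a threshold of 50 while 20M500N30M does not, so short genuine deletions survive and intron-sized gaps are removed.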