Filtering out the spliced reads from bam file.
3
1
Entering edit mode
8.5 years ago
EVR ▴ 610

Hi,

I have bam file and I would like to filter out spliced reads from the bam file. How can I achieve that directly on bam file. Kindly guide me. Thanks in advance

RNA-Seq BAM spliced-reads • 7.0k views
ADD COMMENT
0
Entering edit mode

Write a script (likely with pysam) to do it.

ADD REPLY
1
Entering edit mode

What is pysam? I already tried with following code

 samtools view  scaffoldxxx.bam | awk '!($6 ~ /N/)' > skipped_scaffold_xxx.bam

But somehow the output is not in the bam format. Should I export it to bed format and later convert back to BAM format?

ADD REPLY
0
Entering edit mode

You'll need to include the header and pipe the output of that to samtools, since it's SAM format. I'm sure you can google "pysam".

ADD REPLY
0
Entering edit mode

The output you get is not in bam format for two reasons: 1) samtools view doesn't output the header by default and 2) It outputs in sam, not bam format by default.

ADD REPLY
0
Entering edit mode

You can do the filtering in samtools.

 samtools view -e 'cigar!~"N"' scaffoldxxx.bam  -bo skipped_scaffold_xxx.bam

Or if you want to use awk you need to print out with header

 samtools view -h  scaffoldxxx.bam | awk '/^@/ {print;next} !($6 ~ /N/)' | samtools view -bo  skipped_scaffold_xxx.bam
ADD REPLY
0
Entering edit mode

Also look at this previous thread for some ideas:

ADD REPLY
1
Entering edit mode
2.0 years ago

I needed the inverse, selecting only spliced reads. Using python, save this as select_spliced_reads.py and execute as python only_spliced_reads.py -b input.bam -o spliced.bam

import pysam
from argparse import ArgumentParser


def main():
    args = get_args()
    get_spliced_reads(args)


def get_args():
    parser = ArgumentParser("")
    parser.add_argument("-b", "--bam", help="input bam", required=True)
    parser.add_argument("-o", "--output", help="output bam", required=True)
    parser.add_argument("-i", "--introns", help="minimal number of introns", default=1, type=int)
    return parser.parse_args()


def get_spliced_reads(args):
    bam_file = pysam.AlignmentFile(args.bam, "rb")
    out_file = pysam.AlignmentFile(args.output, "wb", template=bam_file)
    for read in bam_file.fetch():
        if sum(operation == 3 for (operation, _) in read.cigartuples) > args.introns:
            out_file.write(read)
    bam_file.close()
    out_file.close()


if __name__ == "__main__":
    main()
ADD COMMENT
0
Entering edit mode
8.5 years ago

Using picard FilterSamReads http://broadinstitute.github.io/picard/command-line-overview.html#Overview and a javascript filter (not tested)

java -jar picard.jar FilterSamReads \
       I=input.bam \ 
       O=output.bam \
      JAVASCRIPT_FILE=filter.js

with filter.js

function accept(r) {
if(r.getReadUnmappedFlag()) return true;
var i,c= r.getCigar();
if(c==null) return true;
for( i=0;i< c. numCigarElements() ;++i) {
  if(  c. getCigarElement(i).getOperator().name().equals("N") ) return false;
  }
return true;
}

accept(record);
ADD COMMENT
0
Entering edit mode
8.5 years ago

Using the BBMap package:

reformat.sh in=mapped.bam out=filtered.bam maxdellen=50

You can set "maxdellen" to whatever length deletion event you consider the minimum to signify splicing, which depends on the organism.

ADD COMMENT
1
Entering edit mode

Apparently, the maxdellen parameter doesn't work anymore

ADD REPLY
0
Entering edit mode

It is now known as dellenfilter. Very convenient way to get rid of spliced reads.

ADD REPLY

Login before adding your answer.

Traffic: 2062 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6