Hi,
I have bam file and I would like to filter out spliced reads from the bam file. How can I achieve that directly on bam file. Kindly guide me. Thanks in advance
Hi,
I have bam file and I would like to filter out spliced reads from the bam file. How can I achieve that directly on bam file. Kindly guide me. Thanks in advance
I needed the inverse, selecting only spliced reads. Using python, save this as select_spliced_reads.py
and execute as
python only_spliced_reads.py -b input.bam -o spliced.bam
import pysam
from argparse import ArgumentParser
def main():
args = get_args()
get_spliced_reads(args)
def get_args():
parser = ArgumentParser("")
parser.add_argument("-b", "--bam", help="input bam", required=True)
parser.add_argument("-o", "--output", help="output bam", required=True)
parser.add_argument("-i", "--introns", help="minimal number of introns", default=1, type=int)
return parser.parse_args()
def get_spliced_reads(args):
bam_file = pysam.AlignmentFile(args.bam, "rb")
out_file = pysam.AlignmentFile(args.output, "wb", template=bam_file)
for read in bam_file.fetch():
if sum(operation == 3 for (operation, _) in read.cigartuples) > args.introns:
out_file.write(read)
bam_file.close()
out_file.close()
if __name__ == "__main__":
main()
Using picard FilterSamReads http://broadinstitute.github.io/picard/command-line-overview.html#Overview and a javascript filter (not tested)
java -jar picard.jar FilterSamReads \
I=input.bam \
O=output.bam \
JAVASCRIPT_FILE=filter.js
with filter.js
function accept(r) {
if(r.getReadUnmappedFlag()) return true;
var i,c= r.getCigar();
if(c==null) return true;
for( i=0;i< c. numCigarElements() ;++i) {
if( c. getCigarElement(i).getOperator().name().equals("N") ) return false;
}
return true;
}
accept(record);
Using the BBMap package:
reformat.sh in=mapped.bam out=filtered.bam maxdellen=50
You can set "maxdellen" to whatever length deletion event you consider the minimum to signify splicing, which depends on the organism.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Write a script (likely with pysam) to do it.
What is pysam? I already tried with following code
But somehow the output is not in the bam format. Should I export it to bed format and later convert back to BAM format?
You'll need to include the header and pipe the output of that to samtools, since it's SAM format. I'm sure you can google "pysam".
The output you get is not in bam format for two reasons: 1) samtools view doesn't output the header by default and 2) It outputs in sam, not bam format by default.
You can do the filtering in samtools.
Or if you want to use awk you need to print out with header
Also look at this previous thread for some ideas: