Entering edit mode
5.6 years ago
bioguy24
▴
230
I am new to python and trying to learn. The below is an attempt to filter out secondary reads in a all bam files in a directory using pysam. I ran mark duplicates and got an error on several bam files due to secondary alignments. I added comments to each line to illustrate my thought process. I think that when I algin using BWA-MEM with the -M option and there are readpairs with the same valid primary alignments and by removing the lower score alignments I will be ok. Is that correct or is there a better approach? Thank you :)
The specific mark duplicates error:
Value was put into PairInfoMap more than once
Attempt 1
#! /usr/bin/python ## call python script
import sys ## import python system functions
import pysam ## import module
bam = pysam.AlignmentFile(.bam, "rb") ## open bam and read
click.echo("Reading BAM file") ## output message
for read in bam.fetch():
if read.is_secondary=true ## not the primary alignment
continue
if read.has_tag('AS') and read.has_tag('XS') ## look for Primary and Secondary Alignment Score tag
AS = read.get_tag('AS') ## read and store AS value
XS = read.get_tag('XS') ## read and store XS value
if AS >= XS ## Alignment score greater then or equal to XS (these will be printed)
bam.write(read) ## open bam and only print primary alignments
continue ## process next line
bam.close() ## end processing
Attempt 2
#! /usr/bin/python ## call python script
import sys ## import python system functions
import pysam ## import module
bam = pysam.AlignmentFile(bam, "rb") ## open bam and read
click.echo("Reading BAM file") ## output message
for read in bam.fetch(): ## start loop and iterate over each bam
if read.is_secondary: ## not the primary alignment
continue
if read.has_tag('AS') and read.has_tag('XS') ## look for Primary and Secondary Alignment Score tag
AS = read.get_tag('AS') ## read and store AS value
XS = read.get_tag('XS') ## read and store XS value
if AS >= XS ## Alignment score greater then or equal to XS (these will be printed)
read.write(read) ## only print primary alignments
continue
bam.close() ## end processing
I have re-mapped and markduplicates seems ok but I am curious if the python script is close? Thank you :).