I'm making a call to a program that outputs a BAM and I want to filter it on a field.
I use Popen() to call the program and then I want to only keep reads that have a certain value for one of the optional SAM/BAM fields. How can this be done from within samtools? Ideally, I'd like to do it from Pysam but it looks like Pysam does not support streams... e.g.
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
Pysam can read from standard input, using:
s = pysam.Samfile("-", "rb")
But I want to pipe the output in "p" into Pysam so that I can iterate through and get AlignedRead objects, filtering them as they come. Is there a way to do this?
I tried tricking samfile into thinking the output of "my_cmd" came from stdin, as in:
p = subprocess.Popen(my_cmd, stdin=subprocess.PIPE, shell=True).stdout
But this does not work... if I have the above, and try:
s = pysam.Samfile("-", "rb")
for read in s:
print read
then I get binary garbage printed to the screen, so pysam is not parsing it for some reason. How can this be fixed?
If not then pysam will force me to write the output to a file first, which is quite prohibitive given the size of BAM files.
thanks.
If you're getting binary stuff printed to the screen, it sounds like the output is still BAM format. In
my_cmd
, are you passing the -S arg to samtools for SAM output?try p.stdout instead of "-" in your pysam command.
That will throw 'TypeError: Argument must be string or unicode'