3.4 years ago
MarVi
Dear all,
I am trying to retrieve all reads in several regions from several BAM files. I am writing my code in Python with the pysam library, as follows:
pysam.view("-F0X100", bam_file, region_to_query)
Everything works, but I am concerned about how long each BAM file takes to process. The I/O seems to dominate, since I query one region at a time.
Would you have any suggestions on how I can speed up the querying of each bam file?
Have you tried "bedtools intersect"? You could intersect each BAM file with a BED file of your regions, run those intersections in parallel, and then merge the output BAM files at the end.
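A rough sketch of that idea in Python, since the question already uses it. It assumes `bedtools` and `samtools` are on your PATH and that your regions are in a BED file. The function names, the `.regions.bam` suffix, and the worker count are made up for illustration. It does not reproduce the `-F` flag filter from the question, so you would still need to apply that separately.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def intersect_cmd(bam_file, bed_file):
    # With a BAM file as -a, bedtools intersect writes BAM to stdout.
    return ["bedtools", "intersect", "-a", str(bam_file), "-b", str(bed_file)]


def output_path(bam_file, out_dir):
    # Hypothetical naming scheme: sample.bam -> <out_dir>/sample.regions.bam
    return Path(out_dir) / (Path(bam_file).stem + ".regions.bam")


def intersect_one(bam_file, bed_file, out_dir):
    # Run one intersection and capture the BAM stream in a file.
    out = output_path(bam_file, out_dir)
    with open(out, "wb") as handle:
        subprocess.run(intersect_cmd(bam_file, bed_file), stdout=handle, check=True)
    return out


def intersect_all(bam_files, bed_file, out_dir, workers=4):
    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda bam: intersect_one(bam, bed_file, out_dir), bam_files))


def merge_cmd(merged_bam, part_bams):
    # samtools merge takes the output file first, then the inputs.
    return ["samtools", "merge", str(merged_bam)] + [str(p) for p in part_bams]
```

You would call `intersect_all(...)` on your list of BAM files and then pass the returned paths to `subprocess.run(merge_cmd("merged.bam", parts), check=True)`. Skip the merge if you want to keep the reads from each sample separate.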