I am using pysam, an interface to the samtools command-line utilities, to analyze sequencing data. I'd like to work with paired reads (called mates in pysam). The documentation for the mate method (http://pysam.readthedocs.org/en/latest/api.html#pysam.AlignmentFile.mate) says:
This method is too slow for high-throughput processing. If a read needs to be processed with its mate, work from a read name sorted file or, better, cache reads.
How do you 'cache reads'?
Thanks
Thanks for your great description. If I understand you correctly, I shouldn't use pysam's built-in mate method. Instead, I should write my own mate lookup that works only with the reads stored in RAM. Right?
Exactly :)
Furthermore, I think the pysam mate method is more about looking forward in the file, i.e., given the first read, it searches for the second read. In contrast, the described caching looks backwards, i.e., when you encounter the second read, you need to find the first read.
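For anyone landing here later, here is a minimal sketch of that "look backwards" caching, not a definitive implementation. It assumes each read exposes the pysam AlignedSegment attributes query_name, is_paired, is_secondary, is_supplementary and is_read1. The function name iter_pairs and the file name in the usage comment are made up for illustration.

```python
def iter_pairs(reads):
    """Yield (read1, read2) tuples from an iterable of pysam-like reads.

    Hypothetical helper: the first read seen for a name is cached, and
    the pair is emitted as soon as the second read with that name shows up.
    """
    cache = {}  # query_name -> read still waiting for its mate
    for read in reads:
        # Assumption: only primary alignments of paired reads are wanted.
        if not read.is_paired or read.is_secondary or read.is_supplementary:
            continue
        mate = cache.pop(read.query_name, None)
        if mate is None:
            # First time we see this name: remember the read and move on.
            cache[read.query_name] = read
        elif read.is_read1:
            yield read, mate
        else:
            yield mate, read


# Possible usage with pysam (file name is a placeholder):
#
#   import pysam
#   with pysam.AlignmentFile("example.bam", "rb") as bam:
#       for read1, read2 in iter_pairs(bam.fetch(until_eof=True)):
#           ...
```

Each pair is removed from the dictionary as soon as it is complete, so memory use depends on how far apart mates sit in the file. Reads whose mate never appears (for example, because it was filtered out) stay in the cache until the loop ends.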