You could call next() on Python's iterators (the .next() method in Python 2) to go over two sequences at a time, with something like this:
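from Bio import SeqIO

# Plain "r" mode: the old "rU" flag is deprecated and was removed in Python 3.11
handle1 = open("example_1.fq", "r")
handle2 = open("example_2.fq", "r")
itter1 = SeqIO.parse(handle1, "fastq")
itter2 = SeqIO.parse(handle2, "fastq")
for record1 in itter1:
    # Pull the matching mate from the second file (itter2.next() in Python 2)
    record2 = next(itter2)
    # Calculate GC on record 1 and record 2
handle1.close()
handle2.close()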
This will work as long as the reads are in the same order in your two FASTQ files, which is pretty typical for what comes off of the Illumina machine. Alternatively, if your FASTQ file of mates is interleaved, you could do something similar but with a single handle:
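from Bio import SeqIO

# Plain "r" mode: the old "rU" flag is deprecated and was removed in Python 3.11
handle = open("example_interleaved.fq", "r")
fq_itter = SeqIO.parse(handle, "fastq")
for record1 in fq_itter:
    # The mate is the very next record (fq_itter.next() in Python 2)
    record2 = next(fq_itter)
    # Calculate GC on record 1 and record 2
handle.close()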
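The "Calculate GC" placeholder in the loops above could be filled in along these lines. This is only a minimal sketch: gc_content and paired_gc are hypothetical helper names, not part of Biopython, and the example runs on plain strings so it works without any FASTQ files. With real records you would pass record1.seq and record2.seq instead.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = str(seq).upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def paired_gc(reads1, reads2):
    """Yield (gc1, gc2) for each mate pair taken from two parallel iterables."""
    # zip stops quietly at the shorter input, so unequal files are not detected here
    for seq1, seq2 in zip(reads1, reads2):
        yield gc_content(seq1), gc_content(seq2)


# Toy data standing in for the sequences of two sorted mate files
pairs = list(paired_gc(["GGCC", "ATAT"], ["GCAT", "AAAA"]))
print(pairs)
```

Using zip here is a swapped-in alternative to calling next() by hand; it pairs the two iterators the same way, but it will not complain if one file has more reads than the other.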
Since for bla in bla_itter just calls next() (.next() in Python 2) over and over again on bla_itter, you don't get issues with seeing the same record twice: if you call next() inside the loop body, it advances the same iterator, so the outer loop moves forward an extra position as well. That might not be the most technical description, but that is how I have seen Python behave.
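A small stand-alone check of that behaviour, using a list of numbers in place of FASTQ records (the variable names are just for illustration):

```python
# An explicit iterator over six items, standing in for SeqIO.parse(...)
numbers = iter([1, 2, 3, 4, 5, 6])

pairs = []
for first in numbers:
    # Advances the same iterator the for loop is consuming
    second = next(numbers)
    pairs.append((first, second))

print(pairs)
```

Each item shows up exactly once, grouped two at a time. One thing to watch: with an odd number of items the final next() call raises StopIteration, so for an interleaved file that might be truncated, next(numbers, None) plus a check for None is a safer pattern.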
Thank you for the code.