You could use the .next() method of python's iterator to go over two sequences at a time with something like this:
from Bio import SeqIO
handle1 = open("example_1.fq", "rU")
handle2 = open("example_2.fq", "rU")
itter1 = SeqIO.parse(handle1, "fastq")
itter2 = SeqIO.parse(handle2, "fastq")for record1 in itter1 :
record2 = itter2.next()#Calculate GC on record 1 and record 2
handle1.close()
handle2.close()
This will work as long as the reads are sorted in your two fastq files. This is pretty typical for what comes off of the illumina machine. Another option if your fastq file of mates is interleaved, you could do something similar but with a single handle:
from Bio import SeqIO
handle = open("example_interleaved.fq", "rU")
fq_itter = SeqIO.parse(handle, "fastq")for record1 in fq_itter:
record2 = fq_itter.next()#Calculate GC on record 1 and record 2
handle.close()
Since for bla in bla_itter just calls .next() over and over again on bla_itter you don't get issues with seeing the same record twice, if you call .next inside of the inner loop it moves the outer loop forward an extra position as well. That might not be technically correct, but that is how I have seen python behave.
Depends on what you want. If you want to compute the GC content of each read, then take into account both reads. This is what quality control software like FastQC does.
If you want to compute the GC content of the fragment defined by the paired reads, then use that sequence instead.
As far as I know FastQC takes each read independently. I see no reason for concatenating the two reads one after the other. If you have mate pairs, from paired-end sequencing assay then, they represent the two extremes of an original fragment. See http://www.illumina.com/technology/paired_end_sequencing_assay.ilmn
Can you explain why do you want to compute GC content?
Thank you for the code.