Hi,
I am trying to write a small program that extracts records from a FASTQ file (approx. 14 million records) and creates a new, smaller FASTQ file (1.4 million records). First, I create a list of random numbers. Then I scan through the original data file with a counter, count, and whenever count is in the list of random numbers, that record is written to the target file. The problem is how to write this program using a generator expression, since the whole output cannot be loaded into memory before it is written to the file.
from Bio import SeqIO

count = 0
rd = [56, 12, 5, 6, 3]  # <- using a function to generate this list

def inc():
    # advance the global record counter
    global count
    count += 1

input_seq_iterator = SeqIO.parse(open("C:\\Python\\Doc\\ls_orchid.fastq-sanger", "rU"), "fastq-sanger")
# keep only the records whose position is in rd
short_seq_iterator = (record for record in input_seq_iterator
                      if count in rd and inc())
output_handle = open("C:\\Python26\\Doc\\selected.fasta", "w")
SeqIO.write(short_seq_iterator, output_handle, "fasta")
output_handle.close()
This program does not produce any output at all.
Thanks
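One likely reason the generator expression above yields nothing: inc() has no return statement, so it returns None, and `count in rd and inc()` can never be true. The counter also only advances when count is already in rd, so it stays at 0. A minimal sketch of one way around this, assuming 0-based record positions, is to let enumerate() do the counting and to keep the wanted positions in a set. The helper names pick_indices and select_by_index are made up for illustration, and the stand-in strings take the place of real sequence records.

```python
import random


def pick_indices(total, k, seed=None):
    """Return a set of k distinct 0-based positions drawn from range(total)."""
    rng = random.Random(seed)
    # A set gives fast membership tests, which matters for millions of records.
    return set(rng.sample(range(total), k))


def select_by_index(records, wanted):
    """Lazily yield only the records whose 0-based position is in `wanted`."""
    for index, record in enumerate(records):
        if index in wanted:
            yield record


# Stand-in data (hypothetical): any iterator works here, so the iterator
# returned by SeqIO.parse could be used in place of this generator.
records = ("read_%d" % i for i in range(20))
wanted = pick_indices(total=20, k=5, seed=0)
selected = list(select_by_index(records, wanted))
```

Because select_by_index is a generator, only one record is held in memory at a time. With Biopython, its result could presumably be passed straight to SeqIO.write in place of short_seq_iterator, using "fastq" rather than "fasta" as the output format if a FASTQ file is wanted.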