How To Extract Records From A FASTQ File Using A Random Set Of Record Numbers?
12.7 years ago
tri • 0

Hi,

I am trying to write a small program that extracts records from a FASTQ file (approximately 14 million records) and creates a new, smaller FASTQ file (1.4 million records). First, I create a list of random numbers. Then I scan through the original data file with a counter, count, and whenever count is in the list of random numbers, that record is written to the target file. The problem is how to write this program using a generator expression, since memory cannot hold the whole output before it is written to the file.
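The first step, choosing 1.4 million distinct record numbers out of roughly 14 million, can be done with the standard library's random.sample, which draws without replacement. A minimal sketch; the function name pick_indices and its seed parameter are hypothetical, and record numbers are 0-based to match a counter that starts at 0:

```python
import random

def pick_indices(total, k, seed=None):
    """Return a set of k distinct record numbers drawn from range(total)."""
    rng = random.Random(seed)
    # sample() draws without replacement, so no record is picked twice;
    # returning a set makes the later "is this record wanted?" test fast
    return set(rng.sample(range(total), k))

# Tiny illustrative sizes; the question would use pick_indices(14000000, 1400000)
wanted = pick_indices(14, 5, seed=1)
print(sorted(wanted))
```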

from Bio import SeqIO
count =0
rd = [56, 12, 5, 6, 3]  # generated by a function in the real program

def inc():
  global count
  count += 1

input_seq_iterator = SeqIO.parse(open("C:\\Python\\Doc\\ls_orchid.fastq-sanger", "rU"), "fastq-sanger")
short_seq_iterator = (record for record in input_seq_iterator
                      if count in rd and inc())

output_handle = open("C:\\Python26\\Doc\\selected.fasta", "w")
SeqIO.write(short_seq_iterator, output_handle, "fasta")
output_handle.close()

This program does not produce any output at all.
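The empty output can be reproduced without Biopython. inc() returns None, so "count in rd and inc()" can never be true; and because count starts at 0, which is not in rd, the and short-circuits and inc() is never even called, so count never advances. In the sketch below a plain list of strings stands in for the FASTQ records (an assumption for illustration only):

```python
count = 0
rd = [56, 12, 5, 6, 3]

def inc():
    global count
    count += 1  # no return statement, so inc() always returns None (falsy)

records = ["rec%d" % i for i in range(100)]  # stand-in for SeqIO.parse(...)
selected = [r for r in records if count in rd and inc()]

print(selected)  # empty: the condition is never true
print(count)     # still 0: inc() was never called
```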

Thanks

12.7 years ago

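You can do this with a plain for loop that increments the counter after every record, whether or not the record is selected: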

from Bio import SeqIO

count = 0
for record in SeqIO.parse(open("C:\\Python\\Doc\\ls_orchid.fastq-sanger", "rU"), "fastq-sanger"):
  if count in rd:
    print(">" + record.id)
    print(str(record.seq))
  count += 1
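The selection can also stay a lazy generator expression, as the question asked, by letting enumerate supply the record number instead of a global counter; records are then read, tested and written one at a time. The stdlib-only sketch below uses a hypothetical read_fastq helper that assumes every record occupies exactly four lines (no wrapped sequence or quality lines), which Biopython's parser does not require; write_selected is likewise a hypothetical name:

```python
import io

def read_fastq(handle):
    """Yield (name, seq, qual) tuples; assumes unwrapped 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'

def write_selected(in_handle, out_handle, wanted):
    """Write records whose 0-based index is in wanted as FASTA; return how many."""
    # enumerate() numbers the records, so no global counter is needed,
    # and the generator never holds the selected subset in memory
    selected = (rec for i, rec in enumerate(read_fastq(in_handle)) if i in wanted)
    n = 0
    for name, seq, _qual in selected:
        out_handle.write(">%s\n%s\n" % (name, seq))
        n += 1
    return n

fastq = "@r0\nACGT\n+\nIIII\n@r1\nGGCC\n+\nIIII\n@r2\nTTAA\n+\nIIII\n"
out = io.StringIO()
print(write_selected(io.StringIO(fastq), out, {0, 2}))
print(out.getvalue())
```

With Biopython the same idea should carry over by passing a generator such as (rec for i, rec in enumerate(SeqIO.parse(in_handle, "fastq-sanger")) if i in wanted) directly to SeqIO.write(..., out_handle, "fasta"), since SeqIO.write accepts any iterable of records.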

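Testing membership in a list scans it element by element, so with 1.4 million random numbers it is much faster to store them in a set (or a dictionary), where lookups take constant time on average:

from Bio import SeqIO
rd = set(rd)  # hash-based container: fast "count in rd" tests

count = 0
for record in SeqIO.parse(open("C:\\Python\\Doc\\ls_orchid.fastq-sanger", "rU"), "fastq-sanger"):
  if count in rd:  # replaces rd.has_key(count), which no longer exists in Python 3
    print(">" + record.id)
    print(str(record.seq))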
  count += 1
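To see the difference between a list and a hash-based container, the sketch below times a membership test for a value that is certainly absent (forcing a full scan of the list). The sizes are smaller than in the question so it runs quickly, and the exact timings will vary by machine:

```python
import random
import timeit

rng = random.Random(0)
picks_list = rng.sample(range(1000000), 100000)  # illustrative sizes only
picks_set = set(picks_list)

probe = -1  # never in the sample, so the list must be scanned to the end
t_list = timeit.timeit(lambda: probe in picks_list, number=200)
t_set = timeit.timeit(lambda: probe in picks_set, number=200)
print("list: %.4f s   set: %.6f s" % (t_list, t_set))
```

Both containers give the same answer; only the cost of each lookup differs, and that cost is paid once per record in the 14-million-record file.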