Hello everyone,
I am trying to run this Python code, which reads FASTQ records and their associated barcodes from a single pandas DataFrame and writes them simultaneously to a FASTQ file and a text file, so that the order of the associations is preserved. Somehow the order is lost once both files are filled and re-opened. I have no clue why; my only guess is that SeqIO.write is not writing the records one by one (OK, there is also a second guess, which is that my code has a bug ;) ). Any ideas? I have also posted the related methods.
Thank you very much,
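from Bio.Alphabet import generic_dna  # only available in Biopython < 1.78
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def to_IOSeq_rec(row):
    '''Prepare fastq records.'''
    qname = row['NAME']
    seq = row['SEQ']
    qual = row['QUAL']
    xm = row['XM']
    record = SeqRecord(Seq(seq, generic_dna), id=qname, name=qname, description='', dbxrefs=[])
    # phred_quality must hold one integer score per base
    record.letter_annotations["phred_quality"] = qual
    return record, xm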
# Write the separated data in two files.
import Bio.SeqIO

with open(out_file, 'w') as fq:
    with open(out_umi, 'w') as fu:
        for index, row in df.iterrows():
            record, xm = to_IOSeq_rec(row)
            Bio.SeqIO.write(record, fq, 'fastq')
            fu.write(xm + '\n')
As far as I'm aware, the SeqIO.write() call will write things out as it's fed them from your loop. No other reordering should be occurring, so I'd say your problem is elsewhere.

Looks like your code is incomplete. Where is the pandas DataFrame object defined?
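
One way to narrow this down might be to read both files back right after writing and compare the (read name, barcode) pairs with what the loop was fed. Below is a minimal sketch using only the standard library. It assumes unwrapped four-line FASTQ records and one barcode per line; the helper names fastq_names, file_lines and find_mismatches, and the toy rows standing in for the NAME, SEQ, QUAL and XM columns, are made up for illustration and are not part of the original code.

```python
import itertools
import os
import tempfile


def fastq_names(fastq_path):
    """Return read names in file order, assuming 4-line FASTQ records."""
    names = []
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each record
                names.append(line[1:].split()[0])  # drop '@', keep the id
    return names


def file_lines(path):
    """Return the lines of a text file without trailing newlines."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]


def find_mismatches(expected_pairs, fastq_path, umi_path):
    """List (position, expected, found) wherever the files disagree."""
    on_disk = itertools.zip_longest(fastq_names(fastq_path), file_lines(umi_path))
    return [
        (i, expected, found)
        for i, (expected, found) in enumerate(
            itertools.zip_longest(expected_pairs, on_disk)
        )
        if expected != found
    ]


# Hypothetical toy data: (NAME, SEQ, QUAL, XM)
rows = [
    ("r1", "ACGT", "IIII", "AAC"),
    ("r2", "GGCA", "IIII", "TTG"),
    ("r3", "TTAG", "IIII", "CGA"),
]

with tempfile.TemporaryDirectory() as tmp:
    out_file = os.path.join(tmp, "reads.fastq")
    out_umi = os.path.join(tmp, "umis.txt")
    # Interleaved writes, mirroring the loop in the question
    with open(out_file, "w") as fq, open(out_umi, "w") as fu:
        for name, seq, qual, xm in rows:
            fq.write(f"@{name}\n{seq}\n+\n{qual}\n")
            fu.write(xm + "\n")
    expected = [(name, xm) for name, _, _, xm in rows]
    mismatches = find_mismatches(expected, out_file, out_umi)

print(mismatches)  # an empty list means the pairing survived the round trip
```

If a check like this comes back clean on the real output, the two files are still in step at write time, and the reordering is more likely to happen in a later step or in how the DataFrame was built.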