Is Bio.SeqIO.write() sequential for fastq format?
7.1 years ago
pyKey ▴ 70

Hello everyone,

I am trying to run this Python code, which reads fastq records and their associated barcodes from a single pandas dataframe and writes them simultaneously to a fastq file and a text file, so that the order of the associations is preserved. Somehow the order is lost once both files are written and re-opened. I have no clue why. My only guess is that SeqIO.write is not writing the records one by one (OK, there is also a second guess: my code has a bug ;) ). Any ideas? I am also posting the related methods.

Thank you very much,

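from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import generic_dna  # removed in Biopython 1.78; on newer versions drop it and use Seq(seq)

def to_IOSeq_rec(row):
    '''Prepare a fastq record and its barcode from one dataframe row.'''
    qname = row['NAME']
    seq = row['SEQ']
    qual = row['QUAL']
    xm = row['XM']
    record = SeqRecord(Seq(seq, generic_dna), id=qname, name=qname, description='', dbxrefs=[])
    record.letter_annotations["phred_quality"] = qual
    return record, xm

# Write the separated data in two files.
with open(out_file, 'w') as fq, open(out_umi, 'w') as fu:
    for index, row in df.iterrows():
        record, xm = to_IOSeq_rec(row)
        SeqIO.write(record, fq, 'fastq')  # one record per call, appended to the open handle
        fu.write(xm + '\n')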
RNA-Seq Bio.SeqIO Python fastq biopython • 3.8k views

As far as I'm aware the SeqIO.write() call will write things out as it's fed them from your loop - no other reordering should be occurring so I'd say your problem is elsewhere.
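One way to narrow down where the order goes wrong is to re-read both output files and compare them, record by record, against the pairs you expected to write. The following is only a rough sketch using the standard library: the helper names `read_fastq_ids` and `check_pairing` are made up for illustration, and it assumes the fastq file has plain four-line records (no wrapped sequence lines) and one barcode per line in the text file.

```python
import io

def read_fastq_ids(handle):
    """Return read names from a four-line-per-record FASTQ handle, in file order."""
    ids = []
    for i, line in enumerate(handle):
        if i % 4 == 0:  # every fourth line is a record header
            ids.append(line[1:].split()[0])  # drop the leading '@', keep the name
    return ids

def check_pairing(fastq_handle, umi_handle, expected):
    """Compare both files against (name, barcode) pairs in expected order.

    Returns the index of the first mismatching record, or None if all match.
    """
    ids = read_fastq_ids(fastq_handle)
    umis = [line.rstrip("\n") for line in umi_handle]
    if not (len(ids) == len(umis) == len(expected)):
        raise ValueError("record counts differ")
    for i, (name, umi) in enumerate(expected):
        if ids[i] != name or umis[i] != umi:
            return i
    return None

# Toy data standing in for the real files (hypothetical values).
expected = [("read1", "AAGT"), ("read2", "CCTA"), ("read3", "GGAC")]
fastq_text = "".join(f"@{name}\nACGT\n+\nIIII\n" for name, _ in expected)
umi_text = "".join(f"{umi}\n" for _, umi in expected)

result_ok = check_pairing(io.StringIO(fastq_text), io.StringIO(umi_text), expected)

# Same files, but compared against a differently ordered expectation.
shuffled = [expected[0], expected[2], expected[1]]
result_bad = check_pairing(io.StringIO(fastq_text), io.StringIO(umi_text), shuffled)

print(result_ok, result_bad)  # None 1
```

With the real data, the expected pairs could presumably be built from the NAME and XM columns of the dataframe you believe you are writing, for example `list(zip(df['NAME'], df['XM']))`. A mismatch at an early index would suggest the files were produced from a different dataframe or a different row order than the one being compared.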


Looks like your code is incomplete. Where is the pandas DataFrame object defined?

7.1 years ago
pyKey ▴ 70

You are right, the order is preserved. I double-checked everything and found the bug in my code: I was addressing the wrong dataframe somewhere along the way!

Thank you for the answers,
