Problem with Bio.SeqIO
3.9 years ago
haasroni • 0

Hi! I am having an unexplained problem with code I wrote using SeqIO from Biopython. I apply several filtering steps to a fastq file with this function:

from Bio import SeqIO

def extract_from_fastq(fq, output_fq):
    """
    Takes a fastq file, examines each read using all the above functions, and writes
    the non-ambiguous reads to a new file
    :param fq: the fastq file
    :param output_fq: the output fastq file after filtering
    """
    input_iterator = SeqIO.parse(fq, "fastq")
    # go over each record and keep it only if the read meets all the requirements
    short_iterator = (
        rec for rec in input_iterator
        if filter_by_quality(rec.letter_annotations["phred_quality"])
        and filter_by_single_nucleotide_appearance(rec.seq)
        and filter_by_long_stretches_repeats(rec.seq)
    )
    # write the surviving records to a new file in fastq format
    SeqIO.write(short_iterator, output_fq, "fastq")

The problem is that the output file sometimes contains only the last record (the last 4 lines of the input fastq), so I assume the file is being overwritten on each iteration. However, sometimes it works and I get all the records in one file!

Any idea why this happens and how to avoid it?

Thank you!!


Looks okay to me. Is output_fq an output filename or an output file handle? Are you sure SeqIO.write(..) is called only once for a given output file (output_fq) in your code? It should be.
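
To illustrate why this matters: when SeqIO.write is given a filename, it opens that file for writing, so a second call on the same name replaces whatever the first call wrote. Below is a minimal sketch of that symptom using only the standard library. Note that write_fastq and count_fastq_records are hypothetical stand-ins written for this illustration, not Biopython functions, and the three records are made up.

```python
import os
import tempfile


def write_fastq(records, path):
    """Hypothetical stand-in for a write-by-filename call: opens path in "w" mode."""
    with open(path, "w") as handle:
        for name, seq, qual in records:
            handle.write(f"@{name}\n{seq}\n+\n{qual}\n")


def count_fastq_records(path):
    """Count records in a simple FASTQ file with exactly 4 lines per record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


# Made-up records: (name, sequence, quality string)
records = [
    ("read1", "ACGT", "IIII"),
    ("read2", "TTGA", "IIII"),
    ("read3", "GGCA", "IIII"),
]
tmp_dir = tempfile.mkdtemp()

# Pitfall: one call per record on the same filename; each call truncates the file.
overwritten_path = os.path.join(tmp_dir, "overwritten.fastq")
for rec in records:
    write_fastq([rec], overwritten_path)

# Intended pattern: a single call that receives every record.
single_call_path = os.path.join(tmp_dir, "single_call.fastq")
write_fastq(records, single_call_path)

overwritten_count = count_fastq_records(overwritten_path)
single_call_count = count_fastq_records(single_call_path)
print(overwritten_count, single_call_count)
```

If the function in the question is being called more than once with the same output_fq (for example from a loop elsewhere in the script), the output would show the same pattern as the first case. Comparing the record count of the input and output files, as count_fastq_records does here, is one quick way to check.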

Yes, output_fq is an output filename, and I call SeqIO.write(..) only once. Thank you for your help!
