Need help reading FASTQ file
4
0
Entering edit mode
7.3 years ago
sarmademad • 0

I need help to read fastq file on my server virtual machine im using the following python code

def readFastq(filename):
    sequences = []
    qualities = []
    with open(filename) as fh:
        while True:
            fh.readline()
            seq = fh.readline().rstrip()
            fh.readline()
            qual =   fh.readline().rstrip()
            if len(seq) == 0:
                break
            sequences.append(seq)
            qualities.append(qual)
    return sequences, qualities
seqs, quals = readFastq('sample-seqs.fastq')
print (seqs)

giving me the following

syntax error: invalid syntax      (pointing to the line before the last line)

thank you

RNA-Seq • 4.4k views
ADD COMMENT
0
Entering edit mode

If it's pointing to the line before the last line, is your input filename/path correct?

You code runs for me, though the output only returns [+], [+], [+] - so some of your .readline() logic is off too.

ADD REPLY
3
Entering edit mode
7.3 years ago
skbrimer ▴ 740

If it is not a homework question you could use biopython

from Bio import SeqIO

for seqrecord in SeqIO.parse("file.fastq", "fastq"):
    print(seqrecord.seq)
ADD COMMENT
2
Entering edit mode
7.3 years ago
Joe 21k

I fully agree with the others, that unless there is a very good reason to write your own fastq parser, you should use Biopython etc.

However, for completeness, here's some working code. I've changed it quite a lot since there really isn't a good reason to use readline() since files are inherently iterable.

# Usage: python readfastq.py myseqs.fastq

import sys
from itertools import izip

def readFastq(filename):
        headers = []
        sequences = []
        qualities = []
        with open(filename, 'r') as ifh:
                for header, sequence, sep, quality in izip(ifh, ifh, ifh, ifh):
                        headers.append(header.rstrip())
                        sequences.append(sequence.rstrip())
                        qualities.append(quality.rstrip())

        return headers, sequences, qualities


h, s, q = readFastq(sys.argv[1])
# Do whatever with h, s and q....

There may be more efficient ways to go about it, but use of izip at least means this won't eat up all your memory if you ran it on large files.

ADD COMMENT
1
Entering edit mode

This is elegant for parsing clean fastq file. And in python3, zip can be used to replace izip.

ADD REPLY
0
Entering edit mode

Yeah I should probably add the caveat that I'm assuming a perfect input, but I would hope if someone was going to write their own parser, they'd have some idea of whether that was going to be the case or not.

ADD REPLY
0
Entering edit mode

This parser assumes the user needs the entire file (h, s, q) in memory, while it's (I guess) more likely iterating over the fastq is more desirable.

ADD REPLY
1
Entering edit mode

Yeah I wrote it like that because that's how the OP had the code (returning a list of everything in the file). OP should consider using yeild instead of return if performance becomes and issue and just make it a generator function.

ADD REPLY
1
Entering edit mode
7.3 years ago
Renesh ★ 2.2k

Your code has a lot of issues associated with data structures. Please, see below code,

def readFastq(filename):
     fastq_open = open(filename, "rU")     
     for line in fastq_open:
          header1 = line.rstrip()
          sequences = next(fastq_open).rstrip()
          header2 = next(fastq_open).rstrip()
          qualities = next(fastq_open).rstrip()
          if len(seq) == 0:
            break
          return sequences, qualities
     fastq_open.close()

seqs, quals = readFastq('sample-seqs.fastq')
print (seqs)
ADD COMMENT
0
Entering edit mode

You're better off keeping the OP's with open() construct, as it handles file opening and closing. Your current code leaves a dangling file open.

ADD REPLY
0
Entering edit mode

the close file line added

ADD REPLY
0
Entering edit mode

with is the preferred method for working with files, rather than "manually" opening and closing.

ADD REPLY
0
Entering edit mode

if you ever tried any test files, you should know this code is not working.

ADD REPLY
1
Entering edit mode
7.3 years ago
shoujun.gu ▴ 380

Do not try to write a code to parse the fastq file by yourself (unless you just want to practice), its a tricky one. Just use biopython like skbrimer mentioned above.

ADD COMMENT

Login before adding your answer.

Traffic: 2097 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6