Entering edit mode
8.0 years ago
felipelira3
▴
40
This is my code to extract the reads from a raw fastq file. I have one list with the headers of the reads and I want to retreive them from the original file. My output is an empty file. Maybe I did it not so well. Any help, improving this script will be appreciate. Less is more.
import sys
from Bio import SeqIO
from Bio.SeqIO.QualityIO import FastqGeneralIterator
list_of_reads = open(sys.argv[1], "r")
fastq_file = open(sys.argv[2], "r")
recover = set()
for line in list_of_reads:
if line not in recover:
read_name = line.split()[0]
recover.add(read_name) # add value to set
output = open("recovered.fastq", "w")
for title, seq, qual in FastqGeneralIterator(fastq_file):
if title in recover:
output.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
output.close()
First guess: line in your first for loop ends with '\n', while title in the second for loop does not: add:
in the first loop before the if ... .
Wow a beginner mistake. Thank you @dschika
I think (but this should be tested) that it would be quicker to just loop over the list_of_reads and create a list, and create a set out of that later. Something like
If performance is important, there are existing tools that will do the job:
A: Extracting fastq files, based on their fasta counterparts
filterbyname will alternatively accept a list of read names as the argument to "names", rather than a fasta file.