EDIT SKB 17MAR16 Hi Group,
figured out that I was not calling the correct variable in my loop. I made a handle file outside of the loop to append, but then was using the SeqIO.write function without the handle, so it was writing over the same file with just the last record. Please find the correct script below and thanks for the help!
# this script is used to convert fastq files to fasta files
# then to rename the fasta ID with the sample ID from the lab
from Bio import SeqIO
import sys
# grabbing the file and the name
seq_file = sys.argv[1]
labels = seq_file.split(".")
# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")
# taking the converted file and then changing the fasta header
handle = open(labels[0]+".fasta","a")
for seq_record in SeqIO.parse(handle,"fasta"):
old_header = seq_record.id
new_header = labels[0]
seq_record.id = new_header + "_" + old_header # renaming the pseudogene with
# the lab id and the referance
# used
seq_record.description = "" # this strips the old header out
SeqIO.write(seq_record, handle,"fasta")
handle.close()
/EDIT
Hi group,
I wrote a script that will take the fastq output from samtools mpileup > vcf2fq and create a fasta file that includes the drops in coverage. When I use this scripted for a bacterial genome it works as expected, however when I use it on a segmented genome it only outputs the last segment in the file. I'm using python3.4 and biopython1.66. Please see my code and troubleshooting below.
# this script is used to convert fastq files to fasta files
# then to rename the fasta ID with the sample ID from the lab
from Bio import SeqIO
import sys
# grabbing the file and the name
seq_file = sys.argv[1]
labels = seq_file.split(".")
# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")
# taking the converted file and then changing the fasta header
handle = open(labels[0]+".fasta","a+")
for seq_record in SeqIO.parse(handle,"fasta"):
old_header = seq_record.id
new_header = labels[0]
seq_record.id = new_header + "_" + old_header # renaming the pseudogene with
# the lab id and the referance
# used
seq_record.description = "" # this strips the old header out
SeqIO.write(seq_record, labels[0]+".fasta","fasta")
handle.close()
I think I have an error when writing the new record. I have been playing with this line:
handle = open(labels[0]+".fasta","a+")
When I have it written like this, It runs without error, I get a multi-fasta output without the header change; however when I have the code written like this:
handle = open(labels[0]+".fasta","rU")
like in the biopython tutorial, I get the output header I want, but only the last record in the file. I'm not sure if or why with the file open in "append" is just skipping the next part of the script. I would appreciate any input be appreciated
Its strange, Ideally, in
"rU"
mode, the file should not be writable. Its strange that you are able to write in"rU"
mode.I think its weird as well, also the universal newline 'U' argument is deprecated in python as well, I'm not sure how much longer it will still be an option.