EDIT:SOLVED:changing the fasta header in a multi-fasta file with biopython.
1
1
Entering edit mode
8.7 years ago
skbrimer ▴ 740

EDIT SKB 17MAR16 Hi Group,

figured out that I was not calling the correct variable in my loop. I made a handle file outside of the loop to append, but then was using the SeqIO.write function without the handle, so it was writing over the same file with just the last record. Please find the correct script below and thanks for the help!

# this script is used to convert fastq files to fasta files 
# then to rename the fasta ID with the sample ID from the lab

from Bio import SeqIO
import sys 

# grabbing the file and the name 
seq_file = sys.argv[1]
labels = seq_file.split(".")

# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")

# taking the converted file and then changing the fasta header
handle = open(labels[0]+".fasta","a")

for seq_record in SeqIO.parse(handle,"fasta"):
    old_header = seq_record.id
    new_header = labels[0]
    seq_record.id = new_header + "_" + old_header # renaming the pseudogene with
                                                  # the lab id and the referance 
                                                  # used
    seq_record.description = "" # this strips the old header out
    SeqIO.write(seq_record, handle,"fasta")

handle.close()

/EDIT

Hi group,

I wrote a script that will take the fastq output from samtools mpileup > vcf2fq and create a fasta file that includes the drops in coverage. When I use this scripted for a bacterial genome it works as expected, however when I use it on a segmented genome it only outputs the last segment in the file. I'm using python3.4 and biopython1.66. Please see my code and troubleshooting below.

# this script is used to convert fastq files to fasta files 
# then to rename the fasta ID with the sample ID from the lab

from Bio import SeqIO
import sys 

# grabbing the file and the name 
seq_file = sys.argv[1]
labels = seq_file.split(".")

# converting the file from fastq to fasta
SeqIO.convert(seq_file,"fastq",labels[0]+".fasta","fasta")

# taking the converted file and then changing the fasta header
handle = open(labels[0]+".fasta","a+")

for seq_record in SeqIO.parse(handle,"fasta"):
    old_header = seq_record.id
    new_header = labels[0]
    seq_record.id = new_header + "_" + old_header # renaming the pseudogene with
                                                  # the lab id and the referance 
                                                  # used
    seq_record.description = "" # this strips the old header out
    SeqIO.write(seq_record, labels[0]+".fasta","fasta")

handle.close()

I think I have an error when writing the new record. I have been playing with this line:

handle = open(labels[0]+".fasta","a+")

When I have it written like this, It runs without error, I get a multi-fasta output without the header change; however when I have the code written like this:

handle = open(labels[0]+".fasta","rU")

like in the biopython tutorial, I get the output header I want, but only the last record in the file. I'm not sure if or why with the file open in "append" is just skipping the next part of the script. I would appreciate any input be appreciated

biopython • 5.6k views
ADD COMMENT
0
Entering edit mode

Its strange, Ideally, in "rU" mode, the file should not be writable. Its strange that you are able to write in "rU" mode.

ADD REPLY
0
Entering edit mode

I think its weird as well, also the universal newline 'U' argument is deprecated in python as well, I'm not sure how much longer it will still be an option.

ADD REPLY
1
Entering edit mode
8.7 years ago
skbrimer ▴ 740

See original thread for answer

ADD COMMENT

Login before adding your answer.

Traffic: 2402 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6