Hello Biostar,
I am writing a script for a very simple annotation pipeline and have run into trouble with GFF parsing.
I am using Brad Chapman's gff_to_genbank.py script to convert the GFF output from Prodigal into GenBank format. The script works when I run it from the command line, but I run into trouble when I use the same functions within my own script. Here is my script, which is identical to Brad's except for the parts shown here:
```python
def main(input_file):
    base, ext = os.path.splitext(input_file)
    run_prodigal(input_file)

def run_prodigal(fasta_in):
    """
    Writes out Protein Fasta and GBFiles from Prodigal
    """
    base, ext = os.path.splitext(fasta_in)
    gff_out = "{}.gff".format(base)
    proteinfasta_out = "{}_proteins.fasta".format(base)
    gb_out = "{}.gb".format(base)
    command = "prodigal -i {} -p m -a {} -o {} -f gff".format(fasta_in, proteinfasta_out, gff_out)
    subprocess.call(command.split())
    #print "Finding Genes for {}".format(base)
    #print "Writing GB File, GFF, and Protein fasta for {}".format(base)
    fasta_input = SeqIO.to_dict(SeqIO.parse(fasta_in, "fasta", generic_dna))
    gff_iter = GFF.parse(gff_out, fasta_in)
    #print fasta_input
    #print gff_iter
    SeqIO.write(_check_gff(_fix_ncbi_id(gff_iter)), gb_out, "genbank")
```
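Separately from the GFF problem, `command.split()` breaks the command string on every space, so an input path that contains a space would reach Prodigal as two arguments. A minimal sketch that builds the argument list directly, using a hypothetical `prodigal_args` helper and the same Prodigal flags as the script (`-i`, `-p m`, `-a`, `-o`, `-f gff`):

```python
import os


def prodigal_args(fasta_in):
    """Hypothetical helper: build Prodigal's argument list plus the output paths.

    Each path stays a single list item, so spaces in file names are preserved.
    """
    base, _ext = os.path.splitext(fasta_in)
    gff_out = "{}.gff".format(base)
    proteinfasta_out = "{}_proteins.fasta".format(base)
    args = ["prodigal", "-i", fasta_in, "-p", "m",
            "-a", proteinfasta_out, "-o", gff_out, "-f", "gff"]
    return args, gff_out, proteinfasta_out


args, gff_out, faa_out = prodigal_args("my contigs.fasta")
print(args)
```

The returned list can be passed straight to `subprocess.call(args)` without any `.split()`.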
If I run the script on my contigs like this:

```
python run_prodigal.py contigs.fasta
```

I get the following error message:
```
    for rec in fasta_iter:
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 709, in parse
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 304, in parse_in_parts
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 344, in _results_to_features
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 400, in _add_parent_child_features
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 510, in _add_toplevel_feature
  File "build/bdist.macosx-10.6-x86_64/egg/BCBio/GFF/GFFParser.py", line 479, in _get_rec
TypeError: string indices must be integers, not str
```
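The last line of the traceback is a generic Python error, raised whenever a string is indexed with another string. `GFF.parse` takes a dictionary of sequence records as its second (`base_dict`) argument and looks records up in it by sequence ID, but in `run_prodigal` the second argument is `fasta_in`, the FASTA filename, rather than the `fasta_input` dictionary built on the line before. The sketch below reproduces that failure with a hypothetical `get_rec` helper standing in for the library's lookup (an assumption, not BCBio's actual code) and a plain dict in place of the `SeqRecord` dictionary:

```python
def get_rec(base, rec_id):
    """Hypothetical stand-in for the record lookup in BCBio's _get_rec:
    index the base dictionary by the GFF sequence ID."""
    return base[rec_id]


fasta_in = "contigs.fasta"              # a filename string, as run_prodigal passes it
fasta_input = {"contig_1": "ACGTACGT"}  # hypothetical stand-in for SeqIO.to_dict(...)

# Looking up an ID in a dict works.
print(get_rec(fasta_input, "contig_1"))

# Looking up an ID in a str raises TypeError
# (Python 2 words it as "string indices must be integers, not str").
try:
    get_rec(fasta_in, "contig_1")
except TypeError as err:
    print("TypeError:", err)
```

If this is the cause, passing the dictionary, e.g. `GFF.parse(gff_out, base_dict=fasta_input)`, instead of the filename should avoid the error.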
However, if I run gff_to_genbank.py directly on the original FASTA and the Prodigal-generated GFF, it works fine:

```
python gff_to_genbank.py contigs.gff contigs.fasta
```
Although I can work around it, I am not sure why I am getting this error and would appreciate any thoughts on the cause.
thanks, zach cp
thanks brad, you are the man.