Question

Error when converting/translating .tsv ORFs to .faa

0

4.6 years ago

anabaena ▴ 10

Hey all, I am working on a script that uses Biopython to translate a large number of .tsv files containing predicted ORFs. I keep getting the error "'module' object is not callable", even though the modules import without any errors. Everything works until the script reaches the translation step, where the TypeError is raised. Below is the full traceback:

>       File "./convert_tsv_fasta.py", line 34, in <module>
>         SeqIO.write(proteins, inputfilepath + 'translation.faa', 'fasta')
>       File "/Users/zacharyhenning/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/__init__.py",
> line 556, in write
>         for record in sequences:
>       File "./convert_tsv_fasta.py", line 32, in <genexpr>
>         for nuc_record in SeqIO.parse(outputfilepath, 'fasta')
>       File "./convert_tsv_fasta.py", line 16, in make_protein_record
>         id="translated_" + nuc_record.id,
>     TypeError: 'module' object is not callable

Here is my code:

from Bio import SeqIO
import pandas as pd
import numpy as np
import sys
from Bio.Seq import Seq
from Bio.Alphabet import generic_dna
from Bio import SeqIO
from Bio import SeqRecord

inputfilepath = sys.argv[1]
outputfilepath = inputfilepath + "_nucleotide.fasta"
#function being used in script
def make_protein_record(nuc_record):
    return SeqRecord(
        seq=nuc_record.seq.translate(to_stop=True, table='Bacterial'),
        id="translated_" + nuc_record.id,
        )
#initialize DF skipping first few rows

df = pd.DataFrame(pd.read_csv(inputfilepath, skiprows=5, delimiter='\t'))
df = df.drop(columns=['ContigCoord'])

#create dictionary to write to file

dict_y = df['Sequence'].to_dict()
for key, value in dict_y.items():
    with open(outputfilepath, 'a+') as handle:
        handle.write(">" + str(key) + "\n")
        handle.write(value + "\n" + "\n")
    handle.close()

#generator expression to translate files

proteins = (
    make_protein_record(nuc_record)
    for nuc_record in SeqIO.parse(outputfilepath, 'fasta')
)
SeqIO.write(proteins, inputfilepath + 'translation.faa', 'fasta')

biopython translation orfs pandas • 1.3k views

ADD COMMENT • link updated 4.6 years ago by zorbax ▴ 650 • written 4.6 years ago by anabaena ▴ 10

0

Could you please share a few lines of the .tsv file that you are using?

ADD REPLY • link 4.6 years ago by zorbax ▴ 650

0

###########################################################################################
sample ID: XXXX corresponds to Pangaea ID: XXXX
# Included ENA Sample ID(s): XXX
# Included ENA Run ID(s): XXXX
########################################################################################### GeneID  ContigCoord     Sequence scaffold2_1_gene1     scaffold2_1 strand:+ start:3 stop:1310    CGAGAATTGGAGCGTTACCCAAGTCAGGAAGATCCTAATCTTCCGTCTCACGTACTTGATAGACTTGCGACAATATTTGGAGAAGCTATCTCTGCAACTGAAGCGGTATGGCGAATTATTGCAGATATCCGACTCCATGGGGACCAAGCAATTCTTGACTACACTAAGCGTATTGATGGAACTGAGCTAGTTAATCTAGACATACCGAGTGACACATGGGCTAGCGCTTCAGTTAATTTGAATGAAACTTTACGCGAGGCCTTATCGCTTTCATCGTCACGGATTTTTGATTTCCACCAAGCATGCCTGCCAAAAGATTGGTTTGATGGAAATTTGGGAGTAGGGCTCAAGCACGTTCCCATCGAAAGAGTTGGGATTTATATTCCTGGCGGCACTGCAACGTACCCATCTACAGTCCTGATGAGTGCAATACCGGCACGCGTAGCAGGGGTTAAAGAAATCGTTCTATGTACCCCAAACCCGACTGATTCGGTTTTAACTGCTGCACTTAATGCAAAAGTAGATCAGGTATTTCAAATTGGAGGAGCGCAAGCTATCGCTGCAATGGCATTTGGCAGTGAAAGTATCCCACGGGTGGACAAGATTGTCGGGCCAGGGAATATTTTTGTCTCGATTGCAAAGCGGCTTGTATATGGGAGTGTAGACATTGATGGTCTCTATGGTCCTACAGAAACATTGATAATTGCCGATGATTCAGTGAATCCACAAATAATTGCTTCCGATTTACTAGCACAGGCAGAGCATGACGATCTTGCTACTCCAATACTTATAACTTTTTCACGCGCTATAGCTAACCTCGTTAATGATCACATTGAAGAACAGTCAGCAAATATGCCTAGGGAATCGATTATCAAAAATTCTTTGGCCAACCAAGGCGCTATTCATCTTGTTTCTGATGTATCTGAGGCTATTAAGCTATCCAATGTATTCGCGCCAGAGCACCTCAGTTTACTTATACGAGATGCAGAGAAATACATTCCGCAAATCGAAAATGCAGGTGGAATCTTCGTAGGGGAAAACAGCCCTGAAGTTTTAGGGGATTATGTGATTGGACCCAGTCATGTGATGCCAACTGGTGGTACTGCAAGATTTGCCTCTAACTTGGGAATTAATTCCTTTCTCAAGCAGATTCCTATAATGAATTTGTCATCTTCAACAATGCTGCAACTTGCACCCGCTGCTGTCGAAATAGCTGGCATAGAAGGTCTAAGTGCACATTCTGCCTCTGCTGCAATACGGATTGAAGGGATTGAAAGTACTTCCGACGAAAAGGGGCAATCGAGCTGA

ADD REPLY • link 4.6 years ago by anabaena ▴ 10

1

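Instead of from Bio import SeqRecord, you should use from Bio.SeqRecord import SeqRecord. The first form imports the module Bio.SeqRecord, not the SeqRecord class inside it, so calling SeqRecord(...) raises the "'module' object is not callable" error. Have you considered six-frame translations? You can use from Bio.SeqUtils import six_frame_translations.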

ADD REPLY • link 4.6 years ago by zorbax ▴ 650
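The difference between importing a module and importing the class inside it can be reproduced without Biopython. The sketch below uses the standard library's datetime as a stand-in, since it has the same layout as Bio.SeqRecord (a module that contains a class of the same name); the variable names are only illustrative.

```python
import datetime  # binds the name "datetime" to the *module*

try:
    # Calling the module itself fails, just like SeqRecord(...) does
    # after "from Bio import SeqRecord".
    datetime(2020, 4, 8)
except TypeError as err:
    message = str(err)
    print(message)

# Importing the class from the module gives a callable, the same way
# "from Bio.SeqRecord import SeqRecord" does.
from datetime import datetime as datetime_class

stamp = datetime_class(2020, 4, 8)
print(stamp.year)
```

An equivalent fix that keeps the original import would be to call the class through the module, i.e. SeqRecord.SeqRecord(...), but importing the class directly is the more common style.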

score 2 · Accepted Answer · 2020-04-08

If you want to save both the nucleotide sequences in FASTA format and their translations, you can use the translate() method of the Seq class:

#!/usr/bin/env python3

import os
import sys
import pandas as pd
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

inputfilepath = sys.argv[1]
# Strip only the final extension, so paths such as ./data/orfs.tsv keep their directory
basename = os.path.splitext(inputfilepath)[0]
outputfilepath = basename + "_nucleotide.fasta"

# Skip the 5 comment lines at the top of the file; the next line is the column header
df = pd.read_csv(inputfilepath, skiprows=5, delimiter='\t')
dict_y = dict(zip(df['GeneID'], df['Sequence']))

results = []
# Open the nucleotide FASTA once, in write mode, so reruns do not append duplicate records
with open(outputfilepath, 'w') as handle:
    for key, value in dict_y.items():
        # Translate with the bacterial table, stopping at the first in-frame stop codon
        record = SeqRecord(seq=Seq(value).translate(to_stop=True, table='Bacterial'),
                           id="translated_" + key, description="")
        results.append(record)
        handle.write(">" + str(key) + "\n")
        handle.write(value + "\n")

SeqIO.write(results, basename + '_translation.faa', 'fasta')
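For readers wondering what translate(to_stop=True) and the six-frame idea from the comments amount to, here is a minimal dependency-free sketch. It is not Biopython's implementation: the function names are hypothetical, it assumes sequences contain only A, C, G and T (an ambiguous base such as N would raise a KeyError here), and it relies on the fact that NCBI table 11 ("Bacterial") assigns the same amino acids to codons as the standard table, differing only in which codons may act as starts.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_to_stop(dna):
    """Translate codon by codon, ending before the first in-frame stop codon."""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def reverse_complement(dna):
    """Complement each base, then reverse the string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna):
    """Return {frame: protein} for frames +1, +2, +3 and -1, -2, -3."""
    frames = {}
    for strand, seq in ((1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            frames[strand * (offset + 1)] = translate_to_stop(seq[offset:])
    return frames


if __name__ == "__main__":
    print(translate_to_stop("ATGGCCTGATTT"))
    for frame, protein in sorted(six_frames("ATGGCCTGATTT").items()):
        print(frame, protein)
```

In practice the Biopython calls shown above are the better choice, since they also handle ambiguous bases and alternative genetic codes; the sketch is only meant to make the logic visible.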