Question

convert gbk to fasta

0

4.9 years ago

wellinsantos84 ▴ 10

Hi, guys!

I'm a bioinformatics intern and I'm looking for a script that converts all the .gbk files in a directory to .fa files. I have already tried several approaches, but none of them produced any output.

Can anybody help me?

genome • 5.1k views

updated 5 weeks ago by Christopher Bottoms ▴ 10 • written 4.9 years ago by wellinsantos84 ▴ 10

1

If you know Python, you can use Biopython's SeqIO to read the .gbk files and write them out in FASTA format.

4.9 years ago by Asaf 10k

0

I want it to read several .gbk files in a folder.

4.9 years ago by wellinsantos84 ▴ 10

1

A for loop, then? Where did you get the data from? Are you sure there aren't FASTA files already available?

4.9 years ago by Asaf 10k

score 2 · Answer 1 · 2020-01-30

2

4.9 years ago

Joe 21k

You don't need to mess about with parse and write, as Biopython has a convenience function for this, SeqIO.convert (unless you specifically want to control the metadata that gets written).

To run over an entire folder:

for file in /path/to/dir/*.gbk ; do
    python -c "from Bio import SeqIO; SeqIO.convert($file, genbank, ${file%.*}.fasta, fasta);"
done
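
If you do need control over what ends up in the FASTA header, a parse-and-write version is one option. The sketch below is only an illustration: the function names `relabel` and `gbk_to_fasta` are made up for this example, and using the organism annotation as the description is an arbitrary choice. It assumes Biopython is installed.

```python
def relabel(records):
    """Yield records with the FASTA description replaced by the organism name.

    Using the "organism" annotation is an arbitrary choice for this example.
    """
    for record in records:
        record.description = record.annotations.get("organism", "")
        yield record


def gbk_to_fasta(gbk_path, fasta_path):
    """Write the records of one GenBank file as FASTA; returns the record count."""
    # Imported inside the function so relabel() stays usable without Biopython.
    from Bio import SeqIO

    records = SeqIO.parse(gbk_path, "genbank")
    return SeqIO.write(relabel(records), fasta_path, "fasta")
```

Something like `gbk_to_fasta("genome.gbk", "genome.fa")` should then write each record with its ID followed by the organism name in the header line.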

0

Thanks Joe! This was super useful.

I added some quotes to make it work for me:

for file in *.gbk ; do
    python -c "from Bio import SeqIO; SeqIO.convert('$file', 'genbank', '${file%.*}.fasta', 'fasta');"
done
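
The same loop can also be written entirely in Python, which avoids starting a new interpreter for every file and sidesteps the shell quoting issue. This is a rough sketch, assuming Biopython is installed; `fasta_path_for` and `convert_folder` are hypothetical helper names, and the `.fasta` extension mirrors the `${file%.*}.fasta` expansion above.

```python
from pathlib import Path


def fasta_path_for(gbk_path):
    """Return the output path: same location and name, with a .fasta extension."""
    return Path(gbk_path).with_suffix(".fasta")


def convert_folder(folder):
    """Convert every .gbk file in `folder` to FASTA.

    Returns a dict mapping each input file name to the number of records written.
    """
    # Imported inside the function so fasta_path_for() works without Biopython.
    from Bio import SeqIO

    counts = {}
    for gbk in sorted(Path(folder).glob("*.gbk")):
        # SeqIO.convert returns the number of records it converted.
        counts[gbk.name] = SeqIO.convert(
            str(gbk), "genbank", str(fasta_path_for(gbk)), "fasta"
        )
    return counts
```

Calling `convert_folder(".")` from the directory that holds the .gbk files should behave like the shell loop, writing one .fasta file next to each input.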

5 weeks ago by Christopher Bottoms ▴ 10

Ram · Answer 2 · 2020-01-29

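from Bio import SeqIO
import os

# Walk the folder tree and convert every .gbk file to a .fa file beside it
for raiz, subpastas, arquivos in os.walk(
        '/Documentos/parse/GCF_000231365.1/AntiSmash/GCF_000231365.1_ASM23136v1_genomic$'
):
    for arquivo in arquivos:
        origem = os.path.join(raiz, arquivo)
        if not origem.endswith(".gbk"):
            continue
        destino = origem[:-len(".gbk")] + ".fa"
        with open(origem) as input_handle, open(destino, "w") as output_handle:
            # Format names are "genbank" and "fasta", not file extensions
            sequences = SeqIO.parse(input_handle, "genbank")
            count = SeqIO.write(sequences, output_handle, "fasta")
        print("Converted %i records from %s" % (count, origem))

# To write the protein translations of annotated features instead, replace
# the SeqIO.write() call with a loop over the records and their features:
#     for seq_record in SeqIO.parse(input_handle, "genbank"):
#         for seq_feature in seq_record.features:
#             if "translation" in seq_feature.qualifiers:
#                 output_handle.write(">%s from %s\n%s\n" % (
#                     seq_feature.qualifiers['locus_tag'][0],
#                     seq_record.name,
#                     seq_feature.qualifiers['translation'][0]))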

score 0 · Answer 3 · 2020-01-30

0

4.9 years ago

onestop_data ▴ 330

GenBank to fasta sequence