I'm a bioinformatics intern and I'm looking for a script that converts several .gbk files in a directory to .fa files. I have already tried several approaches, but none of them produced any results.
You don't need to mess about with parse and write as there is a convenience function for this (unless you want to specifically control the metadata that gets written):
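The convenience function meant here is presumably Bio.SeqIO.convert, which reads one format and writes another in a single call. A minimal sketch, assuming Biopython is installed; the wrapper name gbk_to_fasta and the file names are placeholders and do not come from the original post:

```python
from Bio import SeqIO


def gbk_to_fasta(gbk_path, fasta_path):
    """Convert one GenBank file to FASTA and return the number of records written."""
    # SeqIO takes format names ("genbank", "fasta"), not file extensions.
    return SeqIO.convert(gbk_path, "genbank", fasta_path, "fasta")


# Hypothetical usage:
# count = gbk_to_fasta("input.gbk", "output.fasta")
```

The return value is the record count, which is a cheap check that the input file was actually parsed.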
To run over an entire folder:
for file in /path/to/dir/*.gbk ; do
    python -c "from Bio import SeqIO; SeqIO.convert('$file', 'genbank', '${file%.*}.fasta', 'fasta')"
done
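The folder loop can also be kept entirely in Python, which avoids the nested shell quoting. This is only a sketch, assuming Biopython is installed; convert_folder and the directory path are illustrative names, not part of the original answer:

```python
from pathlib import Path

from Bio import SeqIO


def convert_folder(directory):
    """Convert every .gbk file in directory to a .fasta file next to it."""
    counts = {}
    for gbk in sorted(Path(directory).glob("*.gbk")):
        fasta = gbk.with_suffix(".fasta")
        # SeqIO.convert returns how many records it wrote.
        counts[fasta.name] = SeqIO.convert(str(gbk), "genbank", str(fasta), "fasta")
    return counts


# Hypothetical usage:
# convert_folder("/path/to/dir")
```

Use Path.rglob in place of glob if the .gbk files sit in nested subfolders.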
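from Bio import SeqIO
import os, sys

# os.walk yields (root, subfolders, file names) for each directory; the file
# names come as a list, so each one has to be joined to the root separately.
for raiz, subpasta, arquivos in os.walk(
    '/Documentos/parse/GCF_000231365.1/AntiSmash/GCF_000231365.1_ASM23136v1_genomic$'
):
    for arquivo in arquivos:
        origem = os.path.join(raiz, arquivo)
        if origem.endswith(".gbk"):
            destino = origem.replace(".gbk", ".fa")
            # The "rU" mode was removed in Python 3.11; the default text mode is enough.
            with open(origem) as input_handle, open(destino, "w") as output_handle:
                # SeqIO expects format names ("genbank", "fasta"), not file extensions.
                sequences = SeqIO.parse(input_handle, "genbank")
                count = SeqIO.write(sequences, output_handle, "fasta")

# Leftover lines, apparently from a script that writes CDS translations. They
# need seq_record and seq_feature from loops that are not defined here, and the
# with blocks above already close both handles, so they are kept commented out.
# if len(sys.argv) != 3:
#     sys.exit(__doc__)
# output_handle.write(
#     ">%s from %s\n%s\n" % (seq_feature.qualifiers['locus_tag'][0],
#                            seq_record.name,
#                            seq_feature.qualifiers['translation'][0]))
# output_handle.close()
# input_handle.close()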
If you know Python, you can use SeqIO to read the .gbk files and write them in FASTA format.
I wanted it to read several .gbk files in a folder.
A for loop then? Where did you get the data from? Are you sure there aren't fasta files available?