Dear colleagues,
Could you please advise how to correct this code?
I need to download as many FASTA files of bacterial genomes as possible (not one specific genome or ID, but a whole group, just a lot of them) and then convert them to JSON files. I have solved the second part (the JSON conversion), but I don't understand how to download the files from NCBI to my PC. This is my code:
import os
from Bio import SeqIO
from Bio import Entrez

Entrez.email = "my_email@gmail.com"
filename = "new.fasta"  # but actually I need to save separate FASTA files into one folder
if not os.path.isfile(filename):
    with Entrez.efetch(db="genome", term="Salmonella+enterica", rettype="fasta_cds_na", retmax=10) as net_handle:  # how do I download the next files after these 10?
        with open(filename, 'w') as out_handle:
            out_handle.write(net_handle.read())
    print("saved")
print("Parsing...")
record = SeqIO.read(filename, "fasta")
print(record)
# and I also need to add to this code the part where I convert to JSON:
import json

my_dict = {}
with open("Salm_ser_Enteritidis.fasta", 'r') as new_fasta:
    for x in SeqIO.parse(new_fasta, 'fasta'):
        my_dict = {
            "dataset": x.id,
            "sequence": str(x.seq)
        }
with open('my_dict.json', 'w') as f:
    json.dump(my_dict, f)
Thanks. The data definitely has to come from NCBI. I have seen ways to download it, via ncbi-genome-download for example, but I'm afraid to download without any limit, as there are more than 400,000 genomes and the downloaded file might be too big.
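For reference, here is a minimal sketch of one way to keep the download bounded and write one FASTA file per record. It relies on the fact that Entrez.efetch expects record IDs rather than a search term, so Entrez.esearch is called first and paged with retstart/retmax. The function names make_entrez_backend and download_fasta, the "nucleotide" database, the query string and the default limits are all assumptions made for illustration, not something taken from the code above.

```python
import os


def make_entrez_backend(email, db="nucleotide"):
    """Return (search, fetch) callables backed by Biopython's Entrez module.

    The database name is an assumption; adjust it to the records you need.
    """
    from Bio import Entrez  # imported here so download_fasta has no hard dependency

    Entrez.email = email  # NCBI asks for a contact address

    def search(term, retstart, retmax):
        # esearch returns record IDs; retstart/retmax select one page of hits
        with Entrez.esearch(db=db, term=term,
                            retstart=retstart, retmax=retmax) as handle:
            return list(Entrez.read(handle)["IdList"])

    def fetch(record_id):
        # efetch takes an ID (not a search term) and returns FASTA text
        with Entrez.efetch(db=db, id=record_id,
                           rettype="fasta", retmode="text") as handle:
            return handle.read()

    return search, fetch


def download_fasta(term, out_dir, search, fetch, max_records=50, batch_size=10):
    """Page through search hits and save one FASTA file per record ID.

    Stops after max_records hits or when the search returns no more IDs.
    Returns the paths of the files for the processed hits.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    start = 0
    while len(saved) < max_records:
        wanted = min(batch_size, max_records - len(saved))
        ids = search(term, start, wanted)
        if not ids:
            break  # no more hits for this query
        for record_id in ids:
            path = os.path.join(out_dir, f"{record_id}.fasta")
            if not os.path.isfile(path):  # skip files left by an earlier run
                with open(path, "w") as out_handle:
                    out_handle.write(fetch(record_id))
            saved.append(path)
        start += len(ids)
    return saved


# Example call (needs network access and Biopython; the query is hypothetical):
# search, fetch = make_entrez_backend("my_email@gmail.com")
# download_fasta("Salmonella enterica[Organism] AND complete genome[Title]",
#                "fasta_files", search, fetch, max_records=20)
```

Because max_records caps the total and each record goes to its own file, a run can be stopped and repeated later without downloading the existing files again. Passing search and fetch in as arguments also means the paging logic can be tried with stand-in functions before contacting NCBI.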