Hello, I am a beginner in bioinformatics. For my internship I have to retrieve all the sequences of the Klebsiella genomes from NCBI using Biopython. I absolutely need every genome counted under "Genome Assembly and Annotation report (10703)" at https://www.ncbi.nlm.nih.gov/genome/browse/#!/prokaryotes/815/ so I took the identifiers from https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt and tried to write a Biopython script that retrieves the sequences. The script doesn't work; I guess the identifiers on the FTP site and in the nucleotide database are not the same. Is there some kind of correspondence between the nuccore (nucleotide) identifiers and the ones on the FTP site?
I have looked at https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/IDS/Bacteria.ids but it contains only 31 identifiers, not all of them. Thanks a lot. Here is the Biopython code:
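from Bio import Entrez, SeqIO

# NCBI asks every Entrez client to identify itself; replace with your own address
Entrez.email = "your.name@example.org"

# Read one identifier per line, dropping newlines and blank lines
with open("listId.txt") as handle:
    list_id = [line.strip() for line in handle if line.strip()]
print(list_id)

# Pass the list itself, not the string "list_id"; Biopython joins it with commas
fic_seq = Entrez.efetch(db="nucleotide", id=list_id, rettype="gb", retmode="text")
# The handle can only be read once, so keep the parsed records in a list
records = list(SeqIO.parse(fic_seq, "gb"))
fic_seq.close()

for record in records:
    print(record)

SeqIO.write(records, "out.fasta", "fasta")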
Try executing your efetch successfully with a single ID. Once that works, expand to multiple IDs the right way. Right now you're passing the string "list_id" as the ID, where you need to pass every member of the list_id object. And of course, make sure the IDs are ones you can actually use for retrieval.
Thank you for your answer. Yes, when I use a single identifier it doesn't work, because the identifiers in the nucleotide database and in the FTP file are different. I am looking for a way to link the identifiers of the nucleotide database to the identifiers in the FTP file.
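The advice above (get one ID working, then move on to many) could be sketched roughly as follows. This is only an illustration: the helper names clean_ids, batches and fetch_genbank, the batch size of 200 and the output file name are assumptions, not part of the original script, and fetch_genbank needs network access and Biopython.

```python
def clean_ids(lines):
    """Strip whitespace, drop blank lines and duplicates, keep the original order."""
    seen = set()
    ids = []
    for line in lines:
        ident = line.strip()
        if ident and ident not in seen:
            seen.add(ident)
            ids.append(ident)
    return ids


def batches(ids, size=200):
    """Yield consecutive slices of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_genbank(ids, email, out_path="out.gb"):
    """Download GenBank records batch by batch (hypothetical helper)."""
    # Imported here so the two helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks clients to identify themselves
    with open(out_path, "w") as out:
        for batch in batches(ids):
            handle = Entrez.efetch(db="nucleotide", id=batch,
                                   rettype="gb", retmode="text")
            out.write(handle.read())
            handle.close()
```

To test with a single identifier first, call fetch_genbank with a one-element list, and pass the full cleaned list only once that succeeds.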
When I use an identifier from the first column of https://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/IDS/Bacteria.ids (the column that corresponds to the nucleotide database identifier), it works.
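One possible way to get identifiers that the nucleotide database accepts is to pull the sequence accessions out of prokaryotes.txt itself, rather than using its other ID columns. The sketch below assumes that the file is tab-separated, that it has header columns named "#Organism/Name" and "Replicons", and that each replicon is written as "name:ACCESSION/ACCESSION" with entries separated by ";". Those layout details are assumptions to check against the real header, and replicon_accessions is a hypothetical helper name.

```python
import csv
import io


def replicon_accessions(report_text, organism_prefix="Klebsiella"):
    """Collect sequence accessions from the assumed 'Replicons' column."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    found = []
    for row in reader:
        name = row.get("#Organism/Name") or ""
        if not name.startswith(organism_prefix):
            continue
        for replicon in (row.get("Replicons") or "").split(";"):
            # Assumed layout: "chromosome:NC_xxx.1/CPxxx.1"
            _, sep, accessions = replicon.rpartition(":")
            if not sep:
                continue  # e.g. "-" when no replicon is listed
            found.extend(a.strip() for a in accessions.split("/") if a.strip())
    return found
```

The returned accessions could then be tried as the id argument of Entrez.efetch, starting with a single one to confirm that it resolves.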