I want to retrieve the viruses whose host is bacteria, and I am using the Biopython package.
import random
from Bio import Entrez

Entrez.email = "mail"
taxid = 10239
host = "bacteria"
num_records = 20
query = f'taxid:{taxid}[Organism] AND host:{host}[All Fields]'
# Use Entrez.esearch to search for genomes based on the query
handle = Entrez.esearch(db="genome", term=query, retmax=100000) # Increase retmax if needed
# Use Entrez.read to parse the search results
record = Entrez.read(handle)
print(record)
# Get a random subset of genome IDs
genome_ids = random.sample(record["IdList"], min(num_records, len(record["IdList"])))
print(genome_ids[:3])
This is my current code, but it is not retrieving any IDs. What am I doing wrong? And what can I add to fetch the FASTA sequences and download them?
Download the list of accessions using the Download button at the link you posted above (assuming that is the search you wanted; there are 4000+ genomes, not 30 as you mention above). Put them in a file, one accession per line.
$ cat id
NC_001341
NC_028834
NC_023556
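If you would rather build this list from Python, here is a minimal Biopython sketch. It makes several assumptions: the search runs against `nuccore` rather than `genome`; the taxonomy ID is written as `txid10239[Organism:exp]` (the form `taxid:10239[Organism]` is not Entrez syntax, which is a likely reason the original query returned nothing); and the `"vhost bacteria"[Filter]` term selects viruses with bacterial hosts. That filter is an assumption you should check in the NCBI web search first. `build_query` and `search_accessions` are hypothetical helper names.

```python
def build_query(taxid, host):
    """Compose an Entrez term: records under `taxid`, RefSeq only, given host group."""
    # The "vhost ..." filter name is an assumption; verify it in the web interface.
    return f'txid{taxid}[Organism:exp] AND "vhost {host}"[Filter] AND refseq[Filter]'


def search_accessions(query, email, retmax=100):
    """Return accession strings for nuccore records matching `query`."""
    # Imported here so build_query can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    # idtype="acc" asks E-utilities for accessions instead of numeric UIDs.
    handle = Entrez.esearch(db="nuccore", term=query, retmax=retmax, idtype="acc")
    record = Entrez.read(handle)
    handle.close()
    return list(record["IdList"])
```

Calling `search_accessions(build_query(10239, "bacteria"), "you@example.org")` should return a list that you can write to the `id` file, one accession per line.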
The following will give you an individual FASTA file for each genome.
$ for i in `cat id`; do echo ${i}; efetch -db nuccore -id ${i} -format fasta > ${i}.fasta; done
If you want a single file with all the sequences instead:
$ for i in `cat id`; do echo ${i}; efetch -db nuccore -id ${i} -format fasta >> virii.fasta; done
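Since the question uses Biopython, the same download can be sketched there too. This assumes the `id` file described above and uses `Entrez.efetch` with `rettype="fasta"`. `read_accessions` and `fetch_fasta` are hypothetical helper names, and the email address is a placeholder.

```python
from pathlib import Path


def read_accessions(path):
    """Read one accession per line, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def fetch_fasta(accessions, email, out_dir="."):
    """Write <accession>.fasta into out_dir for each accession."""
    # Imported here so read_accessions works without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for acc in accessions:
        # One request per accession, mirroring the shell loop above.
        handle = Entrez.efetch(db="nuccore", id=acc, rettype="fasta", retmode="text")
        (out_dir / f"{acc}.fasta").write_text(handle.read())
        handle.close()
```

`fetch_fasta(read_accessions("id"), "you@example.org")` would produce the same per-genome files as the first loop. For a single combined file, append each `handle.read()` to one open file instead.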