Question

Download sequences from NCBI using Python

1

7.8 years ago

HZZ0036 ▴ 30

Hi, I am having trouble downloading and saving sequences from NCBI all at once. I get the accession numbers using this script:

from Bio import Entrez
def singleEntry(singleID):   #the singleID is the accession number
    handle = Entrez.efetch(db='nucleotide',id=singleID, rettype = 'fasta', retmode= 'text')
    f = open('%s.fasta' % singleID, 'w')
    f.write(handle.read())
    handle.close()
    f.close()


#get an id list: this makes a big search and gets a list of id 

handle = Entrez.esearch(db='nucleotide', term = ["Poaceae[Orgn] AND als[Gene]"])
record = Entrez.read(handle)
handle.close()
print (record["IdList"])

I got IdList:

['1124779319', '1058275694', '160346987', '160346985', '313662298', '313662296', '313662294', '313662292', '148536620', '148536618', '944203885', '937553934', '698322664', '698322662', '698322660', '698322658', '683428019', '677285963', '677285961', '677285959']

How can I then download those FASTA sequences into one file? Thanks. I tried this:

from Bio import Entrez, SeqIO
def get_sequences(IdList):
    ids = record["IdList"]
    for seq_id in ids:
    handle = Entrez.efetch(db="nucleotide", id="seq_id", rettype="fasta", retmode="text")
    record = handle.read()
    record = open('als.fasta', 'w')
    record.write(record.rstrip('\n'))

but it showed: IndentationError: expected an indented block

sequence • 17k views

ADD COMMENT • link updated 4.5 years ago by naman.misanthropist • 0 • written 7.8 years ago by HZZ0036 ▴ 30

0

Hi, did you solve this? I need to do the same.

ADD REPLY • link 6.0 years ago by brunofede22 • 0

score 2 · Answer 1 · 2017-07-10

2

7.8 years ago

Istvan Albert 102k

The body of your for loop needs to be indented.

This is not really a bioinformatics question but a Python programming question and as such it is better suited for https://stackoverflow.com/

record = open('als.fasta', 'w')
for seq_id in ids:
    handle = Entrez.efetch(db="nucleotide", id="seq_id", rettype="fasta", retmode="text")
    record = handle.read()
    record.write(record.rstrip('\n'))

ADD COMMENT • link 7.8 years ago by Istvan Albert 102k

0

Ehm I also think OP and you are reusing/overwriting the variable name record.

ADD REPLY • link 7.8 years ago by WouterDeCoster 48k

0

Thanks. The error has been solved, but no als.fasta file is created, even when I add the path.

ADD REPLY • link 7.8 years ago by HZZ0036 ▴ 30

0

I guess that's because you overwrite the record variable, you use it both for opening als.fasta and for reading the handle from efetch().
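
A minimal sketch of that fix, using a separate name for the output file and for the downloaded text. It also passes the loop variable itself to efetch, since id="seq_id" in the snippets above sends the literal string "seq_id" rather than each ID. The function names save_fasta and fetch_from_ncbi are made up for illustration, and the email address is a placeholder.

```python
def save_fasta(ids, out_path, fetch):
    """Write the FASTA text for every ID in `ids` into a single file."""
    # Open the output file once, outside the loop, under its own name.
    with open(out_path, "w") as out_handle:
        for seq_id in ids:
            text = fetch(seq_id)  # downloaded text gets a different name
            # Normalise trailing newlines so records follow each other cleanly.
            out_handle.write(text.rstrip("\n") + "\n")


def fetch_from_ncbi(seq_id):
    """Fetch one record as FASTA text (needs Biopython and network access)."""
    # Imported here so save_fasta can be tried without Biopython installed.
    from Bio import Entrez

    Entrez.email = "A.N.Other@example.com"  # placeholder: use your own address
    handle = Entrez.efetch(db="nucleotide", id=seq_id, rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

With the search result from the question, this would be called as save_fasta(record["IdList"], "als.fasta", fetch_from_ncbi).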

ADD REPLY • link 7.8 years ago by WouterDeCoster 48k

GenoMax · Answer 2 · 2020-10-28

import os
from Bio import SeqIO
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
filename = "MG762674.fasta"
if not os.path.isfile(filename):
    # Downloading...
    net_handle = Entrez.efetch(
        db="nucleotide", id="MG762674", rettype="fasta", retmode="text"
    )
    out_handle = open(filename, "w")
    out_handle.write(net_handle.read())
    out_handle.close()
    net_handle.close()
    print("Saved")

print("Parsing...")
record = SeqIO.read(filename, "fasta")
print(record)

I took this from the Biopython Cookbook.

You can pass a list as the id argument, e.g.:

Accession_List = ['1124779319', '1058275694', '160346987', '160346985', '313662298', '313662296', '313662294', '313662292', '148536620', '148536618', '944203885', '937553934', '698322664', '698322662', '698322660', '698322658', '683428019', '677285963', '677285961', '677285959']

net_handle = Entrez.efetch(db="nucleotide", id=Accession_List, rettype="fasta", retmode="text")
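
As a rough sketch, a single request with the whole list could be wrapped like this. The function name fetch_all_fasta and its optional efetch parameter are assumptions made for illustration (the parameter only exists so the function can be tried with a stand-in instead of a live NCBI call), and the email address is a placeholder.

```python
def fetch_all_fasta(ids, out_path, efetch=None):
    """Fetch every ID in one efetch request and save the combined FASTA text."""
    if efetch is None:
        # Imported lazily so the function can be exercised with a stand-in
        # efetch when Biopython or network access is unavailable.
        from Bio import Entrez

        Entrez.email = "A.N.Other@example.com"  # placeholder: use your own address
        efetch = Entrez.efetch

    # Passing the list itself asks NCBI for all records in a single request.
    net_handle = efetch(db="nucleotide", id=ids, rettype="fasta", retmode="text")
    try:
        with open(out_path, "w") as out_handle:
            out_handle.write(net_handle.read())
    finally:
        net_handle.close()
```

For example, fetch_all_fasta(Accession_List, "als.fasta") would write all records to als.fasta.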

This will save everything as one big FASTA file, which you can later split into files with individual sequences using split fasta, if you want. I know I am answering late, but this is for anyone who washes up here with the same problem.
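
If no separate tool is at hand, the splitting step can also be sketched in plain Python. This is only an illustration: the function name split_fasta is made up, each output file is named after the first token of the header line, and Biopython's SeqIO.parse and SeqIO.write would be another way to do the same job.

```python
import os


def split_fasta(combined_path, out_dir):
    """Write each record of a multi-record FASTA file to its own file.

    Returns the list of paths written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out_handle = None
    with open(combined_path) as in_handle:
        for line in in_handle:
            if line.startswith(">"):
                # A header line starts a new record: close the previous file.
                if out_handle is not None:
                    out_handle.close()
                tokens = line[1:].split()
                # Name the file after the first header token, or fall back
                # to a running number if the header is empty.
                name = tokens[0] if tokens else "record%d" % (len(written) + 1)
                # Avoid characters that are awkward in file names.
                name = name.replace("/", "_").replace("|", "_")
                path = os.path.join(out_dir, name + ".fasta")
                out_handle = open(path, "w")
                written.append(path)
            if out_handle is not None:
                out_handle.write(line)
    if out_handle is not None:
        out_handle.close()
    return written
```

Calling split_fasta("als.fasta", "als_split") would then create one file per sequence inside als_split.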