Download sequences from NCBI using Python
7.4 years ago
HZZ0036 ▴ 30

Hi, I am having trouble downloading sequences from NCBI and saving them all in one go. I get the accession numbers using this script:

from Bio import Entrez

def singleEntry(singleID):  # singleID is the accession number
    handle = Entrez.efetch(db='nucleotide', id=singleID, rettype='fasta', retmode='text')
    # the with block closes the output file even if the write fails
    with open('%s.fasta' % singleID, 'w') as f:
        f.write(handle.read())
    handle.close()


# Get an ID list: run a search and collect the matching IDs

# term is a single query string, not a list
handle = Entrez.esearch(db='nucleotide', term="Poaceae[Orgn] AND als[Gene]")
record = Entrez.read(handle)
handle.close()
print(record["IdList"])

I got IdList:

['1124779319', '1058275694', '160346987', '160346985', '313662298', '313662296', '313662294', '313662292', '148536620', '148536618', '944203885', '937553934', '698322664', '698322662', '698322660', '698322658', '683428019', '677285963', '677285961', '677285959']

How can I download these FASTA sequences into a single file? Thanks. I tried this:

from Bio import Entrez, SeqIO
def get_sequences(IdList):
    ids = record["IdList"]
    for seq_id in ids:
    handle = Entrez.efetch(db="nucleotide", id="seq_id", rettype="fasta", retmode="text")
    record = handle.read()
    record = open('als.fasta', 'w')
    record.write(record.rstrip('\n'))

but it showed: IndentationError: expected an indented block

sequence • 17k views

Hi, did you solve this? I need to do the same.

7.4 years ago

Your for loop needs to be indented.

This is not really a bioinformatics question but a Python programming question and as such it is better suited for https://stackoverflow.com/

record = open('als.fasta', 'w')
for seq_id in ids:
    handle = Entrez.efetch(db="nucleotide", id=seq_id, rettype="fasta", retmode="text")
    record = handle.read()
    record.write(record.rstrip('\n'))

Ehm I also think OP and you are reusing/overwriting the variable name record.


Thanks. The error has been solved, but no als.fasta file is created, even when I add the full path.


I guess that's because you overwrite the record variable: you use it both for opening als.fasta and for reading the handle from efetch().
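
For anyone following along, here is a minimal sketch of how the loop could look with separate names for the output file and the downloaded text. The function names save_fasta_records and fetch_from_ncbi are made up for illustration, and the e-mail address is a placeholder. The download step is passed in as a function so the file-writing part can be tried without network access.

```python
from typing import Callable, Iterable


def save_fasta_records(ids: Iterable[str],
                       fetch_fasta: Callable[[str], str],
                       out_path: str) -> int:
    """Fetch each ID and write its FASTA text to one output file.

    Returns the number of records written.
    """
    count = 0
    # Open the output once, under its own name, so nothing overwrites it.
    with open(out_path, "w") as out_handle:
        for seq_id in ids:
            # Pass the loop variable itself, not the literal string "seq_id".
            fasta_text = fetch_fasta(seq_id)
            # Strip trailing blank lines, then end the record with one newline.
            out_handle.write(fasta_text.rstrip("\n") + "\n")
            count += 1
    return count


def fetch_from_ncbi(seq_id: str) -> str:
    """Download one nucleotide record as FASTA text (needs Biopython and network)."""
    from Bio import Entrez  # imported here so the helper above has no dependencies

    Entrez.email = "A.N.Other@example.com"  # placeholder: use your own address
    handle = Entrez.efetch(db="nucleotide", id=seq_id,
                           rettype="fasta", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()
```

With the search result from the question, the call would be something like save_fasta_records(record["IdList"], fetch_from_ncbi, "als.fasta"). The file is written to the current working directory unless out_path contains a directory.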

4.1 years ago
import os
from Bio import SeqIO
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
filename = "MG762674.fasta"
if not os.path.isfile(filename):
    # Downloading...
    net_handle = Entrez.efetch(
        db="nucleotide", id="MG762674", rettype="fasta", retmode="text"
    )
    out_handle = open(filename, "w")
    out_handle.write(net_handle.read())
    out_handle.close()
    net_handle.close()
    print("Saved")

print("Parsing...")
record = SeqIO.read(filename, "fasta")
print(record)

I took this from the Biopython Cookbook.

You can also pass a list of IDs as the id argument, e.g.:

Accession_List = ['1124779319', '1058275694', '160346987', '160346985', '313662298', '313662296', '313662294', '313662292', '148536620', '148536618', '944203885', '937553934', '698322664', '698322662', '698322660', '698322658', '683428019', '677285963', '677285961', '677285959']

net_handle = Entrez.efetch(db="nucleotide", id=Accession_List, rettype="fasta", retmode="text")

This saves all the sequences to one big FASTA file, which you can later split into files with individual sequences using a FASTA-splitting tool, if you want. I know I am answering late, but this is for anyone who washes up here with the same problem.
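
If you would rather do the splitting in Python, a rough sketch follows. The function name split_fasta is made up for illustration. It assumes each record starts with a ">" header line and that the first word of each header is unique; if two headers share a first word, the later record overwrites the earlier file.

```python
import os
from typing import List


def split_fasta(multi_fasta_path: str, out_dir: str) -> List[str]:
    """Write each record of a multi-FASTA file to its own file in out_dir.

    Files are named after the first word of the header line.
    Returns the paths in the order the records were read.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out_handle = None
    try:
        with open(multi_fasta_path) as in_handle:
            for line in in_handle:
                if line.startswith(">"):
                    # A new header: close the previous record's file, if any.
                    if out_handle is not None:
                        out_handle.close()
                    words = line[1:].split()
                    # Fall back to a numbered name if the header is empty.
                    name = words[0] if words else "record_%d" % (len(written) + 1)
                    # Replace characters that are awkward in file names.
                    name = name.replace("/", "_").replace("|", "_")
                    path = os.path.join(out_dir, name + ".fasta")
                    out_handle = open(path, "w")
                    written.append(path)
                # Lines before the first header are skipped.
                if out_handle is not None:
                    out_handle.write(line)
    finally:
        if out_handle is not None:
            out_handle.close()
    return written
```

For example, split_fasta("als.fasta", "als_split") would put one .fasta file per record in a folder named als_split (both names are only examples).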
