How To Get Collections Of Reads From GenBank
11.6 years ago

I have found some reads that I want to download.

They begin at http://www.ncbi.nlm.nih.gov/nuccore/DQ569913 and end at http://www.ncbi.nlm.nih.gov/nuccore/DQ601958

I could get them one by one, but as there are about 32,000 of them, that seems a tad time-consuming. Does anyone know how to download the sequences DQ569913 to DQ601958 automatically?

Thanks
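
For scale, the range DQ569913 to DQ601958 spans 601958 - 569913 + 1 = 32,046 accession numbers. Below is a minimal Python sketch that enumerates them. It assumes every number in the range is a record you want, and it does not check for gaps or withdrawn accessions. `accession_range` is a hypothetical helper, not part of any library:

```python
def accession_range(prefix, first, last):
    """Return accessions prefix+first .. prefix+last, both ends included."""
    # range() excludes its stop value, hence last + 1
    return ["{0}{1}".format(prefix, n) for n in range(first, last + 1)]


accessions = accession_range("DQ", 569913, 601958)
print(len(accessions))                # 32046
print(accessions[0], accessions[-1])  # DQ569913 DQ601958
```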

genbank • 2.3k views

Search biostars.org for E-utilities / EFetch.
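
EFetch, one of NCBI's E-utilities, accepts a comma-separated list of IDs in its `id` parameter, so the download does not need one request per record. The sketch below builds batched EFetch URLs with the Python standard library. `efetch_urls` is a hypothetical helper. The batch size of 200 is an assumption, chosen because NCBI recommends HTTP POST for longer ID lists:

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_urls(accessions, batch_size=200):
    """Yield one EFetch GET URL per batch of accessions, requesting FASTA text."""
    for start in range(0, len(accessions), batch_size):
        batch = accessions[start:start + batch_size]
        params = {
            "db": "nuccore",
            "id": ",".join(batch),  # EFetch takes comma-separated IDs
            "rettype": "fasta",
            "retmode": "text",
        }
        yield EFETCH + "?" + urlencode(params)


accessions = ["DQ{0}".format(n) for n in range(569913, 601959)]
urls = list(efetch_urls(accessions))
print(len(urls))  # 161 requests instead of 32046
```

Each URL can then be fetched with curl or `urllib.request.urlopen`. Without an API key, NCBI allows at most 3 requests per second, so pause between fetches.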

11.6 years ago

This sloppy Biopython code does the trick:

# -*- coding: utf-8 -*-
from Bio import Entrez, SeqIO

Entrez.email = "use@your.real.email.addy.yo"  # NCBI wants a real contact address
with open("human_pirna.fa", "w") as output_file:
    # range() stops before its end value, so this covers DQ569913 through DQ601958
    for i in range(569913, 601959):
        accession = "DQ{0}".format(i)
        handle = Entrez.efetch(db="nuccore", id=accession, rettype="gb", retmode="text")
        # Parse the GenBank record rather than splitting the raw text on "ORIGIN" and "//"
        record = SeqIO.read(handle, "genbank")
        handle.close()
        print(accession, len(record.seq))
        SeqIO.write(record, output_file, "fasta")
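
With some 32,000 requests, a run may be interrupted partway (network errors, rate limiting), so it is worth checking the output file for gaps before using it. Below is a minimal standard-library sketch. It assumes each FASTA header starts with the accession, with or without a version suffix such as `.1`. `missing_accessions` and the toy records are hypothetical:

```python
def missing_accessions(fasta_text, expected):
    """Return the accessions in `expected` that have no header in `fasta_text`."""
    found = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                # ">DQ569913.1 description" -> "DQ569913"
                found.add(fields[0].split(".")[0])
    return [acc for acc in expected if acc not in found]


# Toy input: two made-up records, with DQ569914 deliberately absent
toy_fasta = ">DQ569913.1 toy record\nACGT\n>DQ569915\nTTGA\n"
print(missing_accessions(toy_fasta, ["DQ569913", "DQ569914", "DQ569915"]))
# ['DQ569914']
```

On real output, pass the contents of `human_pirna.fa` together with the full list of expected IDs, then re-fetch whatever the function returns.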

You can try:

 seq 569913 601958 | while read A ; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=DQ${A}&rettype=fasta&retmode=text" ; sleep 0.4 ; done
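
The loop above still sends one request per accession. The batched alternative below uses only the Python standard library and sends many IDs per request via HTTP POST, which NCBI recommends for long ID lists. Several details are assumptions, not part of any NCBI tool: the batch size of 500, the 0.34 s pause (meant to stay under NCBI's limit of 3 requests per second without an API key), and the helper names `build_efetch_request` and `download`:

```python
import time
import urllib.request
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_request(accessions):
    """Build a POST request that fetches the given accessions as FASTA text."""
    body = urlencode({
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }).encode("ascii")
    # Supplying data makes urllib send a POST instead of a GET
    return urllib.request.Request(EFETCH, data=body)


def download(accessions, out_path, batch_size=500, pause=0.34):
    """Fetch accessions in batches and write the FASTA text to out_path."""
    with open(out_path, "w") as out:
        for start in range(0, len(accessions), batch_size):
            request = build_efetch_request(accessions[start:start + batch_size])
            with urllib.request.urlopen(request) as response:
                out.write(response.read().decode("utf-8"))
            time.sleep(pause)  # keep below 3 requests per second
```

Usage: `download(["DQ{0}".format(n) for n in range(569913, 601959)], "human_pirna.fa")`. If a batch fails, `urlopen` raises `urllib.error.HTTPError`, so wrapping the call in a retry is a natural next step.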