Question

How To Get Collections Of Reads From GenBank

0

12.3 years ago

Click downvote ▴ 720

I have found some reads that I want to download.

They begin at http://www.ncbi.nlm.nih.gov/nuccore/DQ569913 and end at http://www.ncbi.nlm.nih.gov/nuccore/DQ601958

I could get them one by one, but as there are about 32,000 of them it seems a tad time consuming. Does anyone know how to download the sequences DQ569913 to DQ601958 automatically?

Thanks

genbank • 2.5k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 12.3 years ago by Click downvote ▴ 720

2

Search biostars.org for E-utilities / EFetch.

ADD REPLY • link 12.3 years ago by Pierre Lindenbaum 166k

2

You can use http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi to create a Perl script.

ADD REPLY • link 12.3 years ago by etienne.decarie ▴ 30

score 0 · Answer 1 · 2013-04-26

0

12.3 years ago

Click downvote ▴ 720

This sloppy Biopython code does the trick:

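# -*- coding: utf-8 -*-
from Bio import Entrez

Entrez.email = "use@your.real.email.addy.yo"  # NCBI asks for a real contact address
with open("human_pirna.fa", "w") as output_file:
    for i in range(569913, 601959):  # DQ569913 through DQ601958 inclusive
        handle = Entrez.efetch(db="nuccore", id="DQ{0}".format(i), rettype="gb", retmode="text")
        entry = handle.read()
        handle.close()
        # The sequence sits between "ORIGIN" and the closing "//" of the GenBank record
        origin_block = entry.split("ORIGIN")[-1].split("//")[0]
        # Drop the position numbers and all whitespace, keeping only the bases
        pirna_string = "".join(tok for tok in origin_block.split() if not tok.isdigit())
        print(pirna_string)
        output_file.write(">DQ{0}\n{1}\n".format(i, pirna_string))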
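
One request per accession means roughly 32,000 round trips. A possible variation, sketched here under assumptions, asks EFetch for FASTA directly and sends several accessions per call as a comma-separated `id`. The helper names `accession_batches` and `fetch_fasta` and the batch size of 200 are illustrative choices, not part of the original answer.

```python
def accession_batches(prefix, start, end, size):
    """Yield lists of accessions from prefix+start to prefix+end (inclusive), `size` per list."""
    numbers = range(start, end + 1)
    for offset in range(0, len(numbers), size):
        yield ["{0}{1}".format(prefix, n) for n in numbers[offset:offset + size]]


def fetch_fasta(batches, email, out_path):
    """Append the FASTA text returned for each batch to out_path.

    Needs Biopython and network access; the import is local so that the
    batching helper above can be used without Biopython installed.
    """
    from Bio import Entrez

    Entrez.email = email
    with open(out_path, "w") as out:
        for batch in batches:
            # Several accessions in one request, joined with commas
            handle = Entrez.efetch(db="nuccore", id=",".join(batch),
                                   rettype="fasta", retmode="text")
            out.write(handle.read())
            handle.close()
```

Calling `fetch_fasta(accession_batches("DQ", 569913, 601958, 200), "use@your.real.email.addy.yo", "human_pirna.fa")` would then cover the whole range in 161 requests instead of 32,046.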

ADD COMMENT • link 12.3 years ago by Click downvote ▴ 720

1

You can try:

 seq 569913 601958 | while read A ; do curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=DQ${A}&rettype=fasta" ; done
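
For anyone who would rather stay in Python without Biopython, the same EFetch request can be sketched with the standard library alone. The function names `efetch_url` and `fetch` are made up for illustration, the `https` form of the address is an assumption about the current endpoint, and `retmode=text` is added on top of the parameters in the one-liner above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Same EFetch endpoint as the curl one-liner, written with https (assumption)
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build the EFetch URL for one nuccore accession."""
    query = urlencode({"db": "nuccore", "id": accession,
                       "rettype": rettype, "retmode": "text"})
    return EFETCH + "?" + query


def fetch(accession):
    """Download one record as text (needs network access)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")
```

Looping `fetch("DQ{0}".format(i))` over `range(569913, 601959)` would mirror the shell loop; a short pause between requests is probably wise to stay within NCBI's usage limits.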

ADD REPLY • link 12.3 years ago by Pierre Lindenbaum 166k