Question

Downloading CDS FASTA Sequences From Gene IDs

11.2 years ago

lin.barnum ▴ 230

I have a list of gene IDs obtained from the UCSC Genome Browser, like this:

CG6854
CG8119
CG8359
CG9437
CpipJ_CPIJ001450
CpipJ_CPIJ002577
CpipJ_CPIJ011605
CpipJ_CPIJ011632
CpipJ_CPIJ016978
GA11162
GA11800
GA12610

All of these are from the insect group. I would like to obtain the FASTA CDS for these genes, without introns. I can do this one gene at a time, but I have 195 of them, so I hope there is a way to automate it. Any ideas on how this can be done would be appreciated.

fasta ucsc gene • 3.4k views

updated 11.2 years ago by viv_bio ▴ 50 • written 11.2 years ago by lin.barnum ▴ 230

score 0 · Answer 1 · 2013-09-12

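If you want to automate it, install Python and Biopython, then run the following in a Python session. These gene symbols are not NCBI accessions, and the "pubmed" database holds no sequences, so the script first searches the nucleotide database (nuccore) for each symbol and then fetches the CDS of the first hit. Treat the search term as a starting point and check that each hit really is the gene you expect.

from Bio import Entrez, SeqIO

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a real contact address
gene_ids = ["CG6854", "CG8119", "CG8359", "CG9437", "CpipJ_CPIJ001450"]
write_list = []
for gene_id in gene_ids:
    # Look up an mRNA record for this symbol; the first hit is taken as-is
    result = Entrez.read(Entrez.esearch(db="nuccore", term=gene_id + " AND biomol_mrna[PROP]", retmax=1))
    if result["IdList"]:
        # rettype="fasta_cds_na" returns the coding sequence (no introns) as nucleotide FASTA
        handle = Entrez.efetch(db="nuccore", id=result["IdList"][0], rettype="fasta_cds_na", retmode="text")
        write_list.extend(SeqIO.parse(handle, "fasta"))
        handle.close()

SeqIO.write(write_list, "output_file", "fasta")

Replace the entries in gene_ids with your full list of IDs, and the sequences will be written to output_file.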
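
With 195 IDs, pasting them into the script by hand is error-prone, so it may be easier to read them from a text file. The following is a minimal sketch, assuming one ID per line in a file you supply; `read_gene_ids` and `batches` are illustrative helper names, not part of Biopython, and the batch size is an arbitrary choice.

```python
from pathlib import Path


def read_gene_ids(path):
    """Return the gene IDs listed in `path` (one per line).

    Blank lines and repeated IDs are skipped; the original order is kept.
    """
    seen = set()
    ids = []
    for line in Path(path).read_text().splitlines():
        gene_id = line.strip()
        if gene_id and gene_id not in seen:
            seen.add(gene_id)
            ids.append(gene_id)
    return ids


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` entries each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For example, `read_gene_ids("gene_ids.txt")` (a hypothetical file name) would give a list that can be looped over directly, and `",".join(batch)` for each `batch in batches(ids, 50)` would give comma-separated strings of the kind passed as `id` above.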