Question

Gene FASTA sequence identifiers in Batch Entrez

7.5 years ago

easyshortcatchy ▴ 10

I am trying to get the genomic FASTA sequence for a large list of orthologous gene IDs from NCBI. I figured I could use Biopython or mygene to convert them to some other identifier that I could feed into Batch Entrez to retrieve the FASTA of the whole gene sequence. However, the accession that maps to the gene's genomic sequence is the accession for the whole chromosome/scaffold. So I have a few questions: What syntax does Batch Entrez accept for subregion arguments? Is there a better identifier that maps directly to the genomic sequence? Is there an easier way to accomplish this?
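I'm not certain Batch Entrez accepts any subregion syntax. One alternative is to skip Batch Entrez and call the E-utilities EFetch endpoint directly, which does take `seq_start`, `seq_stop` and `strand` parameters for nucleotide records. The sketch below only builds the request URL (no network call); the function name is hypothetical, and the accession and coordinates in the example are placeholders, not real gene coordinates.

```python
from urllib.parse import urlencode

# Public E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_subregion_url(accession: str, start: int, stop: int,
                         minus_strand: bool = False) -> str:
    """Build an EFetch URL returning only positions start..stop of a record as FASTA.

    Coordinates are 1-based and inclusive, which is what seq_start/seq_stop expect.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    params = {
        "db": "nuccore",
        "id": accession,          # chromosome/scaffold accession, e.g. from the gene record
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        # strand=1 is the plus strand; strand=2 returns the reverse complement
        "strand": 2 if minus_strand else 1,
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder accession and coordinates, for illustration only.
    print(efetch_subregion_url("NC_000001.11", 1000, 2000))
```

With Biopython, the same keyword arguments can be passed to `Bio.Entrez.efetch` instead of building the URL by hand, and looping over a list of (accession, start, stop) tuples covers the batch case.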

sequence • 2.0k views

I could be wrong, but I don't think NCBI provides genomic sequences for genes. You'd have to extract the region from the chromosome sequence, but first you'd have to define which region you want, because I don't think NCBI provides coordinates for genes, only for RefSeq sequences.
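If the chromosome sequence is already downloaded and the gene's coordinates are known, the extraction step is a slice plus, for minus-strand genes, a reverse complement. A minimal sketch, assuming 1-based inclusive coordinates and a plain string sequence; `extract_region` is a hypothetical helper name:

```python
# IUPAC nucleotide complements (upper and lower case); S, W and N map to themselves.
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVSWNacgtrykmbdhvswn",
                            "TGCAYRMKVHDBSWNtgcayrmkvhdbswn")


def extract_region(chrom_seq: str, start: int, stop: int, strand: str = "+") -> str:
    """Return positions start..stop (1-based, inclusive) of chrom_seq.

    For strand "-", return the reverse complement so the gene reads 5' to 3'.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not 1 <= start <= stop <= len(chrom_seq):
        raise ValueError("coordinates out of range")
    region = chrom_seq[start - 1:stop]  # 1-based inclusive -> Python half-open slice
    if strand == "-":
        region = region.translate(_COMPLEMENT)[::-1]
    return region


if __name__ == "__main__":
    print(extract_region("AACCGGTTAC", 3, 6))       # CCGG
    print(extract_region("AACCGGTTAC", 7, 10, "-"))  # GTAA
```

Biopython's `Seq(...).reverse_complement()` does the same complementing if you are already working with `SeqRecord` objects.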

reply by Jean-Karim Heriche 27k • 7.5 years ago

score 0 · Answer 1 · 2017-06-14

7.5 years ago

easyshortcatchy ▴ 10

It's okay: I found out that Geneious can do what I want in 5 seconds, so I wasted a bunch of hours for nothing. I am curious how Geneious does it, though. They probably just parse the XML files properly and run follow-up queries as needed, which is what I was trying to do at one point.
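How Geneious does it isn't documented here, but the "parse the XML and query further" approach can be sketched. This assumes the NCBI Gene esummary record exposes a `GenomicInfo` entry with `ChrAccVer`, `ChrStart` and `ChrStop` fields (in Biopython, roughly `Entrez.read(Entrez.esummary(db="gene", id=gene_id))["DocumentSummarySet"]["DocumentSummary"][0]["GenomicInfo"][0]`). It further assumes those coordinates are 0-based and that `ChrStart > ChrStop` marks a minus-strand gene; both are assumptions to verify against the E-utilities documentation. The helper name and the example accession are hypothetical.

```python
def genomicinfo_to_efetch_params(info: dict) -> dict:
    """Turn one GenomicInfo-style entry into EFetch keyword arguments.

    ASSUMPTION: ChrStart/ChrStop are 0-based, and ChrStart > ChrStop means minus strand.
    """
    a = int(info["ChrStart"])
    b = int(info["ChrStop"])
    return {
        "db": "nuccore",
        "id": info["ChrAccVer"],     # chromosome/scaffold accession.version
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": min(a, b) + 1,  # shift to the 1-based coordinates EFetch expects
        "seq_stop": max(a, b) + 1,
        "strand": 2 if a > b else 1,  # 2 = reverse complement
    }


if __name__ == "__main__":
    # Made-up values for illustration, not a real gene.
    example = {"ChrAccVer": "NC_000000.1", "ChrStart": "2999", "ChrStop": "999"}
    print(genomicinfo_to_efetch_params(example))
```

Once `Entrez.email` is set, the returned dict can be passed as `Entrez.efetch(**params)`, and repeating this per gene ID gives the batch workflow from the question.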
