I am trying to retrieve the genomic FASTA sequence for a large list of orthologous gene IDs from NCBI. I figured I could convert them, using Biopython or mygene, to some other identifier that I could feed into Batch Entrez to get the FASTA of the whole gene sequence. However, the accession that maps to the gene's FASTA is the accession for the whole chromosome or scaffold. So I have a few questions: What syntax does Batch Entrez accept for subregion arguments? Is there a better identifier that maps to the genomic sequence of a gene? Is there an easier way to accomplish my task?
I could be wrong, but I don't think NCBI provides genomic sequences for genes. You'd have to extract the gene from the chromosome sequence, but first you'd have to define what region you want, because I don't think NCBI provides coordinates for genes, only for RefSeq sequences.
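On the subregion part of the question: I'm not sure the Batch Entrez web form itself takes range arguments, but the E-utilities efetch endpoint does accept seq_start, seq_stop and strand for nucleotide records, so once you have a chromosome/scaffold accession and coordinates you can request just that slice as FASTA. Below is a minimal sketch that only builds the request URL. The function name, the accession and the coordinates are placeholders I made up for illustration, not values for any real gene.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def subregion_fasta_url(accession, start, stop, minus_strand=False):
    """Build an efetch URL for a slice of a nucleotide record.

    Coordinates are 1-based and inclusive, as efetch expects.
    `subregion_fasta_url` is a hypothetical helper, not part of any library.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        # efetch uses 1 for the plus strand and 2 for the minus strand
        "strand": 2 if minus_strand else 1,
    }
    return EFETCH_URL + "?" + urlencode(params)


# Placeholder accession and coordinates, for illustration only
url = subregion_fasta_url("NC_000001.11", 1000, 2000)
print(url)
```

If you already use Biopython, the same parameters can be passed as keyword arguments to Bio.Entrez.efetch, which forwards them to the same endpoint. For a large list, keep NCBI's request rate limits in mind and loop over (accession, start, stop, strand) tuples rather than fetching whole chromosomes.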