3.8 years ago
Wilber0x
Is there a way to download specific gene sequences from GenBank using SeqIO and the gene IDs? I know how to download a genome sequence with the GenBank ID, but I am looking to use the gene IDs within a GenBank file to directly download the sequences of individual genes from that genome.
What do you mean by a gene ID? Can you give us a few examples?
So the GenBank ID for Actinidia chinensis is NC_026690.1, whereas if I want to look at trnR (UCU) within that genome, it has a gene ID, which is 23857713.
An NC_xxxx accession refers to an entire contig, and in this case to an entire genome. It contains multiple genes and thus multiple Entrez Gene IDs. You should be able to extract all gene features from the GenBank file, get the db_xref for each of them, and use the Entrez IDs in a straightforward manner. Each step will need some digging on Google and some experimentation, but it should not be too challenging to figure this out from the outline I've given you.
Using EntrezDirect you can get the FASTA sequence for all genes for this accession. The returned data should be parseable in Python so you can keep the ones you want. (Truncated to show just headers and a few examples due to space.)
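The approach of pulling gene features and matching on db_xref could look roughly like the sketch below. It is only an illustration, not a tested recipe for this genome: the helper names (gene_ids, extract_gene, fetch_record) are made up for this example, it assumes Biopython is installed, and it assumes the gene features in the record carry a db_xref qualifier of the form "GeneID:23857713", which is typical for RefSeq records but worth checking in your own file.

```python
def gene_ids(feature):
    """Return the Entrez Gene IDs found in a feature's db_xref qualifiers."""
    xrefs = feature.qualifiers.get("db_xref", [])
    return [x.split(":", 1)[1] for x in xrefs if x.startswith("GeneID:")]


def extract_gene(record, gene_id):
    """Return the sequence of the gene feature with this Gene ID, or None."""
    wanted = str(gene_id)
    for feature in record.features:
        if feature.type == "gene" and wanted in gene_ids(feature):
            # extract() handles strand and joined locations for us
            return feature.extract(record.seq)
    return None


def fetch_record(accession, email):
    """Download one GenBank record from NCBI (hypothetical helper)."""
    # Imported here so the two helpers above work without network access.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="gbwithparts", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()
```

With a placeholder e-mail address, extract_gene(fetch_record("NC_026690.1", "you@example.org"), 23857713) should then return the trnR sequence if that Gene ID is present in the record, and None otherwise. Fetching the record once and looking up several IDs against it avoids repeated requests to NCBI.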