The only way I have found so far is to download the full genome that the genes refer to and extract each sequence locally using its start position and length. Is there a better way?
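For reference, the local approach described above can be sketched in shell roughly as follows. This is only an illustration: the function name `extract_region` and its argument order are made up for this example, and it assumes a single-record FASTA file, 1-based inclusive coordinates, and that `awk`, `rev` and `tr` are available.

```shell
#!/usr/bin/env bash
# extract_region FASTA START END [STRAND]   (hypothetical helper)
# Prints the 1-based, inclusive START..END slice of the first record in FASTA.
# STRAND is "plus" (default) or "minus"; "minus" gives the reverse complement.
extract_region() {
    local fasta=$1 start=$2 end=$3 strand=${4:-plus}
    local seq
    # Join the sequence lines of the first record into one string, then slice it.
    seq=$(awk -v s="$start" -v e="$end" '
        /^>/ { if (seen++) exit; next }   # skip the first header, stop at a second one
        { buf = buf $0 }
        END { print substr(buf, s, e - s + 1) }
    ' "$fasta")
    if [ "$strand" = "minus" ]; then
        # Reverse the slice, then complement each base.
        seq=$(printf '%s\n' "$seq" | rev | tr 'ACGTacgt' 'TGCAtgca')
    fi
    printf '%s\n' "$seq"
}
```

With the genome saved as, say, `genome.fasta`, a call such as `extract_region genome.fasta 4600238 4600975 minus` would print the reverse-complemented slice for a gene annotated as `complement`.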
You can build complicated queries with Entrez Direct and chain them, so that the results of one query are fed to the next one. See if that helps. Also a YouTube video is here
Oh, that's a nice one. I hadn't found that. Until now I used the R package rentrez for the calls, but this one seems more powerful. Unfortunately I still do not get what I want. Here is my query:
esearch -db gene -query 'txid511145[Organism:noexp]' | efetch -format fasta
But this again just returns the entries of the gene db. Example:
ID: 945651
99. dnaC
DNA biosynthesis protein [Escherichia coli str. K-12 substr. MG1655]
Other Aliases: b4361, ECK4351, JW4325, dnaD
Annotation: NC_000913.3 (4600238..4600975, complement)
I could use the last line to extract the corresponding FASTA sequence locally, but I would like to know whether the NCBI server can do this for me.
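One possible way to let the server do the slicing is to turn the `Annotation:` line into an `efetch` call against the nuccore database. The sketch below only prints the command instead of running it. The helper name `annotation_to_efetch` is hypothetical, and it assumes that the installed Entrez Direct version supports the `-seq_start`, `-seq_stop` and `-strand` options of `efetch` (with `-strand 2` meaning the minus strand), which is worth checking with `efetch -help`.

```shell
#!/usr/bin/env bash
# annotation_to_efetch "Annotation: NC_000913.3 (4600238..4600975, complement)"
# Hypothetical helper: prints (does not run) an efetch command for that region.
annotation_to_efetch() {
    local ann="${1#Annotation: }"      # drop the optional "Annotation: " prefix
    local acc start stop strand=1      # 1 = plus strand (assumed efetch convention)
    acc=${ann%% *}                     # accession is the text before the first space
    start=$(printf '%s\n' "$ann" | sed -E 's/.*\(([0-9]+)\.\.([0-9]+).*/\1/')
    stop=$(printf '%s\n' "$ann" | sed -E 's/.*\(([0-9]+)\.\.([0-9]+).*/\2/')
    case $ann in *complement*) strand=2 ;; esac   # 2 = minus strand
    printf 'efetch -db nuccore -id %s -seq_start %s -seq_stop %s -strand %s -format fasta\n' \
        "$acc" "$start" "$stop" "$strand"
}
```

Printing the command first makes it easy to inspect the coordinates before sending anything to NCBI; the output can then be pasted into a terminal where Entrez Direct is installed.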
As you know the accession number of your genome, you are much better off starting from that. The following retrieves all coding sequences for the reference genome:
Oh, that looked so good, but the result is really not what I hoped for. The returned FASTA has just 38 entries, and this E. coli strain should have 4516 genes. Also, one of the entries is the whole genome. I do not really know what these results refer to anyhow, as all the genes map to only two entries in the nuccore db, "NC_000913.3" and "NC_000913.2".
Does your starting point have to be the taxid? The problem with starting with a taxid is that it is not very precise. It sounds like you know the two full reference genomes that you want to extract genes from so why not start from those accession numbers?
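Starting from an accession number could look roughly like the sketch below. It is an assumption rather than a command confirmed in this thread: it presumes Entrez Direct is installed and that the `fasta_cds_na` format of `efetch` (nucleotide FASTA of the annotated coding sequences) is what is wanted. The function names `fetch_cds` and `count_records` are made up for this example.

```shell
#!/usr/bin/env bash
# Assumption: Entrez Direct (efetch) is installed and on PATH.

# fetch_cds ACCESSION   (hypothetical helper)
# Requests the nucleotide FASTA of every annotated CDS on that nuccore record.
fetch_cds() {
    efetch -db nuccore -id "$1" -format fasta_cds_na
}

# count_records   (hypothetical helper)
# Counts FASTA records on stdin by counting header lines.
count_records() {
    grep -c '^>'
}

# Usage (needs network access, so it is left commented out):
#   fetch_cds NC_000913.3 > cds.fasta
#   count_records < cds.fasta
```

Counting the headers afterwards is a quick sanity check that the download contains roughly as many records as the genome has annotated coding sequences.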
That works for me. Yes, the starting point is the organism, so the txid. But that's okay: I will check for the best genome and work with that from there.
Thanks!
You may try Eutils https://www.ncbi.nlm.nih.gov/books/NBK25500/
I do actually, but I can't figure out how to.
What query did you try?
The query and the URL are both in my question.
I meant an esearch/E-utilities query. Check the link and try to build an E-utilities query.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=txid511145%5BOrganism%3Anoexp%5D but this is not much different from the one I showed before.
Sorry, my bad! What I meant was to use Entrez Direct command line tools: https://www.ncbi.nlm.nih.gov/books/NBK179288/