Here's a straightforward question that comes up a lot in my lab:
Is [DNA Sequence Feature X] in [Gene A] widely conserved across bacteria?
To start answering that, I just need the DNA sequences of [Gene A] across bacteria. How do I get them?
One option, which I'm about to try: download a few thousand bacterial genomes from NCBI Assembly, build a custom BLAST database from them, search each genome for the sequence most similar to [Gene A], and use those hits. I think that'll take a little while. Is that the best way to go about this problem?
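For the last step of that plan, here is a rough sketch of how the best hit per genome could be pulled out of BLAST tabular output (`-outfmt 6`, whose default twelve columns are listed in the code). It assumes, purely for illustration, that each subject ID starts with a genome label followed by `|`; the `genome_of` helper and the example rows are made up, so swap in whatever naming the custom database actually uses.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hit_per_genome(blast_tsv, genome_of=lambda sseqid: sseqid.split("|")[0]):
    """Return {genome: hit} keeping the highest-bitscore hit for each genome.

    `genome_of` maps a subject ID to a genome label. The default assumes
    IDs shaped like "genome|contig", which is a hypothetical convention.
    """
    best = {}
    reader = csv.DictReader(blast_tsv, fieldnames=OUTFMT6_FIELDS, delimiter="\t")
    for hit in reader:
        genome = genome_of(hit["sseqid"])
        score = float(hit["bitscore"])
        if genome not in best or score > float(best[genome]["bitscore"]):
            best[genome] = hit
    return best


# Made-up rows; genome labels, contig names, and scores are placeholders.
example = io.StringIO(
    "metG\tgenomeA|contig1\t98.5\t2034\t30\t0\t1\t2034\t5000\t7033\t0.0\t3500\n"
    "metG\tgenomeA|contig2\t71.2\t600\t170\t3\t10\t609\t120\t719\t1e-50\t210\n"
    "metG\tgenomeB|contig7\t80.1\t2030\t400\t4\t1\t2030\t9030\t7001\t0.0\t1800\n"
)
hits = best_hit_per_genome(example)
for genome, hit in sorted(hits.items()):
    print(genome, hit["sseqid"], hit["sstart"], hit["send"], hit["bitscore"])
```

The `sstart` and `send` columns give the coordinates to cut out of each genome. In nucleotide searches a hit with `sstart` greater than `send` lies on the minus strand, so that sequence needs to be reverse-complemented.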
It seems like NCBI has some annotations for bacterial homologs, even though there's no HomoloGene for them. For example, I can search NCBI Gene for a gene, say metG, and I get this table:
I can download the table, but the "Location" field is empty for many entries (like the last row in the image above), so I can't use the table to automatically grab the homologous DNA sequences. (I hit the same problem running the query from the command line with Entrez.) Is there a "Download all DNA sequences from this table" option in NCBI Gene that I'm just missing?
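For the rows that do carry a location, a minimal sketch of turning the downloaded table into E-utilities `efetch` requests might look like the following. The column names are what I believe the tab-delimited Gene download uses, and I'm assuming the coordinates are 1-based (which is what `efetch` expects) and that orientation is written as `plus` or `minus`; all of that is worth checking against a gene whose sequence is already known. The accessions and GeneIDs in the example are made up.

```python
import csv
import io
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Column names assumed from the NCBI Gene tab-delimited download.
ACC = "genomic_nucleotide_accession.version"
START = "start_position_on_the_genomic_accession"
END = "end_position_on_the_genomic_accession"
ORIENT = "orientation"


def efetch_urls(gene_table):
    """Return ({GeneID: efetch URL}, [GeneIDs skipped for lack of a location])."""
    urls, skipped = {}, []
    for row in csv.DictReader(gene_table, delimiter="\t"):
        gene_id = row["GeneID"]
        if not (row.get(ACC) and row.get(START) and row.get(END)):
            skipped.append(gene_id)  # no location, so nothing to fetch
            continue
        params = {
            "db": "nuccore",
            "id": row[ACC],
            "seq_start": row[START],
            "seq_stop": row[END],
            # efetch uses strand=1 for plus and strand=2 for minus.
            "strand": "2" if row.get(ORIENT) == "minus" else "1",
            "rettype": "fasta",
            "retmode": "text",
        }
        urls[gene_id] = EFETCH + "?" + urlencode(params)
    return urls, skipped


# Made-up table: placeholder GeneIDs and accessions, second row lacks a location.
table = io.StringIO(
    "GeneID\t" + "\t".join([ACC, START, END, ORIENT]) + "\n"
    "111\tACC_0001.1\t100\t2133\tminus\n"
    "222\t\t\t\t\n"
)
urls, skipped = efetch_urls(table)
print(urls["111"])
print("no location:", skipped)
```

This only builds the request URLs; the entries that come back in `skipped` are exactly the ones the table can't resolve, and they would still need a BLAST-style search or another lookup.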
Do you have any suggestions? This seems like a pretty straightforward problem, so I'm probably missing something simple. Thanks!
This is the sort of answer that looks obvious in retrospect. I'll use this in addition to the more comprehensive search that Mensur suggested. It's also a good lesson for me (and hopefully for others who happen upon this post) to spend more time on esearch and efetch before giving up! You and Mensur have been super helpful here--thank you!!