I would like to get all "groL" gene (nucleotide) sequences from all assembled bacterial genomes on NCBI. For example, searching for (groL) AND "bacteria"[porgn:__txid2]
on NCBI Gene gets me a list of the genes I want, but it seems I have to open the links one by one to get each sequence; the "Send to" menu only offers a list.
I'd like to avoid sequence searches via BLAST to get this, as the genes should all be well annotated. Maybe there is a way via the E-utilities?
You may want to try Identical Protein Groups at NCBI:
https://www.ncbi.nlm.nih.gov/ipg/
Type "grol" in the search field:
https://www.ncbi.nlm.nih.gov/ipg/?term=grol
Choose "Send to:" drop-down menu on the right and select Destination -> File, Format -> Fasta.
Sorry, I just realized you wanted nucleotide sequences. Maybe this still helps you.
Unfortunately I need the sequences for a barcoding taxonomy db, so protein (or back-translations) won't help. Thanks!
You may have to resort to BLAST, or perhaps simply back-translate the IPG sequences Mensur Dlakic mentioned below. As you probably saw, the Gene database returns a list of groL entries, but not all of them have location information. That means we can't simply extract the sequences based on location. There may be a way of doing this with datasets. Will try to see if I can test.
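For the Gene entries that do carry location information, an E-utilities route might look roughly like the sketch below. It is untested against the full bacterial result set and rests on a few assumptions: that the JSON esummary for db=gene exposes a genomicinfo list with chraccver, chrstart and chrstop fields, that those coordinates are 0-based, and that chrstart greater than chrstop marks the minus strand. The helper names (eutils_get, location_to_efetch_params, fetch_gene_sequences) are made up for illustration. Entries without location information are skipped, so this will not recover every groL.

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_get(tool, **params):
    """Call one E-utilities endpoint (esearch, esummary, efetch) and return the body as text."""
    url = f"{EUTILS}/{tool}.fcgi?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()


def location_to_efetch_params(info):
    """Turn one 'genomicinfo' entry from a Gene esummary into nuccore efetch parameters.

    Assumption: chrstart/chrstop are 0-based, and chrstart > chrstop means minus strand.
    efetch expects 1-based seq_start/seq_stop and strand=1 (plus) or strand=2 (minus).
    """
    start, stop = int(info["chrstart"]), int(info["chrstop"])
    strand = 1 if start <= stop else 2
    low, high = sorted((start, stop))
    return {
        "id": info["chraccver"],
        "seq_start": low + 1,
        "seq_stop": high + 1,
        "strand": strand,
    }


def fetch_gene_sequences(term, retmax=20):
    """Yield FASTA records for Gene hits that have location information."""
    search = json.loads(
        eutils_get("esearch", db="gene", term=term, retmax=retmax, retmode="json")
    )
    ids = search["esearchresult"]["idlist"]
    if not ids:
        return
    summary = json.loads(
        eutils_get("esummary", db="gene", id=",".join(ids), retmode="json")
    )
    for uid in ids:
        locations = summary["result"].get(uid, {}).get("genomicinfo") or []
        if not locations:
            continue  # no location information: cannot extract by coordinates
        params = location_to_efetch_params(locations[0])
        yield eutils_get(
            "efetch", db="nuccore", rettype="fasta", retmode="text", **params
        )
        time.sleep(0.4)  # stay under the request rate allowed without an API key


if __name__ == "__main__":
    # Query string taken from the question; retmax kept small for a first try.
    for fasta in fetch_gene_sequences('(groL) AND "bacteria"[porgn:__txid2]', retmax=5):
        print(fasta, end="")
```

For the full result set you would likely need to page through the esearch results (retstart/retmax) and add an API key, and it is worth spot-checking a few extracted sequences against the Gene pages before trusting the coordinate handling.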