How to download/extract the same gene from all bacterial genomes on NCBI?
16 months ago
tbayer ▴ 50

I would like to get all "groL" gene (nucleotide) sequences from all assembled bacterial genomes on NCBI. For example, searching for (groL) AND "bacteria"[porgn:__txid2] on NCBI Gene gets me a list of the sequences I want, but it seems I have to open the links one by one to get each sequence; the "Send to" menu only offers a list. I'd like to avoid sequence searches via BLAST, as the genes should all be well annotated. Maybe there is a way via the E-utilities?

NCBI • 926 views

You may want to try Identical Protein Groups at NCBI:

https://www.ncbi.nlm.nih.gov/ipg/

Type "grol" in the search field:

https://www.ncbi.nlm.nih.gov/ipg/?term=grol

Choose "Send to:" drop-down menu on the right and select Destination -> File, Format -> Fasta.

Sorry, just realized you wanted nucleotides. Maybe this still helps you.


Unfortunately I need the sequences for a barcoding taxonomy db, so protein (or back-translations) won't help. Thanks!


You may have to resort to BLAST, or perhaps simply back-translate the IPG sequences Mensur Dlakic mentioned below. As you probably saw, the Gene database lists groL entries, but not all of them have location information, so we can't simply extract the sequences based on location.

There may be a way of doing this with NCBI Datasets. I will see if I can test it.


16 months ago
tbayer ▴ 50

So I decided that, instead of all bacterial genomes, one per genus was sufficient, which brings the number down to a more manageable ~4k. I used rsync to download the *_cds_from_genomic.fna.gz files for all of those from RefSeq, and then seqkit grep -n -r -p gene=groL *.gz > groL_only.fasta to extract my gene of interest. The remaining task is to translate the GenBank IDs from those sequences to species, which should work with the E-utilities.
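
For anyone without seqkit at hand, the extraction step can be roughly sketched in plain Python. This is only an illustration, not the exact command used above: it assumes the usual *_cds_from_genomic.fna.gz header layout (for example ">lcl|NC_000913.3_cds_NP_418567.1_4063 [gene=groL] ..."), and the function names extract_gene and genomic_accession are made up for this example. The default pattern is stricter than the seqkit call, matching only the exact tag [gene=groL]; passing re.compile("gene=groL") instead should behave more like the regex match above and would also pick up names such as groL1 or groL2.

```python
import glob
import gzip
import re

# Assumption: CDS headers carry the gene name as a bracketed "[gene=...]" tag.
GENE_TAG = re.compile(r"\[gene=groL\]")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, seq = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())
    if header is not None:
        yield header, "".join(seq)


def extract_gene(paths, pattern=GENE_TAG):
    """Yield (header, sequence) for records whose header matches pattern."""
    for path in paths:
        with gzip.open(path, "rt") as handle:
            for header, seq in read_fasta(handle):
                if pattern.search(header):
                    yield header, seq


def genomic_accession(header):
    """Return the genomic accession embedded in a CDS header ID.

    Assumes IDs of the form "lcl|<accession>_cds_<protein or index>".
    """
    seq_id = header.split()[0]          # drop the bracketed annotations
    seq_id = seq_id.split("|", 1)[-1]   # drop the "lcl|" prefix if present
    return seq_id.split("_cds_", 1)[0]


if __name__ == "__main__":
    files = sorted(glob.glob("*_cds_from_genomic.fna.gz"))
    with open("groL_only.fasta", "w") as out:
        for header, seq in extract_gene(files):
            out.write(f">{header}\n{seq}\n")
```

The accession returned by genomic_accession (e.g. NC_000913.3) is the kind of ID that could then be looked up via the E-utilities to get the organism name; that lookup is not shown here.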
