Entering edit mode
5.8 years ago
genomes_and_MGEs
▴
10
Hey guys,
I would like to blastx all bacterial assemblies available on NCBI against a database of several proteins, without downloading the genomes (to save disk space - I'm working on a supercomputer, no cloud available so I'm totally relying on PC's physical space). The only output I would like to have on my PC is the hit genomes containing those proteins from my physical database. Do you have a solution? Thanks!
All bacterial assemblies takes less than 600GB space. Surely a supercomputer has that much space to spare. All bacterial proteomes would be like 100GB, if even that..
Use the
-remote
option of commandline blast?Will try to have a look, thanks!
You may want to do
tblastn
with your proteins (rather than the other way around) if you use the-remote
option (since you don't want to download the genomes). Not sure how much time NCBI allows per query but choosing all bacterial assemblies/genomes may run up against the limit. Start with a single protein and an "Entrez query" restricting blast to a genus before expanding the search.As @5heikki says below, as long as you have space available on the supercomputer this search would be best done locally by downloading the genomes/proteomes.