3 months ago
echolley
How would I download the nr database to just have bacterial/viral sequences? I see lots of options to download various segments of the nt database, but not nr. Curious to know if there's a good solution!
Thanks!
You can use a tool such as https://github.com/pirovc/genome_updater to download bacterial/viral sequences from GenBank and/or RefSeq and build a BLAST database yourself.
The problem with that approach is that there will be a lot of redundancy, which will increase the size of the database (unless OP does additional work to remove the redundancy).
A potentially better option may be to download the pre-made nr database, extract the entries marked with the taxIDs of interest (these will need to be collected for bacteria and viruses at the species level), and then make a new database from that.
Thank you for the advice! What would be the best way to go about this? Say I get a list of taxids, how would I go about extracting these entries? Thank you!
See the solution here: Extract all bacteria sequences from the nr database
See the examples on the taxonkit page.
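For the "get a list of taxids" step, here is a rough sketch in plain Python, in case it helps to see what the tools are doing under the hood. It assumes you have nodes.dmp from the NCBI taxonomy dump (taxdump) and collects every taxid below Bacteria (2) and Viruses (10239). The function names, the toy sample, and the output file name are made up for illustration; taxonkit does the same job faster.

```python
import sys
from collections import defaultdict, deque

def parse_nodes(lines):
    """Build a parent -> children map from nodes.dmp-style lines.

    Each line looks like: tax_id <tab>|<tab> parent_tax_id <tab>|<tab> rank ...
    """
    children = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue
        taxid, parent = int(fields[0]), int(fields[1])
        if taxid != parent:  # the root node lists itself as its parent
            children[parent].append(taxid)
    return children

def descendants(children, roots):
    """Return the root taxids plus every taxid beneath them."""
    seen = set()
    queue = deque(roots)
    while queue:
        taxid = queue.popleft()
        if taxid in seen:
            continue
        seen.add(taxid)
        queue.extend(children.get(taxid, ()))
    return seen

# Toy lines in nodes.dmp layout, for illustration only (not the real taxonomy).
SAMPLE = [
    "1\t|\t1\t|\tno rank\t|",
    "2\t|\t1\t|\tsuperkingdom\t|",
    "1224\t|\t2\t|\tphylum\t|",
    "10239\t|\t1\t|\tsuperkingdom\t|",
    "9606\t|\t1\t|\tspecies\t|",
]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python taxids.py nodes.dmp > taxids.txt
    with open(sys.argv[1]) as handle:
        tree = parse_nodes(handle)
    for taxid in sorted(descendants(tree, [2, 10239])):
        print(taxid)
```

The resulting one-taxid-per-line file could then be given to blastdbcmd through its -taxidlist option to pull the matching entries out of nr, assuming a BLAST+ release and a version 5 nr database that support taxid filtering. I have not run this against the full nr, so treat it as a starting point.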