5.1 years ago
anasofiamoreira94
Hi all, I want to remove the bacterial data from the entire nt database. Can someone tell me the best way to remove it? Thanks
As far as I can tell, nt sequences are annotated at the genus level. So the only way you may be able to do this is to get those names and exclude the ones that are bacteria. It may be simpler to post-filter your results for bacteria instead?
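A minimal sketch of the post-filtering idea, assuming the BLAST results were saved as tab-separated text with one column holding the subject taxIDs (for example a custom tabular output that includes staxids). The function name, the column position, and the example rows are made up for illustration; the set of bacterial taxIDs to exclude would have to come from elsewhere.

```python
import csv
import io


def drop_hits_by_taxid(rows, excluded_taxids, taxid_col):
    """Yield tabular BLAST rows whose taxID column shares no ID with excluded_taxids."""
    for row in rows:
        # A hit can carry several taxIDs separated by ';'
        hit_taxids = set(row[taxid_col].split(";"))
        if hit_taxids.isdisjoint(excluded_taxids):
            yield row


# Made-up example rows: query, subject, percent identity, taxID(s)
example = (
    "q1\tsubjA\t99.0\t9606\n"
    "q1\tsubjB\t97.5\t562\n"
    "q2\tsubjC\t95.0\t562;9606\n"
)
rows = list(csv.reader(io.StringIO(example), delimiter="\t"))

# 562 is Escherichia coli; a real run would use a full list of bacterial taxIDs
kept = list(drop_hits_by_taxid(rows, {"562"}, taxid_col=3))
print([r[1] for r in kept])
```

This drops a hit as soon as any of its taxIDs is in the excluded set, which is the conservative choice when the goal is to get rid of everything bacterial.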
As @lieven points out below, that approach should work, assuming nt is properly annotated with the bacterial taxID.
Edit: No sequences in nt appear to be annotated with taxID 2, so that idea is not going to work.
Alternatively (if you are using the newest BLAST version), use the taxonomic filtering options and set them to only report eukaryotic hits. There is no need to modify your BLAST database in this case.
EDIT/update: although this seems to work on the NCBI web BLAST, there are indications that it does not work with the (local) CLI version.
I'm using BLAST locally.
This would work if I add the IDs of the species to remove. But then again, they can change, so the result will be different.
Hi, I think limiting a database search by taxa should now be possible even in offline BLAST.
See this NCBI webinar
And/or this post: https://ncbiinsights.ncbi.nlm.nih.gov/2019/01/04/blast-2-8-1-with-new-databases-and-better-performance/.
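As a rough sketch of what taxon-limited offline searching might look like, the snippet below only assembles a blastn command line and prints it. It assumes BLAST+ 2.8.1 or later with a version 5 nt database, where the -negative_taxidlist option is available; the helper name and all file names are hypothetical. Depending on the BLAST+ version, the list file may need to contain species-level taxIDs rather than taxID 2 itself, and an earlier reply in this thread reports that the local CLI may not behave like the web version, so treat this as something to test rather than a confirmed recipe.

```python
import shlex


def build_blastn_command(query, db, negative_taxidlist, out):
    """Assemble a blastn call that excludes the taxIDs listed in a file."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        # File with one taxID per line to exclude from the search
        "-negative_taxidlist", negative_taxidlist,
        "-outfmt", "6",
        "-out", out,
    ]


# Hypothetical file names, for illustration only
cmd = build_blastn_command("query.fa", "nt", "bacteria.txids", "hits.tsv")
print(shlex.join(cmd))
```

The command is built as a list so it could be handed to subprocess.run without shell quoting problems once the inputs exist.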
But if you are after the sequences, then I'm not aware of any option to extract them directly from the nt database. However, one possible way might be to list all accessions in nt (blastdbcmd), run them through entrez, OR get yourself an accession2taxid table, select the ones you want, and then extract them using blastdbcmd.
GL
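The accession2taxid route could be sketched roughly as below. This assumes the tab-separated layout of the NCBI accession2taxid files (accession, accession.version, taxid, gi, with a header line); the function name and the example records are invented for illustration, and the set of taxIDs to exclude has to be supplied separately.

```python
import io


def select_accessions(lines, excluded_taxids):
    """Return accession.version values whose taxID is not in excluded_taxids."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the header line and anything that is too short to parse
        if len(fields) < 3 or fields[0] == "accession":
            continue
        if fields[2] not in excluded_taxids:
            kept.append(fields[1])
    return kept


# Invented example records in the assumed four-column layout
table = io.StringIO(
    "accession\taccession.version\ttaxid\tgi\n"
    "AAA00001\tAAA00001.1\t9606\t111\n"
    "BBB00002\tBBB00002.1\t562\t222\n"
)
wanted = select_accessions(table, {"562"})
print("\n".join(wanted))
```

The resulting accessions could then be written to a file, one per line, and passed to blastdbcmd with -entry_batch to pull the sequences out of nt. For the real table, reading the file line by line as done here avoids loading it all into memory.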