3.2 years ago
O.rka
Does anyone know how to download these databases properly for BBTools?
I got it working back in 2019, but I can't get it to work with the newest version. Here's the error I get from the fetch scripts:
Version: BBMap version 38.93
RefSeq
(bbmap_env) -bash-4.2$ /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/bbmap_env/opt/bbmap-38.93-0/pipelines/fetch/fetchRefSeq.sh
java -ea -Xmx1g -Xms1g -cp /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/bbmap_env/opt/bbmap-38.93-0/current/ tax.RenameGiToTaxid -Xmx1g in=stdin.fa.gz out=renamed.fa.gz pigz=16 unpigz zl=9 server ow maxbadheaders=5000 badheaders=badHeaders.txt bgzip
Executing tax.RenameGiToTaxid [-Xmx1g, in=stdin.fa.gz, out=renamed.fa.gz, pigz=16, unpigz, zl=9, server, ow, maxbadheaders=5000, badheaders=badHeaders.txt, bgzip]
Time: 488.852 seconds.
Reads Processed: 39175 0.08k reads/sec
Bases Processed: 1273m 2.61m bases/sec
Valid Sequences: 39175
Valid Bases: 1273716988
Invalid Sequences: 0
Invalid Bases: 0
Exception in thread "main" java.lang.RuntimeException: tax.RenameGiToTaxid terminated in an error state; the output may be corrupt.
at tax.RenameGiToTaxid.process(RenameGiToTaxid.java:307)
at tax.RenameGiToTaxid.main(RenameGiToTaxid.java:39)
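One quick way to test whether header format is the problem is to count how many headers in the downloaded RefSeq FASTA still carry a legacy `gi|<number>|` prefix. The sketch below is a hypothetical helper, not part of BBTools. It assumes, without confirming it against the BBTools source, that `maxbadheaders=5000` makes the run fail once more than 5000 headers cannot be resolved to a taxid. A header without a `gi|` prefix is not necessarily unresolvable, so treat the count as a rough diagnostic only.

```python
import gzip
import re
import sys

# Legacy NCBI headers start with "gi|<digits>|", e.g. ">gi|12345|ref|NC_000001.1| ..."
GI_HEADER = re.compile(r"^>gi\|\d+\|")


def classify_headers(lines):
    """Return (headers_with_gi, headers_without_gi) for an iterable of FASTA lines."""
    with_gi = 0
    without_gi = 0
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        if GI_HEADER.match(line):
            with_gi += 1
        else:
            without_gi += 1
    return with_gi, without_gi


def main(path, max_bad=5000):
    # Open gzipped or plain FASTA as text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        with_gi, without_gi = classify_headers(handle)
    print(f"headers with gi: {with_gi}, without gi: {without_gi}")
    # Assumed threshold semantics for maxbadheaders (see lead-in).
    if without_gi > max_bad:
        print(f"more than {max_bad} headers lack a gi number")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python count_gi_headers.py some_refseq_file.fa.gz` (script and file names are placeholders). If nearly every header lacks `gi|`, that would be consistent with the gi-deprecation theory below.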
My guess is that, since NCBI has deprecated the use of gi numbers, this script no longer works. Are there any plans to update these scripts? I'm a big fan of the BBTools suite and would prefer to include it in pipelines I'm actively working on and may publish.
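If gi numbers are indeed the issue, one possible workaround is to key on accession.version instead. NCBI publishes tab-separated `accession2taxid` tables with the columns `accession`, `accession.version`, `taxid`, `gi`. The sketch below assumes that layout and prefixes each matching header with `tid|<taxid>|`. That prefix is assumed, not confirmed here, to be a format BBTools' taxonomy tools recognize, so check the BBTools documentation before relying on it. Both functions are hypothetical helpers, not BBTools code.

```python
import csv


def load_accession2taxid(lines):
    """Map accession.version -> taxid from a tab-separated table with a header row.

    Assumed columns (NCBI accession2taxid layout): accession, accession.version, taxid, gi.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["accession.version"]: row["taxid"] for row in reader}


def relabel(lines, acc2tax):
    """Yield FASTA lines, prefixing each header with tid|<taxid>| if its accession is known."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            accession = fields[0] if fields else ""
            taxid = acc2tax.get(accession)
            if taxid is not None:
                line = f">tid|{taxid}|{line[1:]}"
        yield line
```

With a toy table, `relabel([">ACC1.1 desc\n", "ACGT\n"], load_accession2taxid(["accession\taccession.version\ttaxid\tgi\n", "ACC1\tACC1.1\t562\t0\n"]))` yields `">tid|562|ACC1.1 desc\n"` followed by the unchanged sequence line. Headers with no match pass through untouched, so they can be reviewed separately.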
Do you remember what the end result of this script was when it ran in 2019? Is it just downloading RefSeq data? Perhaps I can suggest an alternative.
The end result was a bunch of sketch files:
Each one looks like this: