I want to use sketch.sh and sendsketch.sh on a marine metagenome that has both bacteria and eukarya. I was testing it on a new Cylindrotheca (diatom eukaryote) assembly and it actually recognized it! Nothing else I've tried has been able to do this, so I'm pretty stoked. However, I think there may be an error with sketch.sh, because it looks like the single option is actually only grabbing the first sequence. Check out the Query: contig_1 and Query: scaffolds.drop_bacteria.fasta lines in the output.
I've checked all of the default settings and they seem to line up.
Create a sketch using sketch.sh
(metagenomics_env) -bash-4.1$ sketch.sh in=scaffolds.drop_bacteria.fasta out=scaffolds.drop_bacteria.fasta.sketch1
java -ea -Xmx5878m -Xms5878m -cp /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/opt/bbmap-38.58-0/current/ sketch.SketchMaker in=scaffolds.drop_bacteria.fasta out=scaffolds.drop_bacteria.fasta.sketch1
Executing sketch.SketchMaker [in=scaffolds.drop_bacteria.fasta, out=scaffolds.drop_bacteria.fasta.sketch1]
Finished sketching: 3.140 seconds.
Memory: max=6163m, total=6163m, free=6043m, used=120m
Wrote 1 of 1 sketches.
Time: 3.456 seconds.
Reads Processed: 738 0.21k reads/sec
Bases Processed: 82339k 23.82m bases/sec
Use the sketch from sketch.sh with sendsketch.sh (Query: contig_1)
This was not the expected result.
(metagenomics_env) -bash-4.1$ sendsketch.sh in=scaffolds.drop_bacteria.fasta.sketch1
Warning! Cannot find blacklist_refseq_species_250.sketch /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/opt/bbmap-38.58-0/current/blacklist_refseq_species_250.sketch
java.lang.Exception
at dna.Data.findPath(Data.java:1244)
at sketch.Blacklist.refseqBlacklist(Blacklist.java:155)
at sketch.SendSketch.setFromAddress(SendSketch.java:284)
at sketch.SendSketch.<init>(SendSketch.java:200)
at sketch.SendSketch.main(SendSketch.java:50)
Loaded 1 sketch in 0.036 seconds.
Query: contig_1 DB: RefSeq SketchLen: 26366 Seqs: 738 Bases: 82339741 gSize: 55126726 File: scaffolds.drop_bacteria.fasta
WKID KID ANI Complt Contam Matches Unique TaxID gSize gSeqs taxName
0.03% 0.01% 69.28% 100.00% 0.40% 3 2 556484 25154K 88 Phaeodactylum tricornutum CCAP 1055/1
0.04% 0.00% 70.87% 4.72% 0.62% 3 2 75366 1135M 31277 Sinocyclocheilus grahami
0.01% 0.00% 67.16% 31.41% 0.41% 3 1 6412 174854K 1991 Helobdella robusta
0.01% 0.00% 67.15% 31.71% 0.41% 3 0 743375 173229K 1483 Polistes dominula
Total Time: 0.908 seconds.
Create the sketch with sendsketch.sh (Query: scaffolds.drop_bacteria.fasta)
This was the expected result.
(metagenomics_env) -bash-4.1$ sendsketch.sh in=scaffolds.drop_bacteria.fasta outsketch=scaffolds.drop_bacteria.fasta.outsketch
Warning! Cannot find blacklist_refseq_species_250.sketch /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/opt/bbmap-38.58-0/current/blacklist_refseq_species_250.sketch
java.lang.Exception
at dna.Data.findPath(Data.java:1244)
at sketch.Blacklist.refseqBlacklist(Blacklist.java:155)
at sketch.SendSketch.setFromAddress(SendSketch.java:284)
at sketch.SendSketch.<init>(SendSketch.java:200)
at sketch.SendSketch.main(SendSketch.java:50)
Loaded 1 sketch in 7.015 seconds.
Query: scaffolds.drop_bacteria.fasta DB: RefSeq SketchLen: 52788 Seqs: 738 Bases: 82339741 gSize: 55423446 File: scaffolds.drop_bacteria.fasta
WKID KID ANI Complt Contam Matches Unique TaxID gSize gSeqs taxName
2.81% 0.01% 87.88% 100.00% 0.26% 5 5 2856 188927 2 Cylindrotheca closterium
0.04% 0.00% 70.87% 4.72% 0.62% 3 2 75366 1135M 31277 Sinocyclocheilus grahami
0.01% 0.01% 67.31% 100.00% 0.26% 3 2 556484 25154K 88 Phaeodactylum tricornutum CCAP 1055/1
0.01% 0.01% 66.69% 64.57% 0.31% 4 0 121224 85172K 1882 Pediculus humanus corporis
0.01% 0.01% 66.79% 100.00% 0.26% 3 1 296543 30694K 64 Thalassiosira pseudonana CCMP1335
0.01% 0.00% 67.16% 31.41% 0.41% 3 1 6412 174854K 1991 Helobdella robusta
0.01% 0.00% 67.15% 31.71% 0.41% 3 0 743375 173229K 1483 Polistes dominula
Total Time: 7.746 seconds.
Use the sketch from sendsketch.sh with sendsketch.sh again (Query: scaffolds.drop_bacteria.fasta)
This was also the expected result.
(metagenomics_env) -bash-4.1$ sendsketch.sh in=scaffolds.drop_bacteria.fasta.outsketch
Warning! Cannot find blacklist_refseq_species_250.sketch /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/metagenomics_env/opt/bbmap-38.58-0/current/blacklist_refseq_species_250.sketch
java.lang.Exception
at dna.Data.findPath(Data.java:1244)
at sketch.Blacklist.refseqBlacklist(Blacklist.java:155)
at sketch.SendSketch.setFromAddress(SendSketch.java:284)
at sketch.SendSketch.<init>(SendSketch.java:200)
at sketch.SendSketch.main(SendSketch.java:50)
Loaded 1 sketch in 0.046 seconds.
Query: scaffolds.drop_bacteria.fasta DB: RefSeq SketchLen: 52788 Seqs: 738 Bases: 82339741 gSize: 55423446 File: scaffolds.drop_bacteria.fasta
WKID KID ANI Complt Contam Matches Unique TaxID gSize gSeqs taxName
2.81% 0.01% 87.88% 100.00% 0.26% 5 5 2856 188927 2 Cylindrotheca closterium
0.04% 0.00% 70.87% 4.72% 0.62% 3 2 75366 1135M 31277 Sinocyclocheilus grahami
0.01% 0.01% 67.31% 100.00% 0.26% 3 2 556484 25154K 88 Phaeodactylum tricornutum CCAP 1055/1
0.01% 0.01% 66.69% 64.57% 0.31% 4 0 121224 85172K 1882 Pediculus humanus corporis
0.01% 0.01% 66.79% 100.00% 0.26% 3 1 296543 30694K 64 Thalassiosira pseudonana CCMP1335
0.01% 0.00% 67.16% 31.41% 0.41% 3 1 6412 174854K 1991 Helobdella robusta
0.01% 0.00% 67.15% 31.71% 0.41% 3 0 743375 173229K 1483 Polistes dominula
Total Time: 0.780 seconds.
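To make the difference between the two runs easier to see, the "Query:" header lines can be compared field by field. This is only a minimal sketch: header_field is a hypothetical helper written for this post (it is not part of BBTools), and the two header lines are pasted from the output above. Seqs and Bases are identical in both runs, while Query, SketchLen, and gSize differ, which suggests sketch.sh read all 738 sequences and that the difference lies in how the sketch was named and sized.

```shell
#!/bin/sh
# Hypothetical helper (not part of BBTools): pull one "Key: value" field
# out of a sendsketch.sh "Query:" header line.
#   $1 = field name (e.g. SketchLen), $2 = header line
header_field() {
  printf '%s\n' "$2" | sed -n "s/.*$1: \([^[:space:]]*\).*/\1/p"
}

# Header lines copied from the two sendsketch.sh runs above.
from_sketch='Query: contig_1 DB: RefSeq SketchLen: 26366 Seqs: 738 Bases: 82339741 gSize: 55126726 File: scaffolds.drop_bacteria.fasta'
from_sendsketch='Query: scaffolds.drop_bacteria.fasta DB: RefSeq SketchLen: 52788 Seqs: 738 Bases: 82339741 gSize: 55423446 File: scaffolds.drop_bacteria.fasta'

# Print each field from both runs and flag the ones that differ.
for key in Query SketchLen Seqs Bases gSize; do
  a=$(header_field "$key" "$from_sketch")
  b=$(header_field "$key" "$from_sendsketch")
  if [ "$a" = "$b" ]; then status=same; else status=DIFFERENT; fi
  printf '%-10s %-30s %-30s %s\n' "$key" "$a" "$b" "$status"
done
```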
sketch.sh and sendsketch.sh are two different programs with different uses. Have you looked at the user guide (bbmap/docs/guides/BBSketchGuide.txt)?
I haven’t been able to find where that installs with Conda. What I was trying to show was that the sketch made by sketch.sh was different from the sketch created by sendsketch.sh with the outsketch argument.
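One quick way to check this directly is to look at the two sketch files themselves rather than at the query results. The snippet below is a rough sketch under an assumption: that the sketch files are plain text with a header on the first line, which is worth verifying on your own files. compare_first_lines is a hypothetical helper, and the file names are the ones from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of each file and report
# whether the two lines are identical.
compare_first_lines() {
  a=$(head -n 1 "$1")
  b=$(head -n 1 "$2")
  printf '%s: %s\n' "$1" "$a"
  printf '%s: %s\n' "$2" "$b"
  if [ "$a" = "$b" ]; then
    echo "headers match"
  else
    echo "headers differ"
  fi
}

# Sketch written by sketch.sh (out=) and by sendsketch.sh (outsketch=).
s1=scaffolds.drop_bacteria.fasta.sketch1
s2=scaffolds.drop_bacteria.fasta.outsketch

# Only run the comparison if both sketch files are present.
if [ -f "$s1" ] && [ -f "$s2" ]; then
  compare_first_lines "$s1" "$s2"
fi
```

If the first lines differ, whatever differs there (name, size, or other settings) is a good starting point for working out which defaults the two programs do not share.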
The document should be in miniconda3/envs/bbmap/opt/bbmap-38.58-0/docs/guides/BBSketchGuide.txt.
Thanks, I found it for my conda environment. Does my question make sense tho?