6.2 years ago
Shelle
I am trying to use the BioBloom tool to categorize the sequences in a sample that I have. The command has to take the form below:
./biobloomcategorizer -e -p /output/prefix -f "filter1.bf filter2.bf filter3.bf" inputReads1_1.fq inputreads1_2.fq
Since I have so many files, about 19000, I have to use bash scripting. The command I am using is the one-liner below. The fastq files and all the .bf files are in the same directory, but when I write the script this way, biobloomcategorizer does not work at all. There is no issue with the tool itself, as I tried the command above with only a few files. Can anyone tell me how I should modify the script below to make the tool work with the many files that I have?
for i in *.bf; do biobloomcategorizer -e -p /output/prefix -f echo \"$i\" filename_1.fastq filename_2.fastq; done
Clarify in what way the tool "is not working at all". The way you have your loop, only one of the *.bf files is going to be used each time. That is not what you have in your first example.
It is giving me this error:
I know biobloomcategorizer itself is working, because when I skip the for loop and just copy and paste some of the .bf files into the plain form of the command, it doesn't give me the error above. The one-liner I am trying to use looks fine to me, but I don't know why it gives the error above.
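A minimal sketch of how bash splits the arguments in that one-liner, which may explain the error. `show_args` and `count_args` are hypothetical helpers standing in for biobloomcategorizer, and `filter1.bf` is a placeholder name; nothing here calls the real tool.

```shell
#!/usr/bin/env bash
# show_args (hypothetical helper) prints each argument it receives on its own line.
show_args() { printf '<%s>\n' "$@"; }
# count_args (hypothetical helper) prints how many arguments it received.
count_args() { echo "$#"; }

# Placeholder filter name, standing in for one value of the loop variable.
i="filter1.bf"

echo "As written in the loop:"
show_args -f echo \"$i\" filename_1.fastq filename_2.fastq
echo "argument count: $(count_args -f echo \"$i\" filename_1.fastq filename_2.fastq)"

echo "With the filter name quoted directly:"
show_args -f "$i" filename_1.fastq filename_2.fastq
echo "argument count: $(count_args -f "$i" filename_1.fastq filename_2.fastq)"
```

In the first form the word `echo` is not executed. It is passed along as an ordinary argument, so it would become the value of `-f`, and the filter name (with literal quote characters around it) would be left over as a third positional file next to the two fastq files. If the tool expects exactly two read files in paired-end mode, that would plausibly produce a paired-end usage message.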
How many *.bf files are there? What does the 19000 number refer to? How many fastq files do you have?
I have tried to do it like below as well, but the error is different and says "Argument is too long!" for the line starting with "biobloomcategorizer". I have only two files with the fastq extension (_1.fastq and _2.fastq), which are in paired mode. The number of .bf files is 19000.
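A rough way to check whether the joined filter names can fit on one command line at all. This is a sketch that assumes Linux, where, in addition to the overall `ARG_MAX` budget, a single argument is capped at 131072 bytes including its terminator. The variable names are illustrative, and the script only measures sizes without running the tool.

```shell
#!/usr/bin/env bash
# nullglob makes the array empty (instead of a literal "*.bf") when nothing matches.
shopt -s nullglob
filters=(*.bf)

# Join all names with single spaces, as they would appear inside the quoted -f value.
filter_list="${filters[*]}"

# Total bytes the system allows for the arguments plus environment of one new process.
arg_max=$(getconf ARG_MAX)

# Assumption: Linux limit for any single argument string (32 pages of 4 KiB).
per_arg_limit=131072

# ${#var} counts characters, which equals bytes for plain ASCII file names.
echo "ARG_MAX: $arg_max bytes"
echo "filters found: ${#filters[@]}, joined length: ${#filter_list} characters"

if (( ${#filter_list} >= per_arg_limit )); then
    echo "joined filter list is too long to pass as one argument"
else
    echo "joined filter list fits in one argument"
fi
```

With 19000 names of even 10 characters each, the joined string would be around 200 KB, so on Linux it could hit the per-argument cap even when `ARG_MAX` itself is much larger.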
I have not used this specific program, but your original loop should work with one bloom filter file at a time. Are you supposed to use the program in this way, one filter at a time, given that you have 19000 of these files?
I have to use all 19000 bloom filters at one time. An array was the only way that came to my mind. I even tried slicing the array so as to use not all 19000 filters but a batch of about 2000, and I still get the "usage of paired end mode" error mentioned in my first response.
Array=(*.bf)
biobloomcategorizer -e -p /output/prefix -f echo \"${Array[@]:1:2001}\" filename_1.fastq filename_2.fastq
Did you make these from 19000 complete RefSeq genomes (one for each) and are trying to use these files to categorize reads in your fastq data? If you need to pass all 19000 files at the same time to the program input then you don't need that loop.
NOTE: You may want to look at kraken (and tools in that category) instead to classify reads.
How about this:
You may still run into a "line too long" type of error because of all the file names you will have on one line.
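One way to pass every filter as a single quoted `-f` value without a loop might look like the sketch below. It is a dry run: the command is assembled in an array and inspected rather than executed, and the file names are the placeholders used earlier in this thread. It assumes the .bf names contain no spaces, since the list is space-separated.

```shell
#!/usr/bin/env bash
# Collect every filter in the current directory; nullglob gives an empty array if none match.
shopt -s nullglob
filters=(*.bf)

# "${filters[*]}" joins the names with single spaces into ONE string,
# mirroring the quoted "filter1.bf filter2.bf filter3.bf" form of the first example.
filter_list="${filters[*]}"

# Build the command as an array so each element stays exactly one argument.
cmd=(biobloomcategorizer -e -p /output/prefix -f "$filter_list" filename_1.fastq filename_2.fastq)

# Dry run: report what would be passed instead of executing it.
echo "number of arguments: ${#cmd[@]}"
echo "length of the -f value: ${#filter_list} characters"

# To run it for real, replace the two echo lines above with: "${cmd[@]}"
```

Because the filter list is one quoted argument, the two fastq files remain the only positional inputs. The length of that one argument is still what decides whether the system accepts the command.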
Same error as in my first response:
I am going to refer you back to my comment above: C: Categorize sequences with biobloom tool