Question

Categorize sequences with biobloom tool

6.2 years ago

Shelle ▴ 30

I am trying to use the biobloom tool to categorize the sequences in a sample that I have. The command has to take the form below:

./biobloomcategorizer -e -p /output/prefix -f "filter1.bf filter2.bf filter3.bf" inputReads1_1.fq inputreads1_2.fq

Since I have about 19000 files, I have to use bash scripting. The command I am using is the one-liner below. The fastq files and all the .bf files are in the same directory, but when I write the script this way, biobloomcategorizer does not work at all. The tool itself is fine: the command above works when I try it with only a few files. Can anyone tell me how to modify the script below so that the tool works with this many files?

for i in *.bf; do biobloomcategorizer -e -p /output/prefix -f echo \"$i\" filename_1.fastq filename_2.fastq; done

sequence genome alignment software error • 1.9k views

ADD COMMENT • link updated 6.2 years ago by Biostar 20 • written 6.2 years ago by Shelle ▴ 30

"biobloomcategorizer is not working at all"

Please clarify in what way the tool is not working. The way you have written your loop, only one of the *.bf files is used on each iteration. That is not what your first example does.
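
To see what the loop actually hands to the program, one option is to swap the command for a small stand-in that prints each argument it receives. The sketch below assumes a hypothetical helper, `show_args`, and two empty placeholder filter files in a scratch directory; it does not run biobloomcategorizer at all.

```shell
# Hypothetical helper: prints every argument it receives, one per line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Scratch directory with two empty stand-in filter files (placeholder names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.bf b.bf

for i in *.bf; do
    # Same argument layout as the loop in the question, with show_args
    # standing in for biobloomcategorizer.
    show_args -e -p /output/prefix -f echo \"$i\" filename_1.fastq filename_2.fastq
    echo "---"
done
```

In this sketch each iteration passes a single filter name. The word right after -f is `echo`, not a file name, and the filter name arrives as a separate argument with literal double quotes around it. If -f takes exactly one value, as the quoted list in the first example suggests (an assumption, not checked against the tool), the program would see `echo` as its filter list and three positional files instead of two.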

ADD REPLY • link 6.2 years ago by GenoMax 147k

It is giving me this error:

Usage of paired end mode:
BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [FILEPAIR1] [FILEPAIR2]
or BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [PAIREDBAMSAM]

I know biobloomcategorizer itself works: when I skip the for loop and paste a few of the .bf file names directly into the plain command, I do not get the error above. The one-liner I am trying to use looks fine to me, so I don't know why it produces this error.

ADD REPLY • link 6.2 years ago by Shelle ▴ 30

How many *.bf files are there? What does the 19000 number refer to? How many fastq files do you have?

ADD REPLY • link 6.2 years ago by GenoMax 147k

I have also tried the script below, but it gives a different error, "Argument is too long!", on the line starting with "biobloomcategorizer". I have only two fastq files (_1.fastq and _2.fastq), which form one pair. The number of .bf files is 19000.

#!/bin/bash
Array=(*.bf)
biobloomcategorizer -e -p /output/prefix -f echo \"${Array[*]}\" filename_1.fastq filename_2.fastq

ADD REPLY • link 6.2 years ago by Shelle ▴ 30

I have not used this specific program, but your original loop should run it with one bloom filter file at a time. Are you supposed to use the program that way, one filter at a time, given that you have 19000 of these files?

ADD REPLY • link 6.2 years ago by GenoMax 147k

I have to use all 19000 bloom filters at once. An array was the only way that came to mind. I even tried slicing the array so that it passes a batch of about 2000 filters instead of all 19000, and I still get the "Usage of paired end mode" error mentioned in my first response.

Array=(*.bf)
biobloomcategorizer -e -p /output/prefix -f echo \"${Array[@]:1:2001}\" filename_1.fastq filename_2.fastq

ADD REPLY • link 6.2 years ago by Shelle ▴ 30

Did you make these from 19000 complete RefSeq genomes (one filter per genome), and are you trying to use them to categorize the reads in your fastq data? If you need to pass all 19000 files to the program at the same time, then you don't need that loop.
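
A loop-free call could be sketched as follows: collect the names in a bash array and let `"${filters[*]}"` join them into one space-separated string, which the double quotes then pass as a single argument. The helper name `join_filters` and the placeholder files are made up for illustration, and the sketch assumes the filter file names contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins every *.bf name in the current directory
# into one space-separated string (assumes names without spaces).
join_filters() {
    local filters=(*.bf)
    # "${filters[*]}" expands to a single word, elements joined by a space.
    printf '%s' "${filters[*]}"
}

# Demonstration in a scratch directory with three empty stand-in filters.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch filter1.bf filter2.bf filter3.bf

filter_list=$(join_filters)
printf 'filter list: %s\n' "$filter_list"

# The real call would then look like this (left commented out because the
# prefix and read file names are the placeholders used in this thread):
# biobloomcategorizer -e -p /output/prefix -f "$filter_list" filename_1.fastq filename_2.fastq
```

Note that there is no `echo` and there are no backslash-escaped quotes: the plain double quotes around the expansion are what keep the list together as one argument. If no .bf file matches, the glob stays as the literal text `*.bf` unless `nullglob` is set.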

NOTE: You may want to look at kraken (and tools in that category) instead to classify reads.

ADD REPLY • link 6.2 years ago by GenoMax 147k

How about this:

biobloomcategorizer -e -p /output/prefix -f echo \"`ls -1 *.bf | tr '\n' ' '`\" filename_1.fastq filename_2.fastq

You may still run into a "line too long" type of error, because all of the file names end up on one line.
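
Whether the joined list is too long can be estimated before running anything. The sketch below assumes a Linux system, where a single argument string is limited to 131072 bytes (32 pages of 4 KiB) in addition to the overall limit reported by `getconf ARG_MAX`; other systems may differ. The function name `fits_in_one_arg` is made up for illustration.

```shell
#!/usr/bin/env bash
# Assumed Linux limit for ONE argument string, including its terminating NUL.
MAX_SINGLE_ARG=131072

# Succeeds if the given string would fit into a single argument under that limit.
# ${#s} counts characters, which equals bytes for plain ASCII file names.
fits_in_one_arg() {
    local s=$1
    (( ${#s} + 1 <= MAX_SINGLE_ARG ))
}

# Join the filter names in the current directory the same way the command would.
filters=(*.bf)
joined="${filters[*]}"

printf 'joined filter list: %d characters\n' "${#joined}"
printf 'total argument limit (ARG_MAX): %s bytes\n' "$(getconf ARG_MAX)"

if fits_in_one_arg "$joined"; then
    echo "filter list fits in one argument"
else
    echo "filter list is too long for one argument"
fi
```

Under that assumption, 19000 names share 131071 usable bytes, which leaves a little under 7 bytes per name including the separating space. Longer names would exceed the limit no matter how the string is built.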

ADD REPLY • link 6.2 years ago by GenoMax 147k

Same error as in my first response:

Usage of paired end mode:
BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [FILEPAIR1] [FILEPAIR2]
or BioBloomCategorizer [OPTION]... -f "[FILTER1]..." [PAIREDBAMSAM]

ADD REPLY • link 6.2 years ago by Shelle ▴ 30

I am going to refer you back to my comment above: C: Categorize sequences with biobloom tool

ADD REPLY • link 6.2 years ago by GenoMax 147k