Question

process substitution input

9.2 years ago

gmdc • 0

dbgh5 does not accept input from a pipe [-] or from process substitution, <(cat ):

dbgh5 -in <(zcat big_file.fastq.gz another_huge.fq.gz ...) ...
EXCEPTION: Empty bank

It works fine if the input, compressed or uncompressed, is given without process substitution, either one file at a time or after concatenating the files beforehand.

The problem is that the concatenated temporary file becomes humongous when there are many huge files, and it takes time to create.

Is there a reason this is not supported? Avoiding the extra disk space would help in some cases.

Or is there something that I am missing here?

DBGH5 GATB dbgh5 • 2.3k views

updated 2.2 years ago by Ram 44k • written 9.2 years ago by gmdc • 0

Hello,

I think it's not possible to do so because the dbgh5 command actually reads the input file several times:

1. Computing statistics about part of the input file (statistics about minimizers distribution)
2. Reading the kmers from the input file and dispatching them in partitions (according to the minimizers distribution computed in step 1)

Since you can't rewind a pipe (see here), there is no way right now to use pipes with dbgh5.
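To illustrate why a multi-pass reader and a pipe do not mix, here is a minimal sketch. The function two_pass_count is a hypothetical stand-in that simply counts lines twice; it is not how dbgh5 is implemented, and it assumes a system where /dev/stdin exists (Linux, macOS).

```shell
#!/bin/sh
# Sketch only: two_pass_count is a hypothetical stand-in for a tool that
# reads its input twice; it is not how dbgh5 is implemented.
two_pass_count() {
    pass1=$(wc -l < "$1" | tr -d ' ')   # first pass: count lines
    pass2=$(wc -l < "$1" | tr -d ' ')   # second pass: reopen and count again
    echo "pass1=$pass1 pass2=$pass2"
}

workdir=$(mktemp -d)
printf 'read1\nread2\nread3\n' > "$workdir/reads.txt"

# Regular file: every open starts at the beginning, so both passes see the data.
file_result=$(two_pass_count "$workdir/reads.txt")

# Pipe: the first pass drains it, and the data cannot be read again.
pipe_result=$(printf 'read1\nread2\nread3\n' | two_pass_count /dev/stdin)

echo "file: $file_result"
echo "pipe: $pipe_result"
rm -rf "$workdir"
```

On Linux I would expect the file case to report 3 lines on both passes, and the pipe case to report 3 lines on the first pass and 0 on the second, which is the same situation that leaves a second pass with nothing to read.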

updated 5.0 years ago by Ram 44k • written 9.2 years ago by edrezen ▴ 730

I assume you are using DiscoSNP? Check this post: DiscoSNP++ 2.2.0 problem

Not that by some magic program authors implemented random access to gz files...

updated 5.0 years ago by Ram 44k • written 9.2 years ago by Darked89 4.7k

I am talking about the dbgh5 command itself here (DiscoSNP uses this command to build a de Bruijn graph from the input reads).

As a reminder, the -in parameter of dbgh5 can be one of the following:

a fasta file; ex: reads.fa
a gzipped fasta file; ex: reads.fa.gz
a list of fasta files (gzipped or not); ex: r1.fa,r2.fa.gz
a text file containing a list of files, one file per line (possibly another text file); ex:
- r1.fa
- r2.fa.gz
- fileofile.txt

However, a named pipe should not work here because dbgh5 makes several passes over the -in input: the pipe would be consumed during the first pass, leaving nothing to read for the later passes.
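Given the -in formats listed above, one way to avoid the big concatenated temporary file might be to pass the compressed files directly, either as a comma-separated list or through a small text file with one path per line. A minimal sketch follows. The file names are the ones from the question, created here as empty placeholders; inputs.txt is a hypothetical name; and whether dbgh5 accepts absolute paths in the list is an assumption. The dbgh5 commands are only printed, not run.

```shell
#!/bin/sh
# Sketch only: the placeholder files are empty, so dbgh5 is not invoked here.
workdir=$(mktemp -d)
: > "$workdir/big_file.fastq.gz"      # placeholder for a real gzipped input
: > "$workdir/another_huge.fq.gz"     # placeholder for a real gzipped input

# Option 1: a text file with one input path per line (hypothetical name).
list="$workdir/inputs.txt"
for f in "$workdir"/*.gz; do
    printf '%s\n' "$f"
done > "$list"
n_inputs=$(wc -l < "$list" | tr -d ' ')

# Option 2: the same paths joined with commas.
csv=$(paste -s -d, "$list")

# Print the commands one would try; nothing is concatenated or decompressed.
echo "dbgh5 -in $list"
echo "dbgh5 -in $csv"
rm -rf "$workdir"
```

Either form leaves the .gz files as they are on disk, so no extra copy is needed and each file can still be reopened for every pass.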

updated 5.0 years ago by Ram 44k • written 9.2 years ago by edrezen ▴ 730

Ram · Answer 1 · 2015-09-30

9.2 years ago

Darked89 4.7k

You can try to use named pipes:

http://www.linuxjournal.com/article/2156

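For example:

mkfifo pipe1
mkfifo pipe2
# run each zcat in the background: a write to a named pipe blocks until a reader opens it
zcat file1.gz > pipe1 &
zcat file2.gz > pipe2 &
# -in takes a comma-separated list of inputs
dbgh5 -in pipe1,pipe2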

Let us know if it works for you. But start with some toy-sized fastq.gz files just to test it.

updated 5.0 years ago by Ram 44k • written 9.2 years ago by Darked89 4.7k