I would like to know whether it is possible (and, if not, to propose it as a feature) to run a bwa alignment on a list of input sequences with a fixed budget of N minutes, producing output only for the initial fraction of the query sequences that could be aligned within those N minutes. For example:
bwa bwasw -s 10 -N 120 target.prefix query.fa
This command would process query.fa 10 sequences at a time and write out their alignments until about 120 minutes of computation had been used. If aligning the whole file takes longer than that, only a fraction of the sequences at the beginning of query.fa would be processed, and the remaining sequences would be skipped.
Perhaps this can already be achieved by cleverly piping the input and output data. Any ideas?
Edit: the doalarm solution does not seem to work for me, because it produces no alignments at all if bwa is stopped before it finishes. See below.
I ran it with different $sec values. Up to 186 seconds the output file is empty, and from 190 seconds onwards it contains the results for all 556 query sequences:
for sec in 100 110 120 130 140 150 160 170 180 182 184 186 190 200; do
  ~/src/doalarm/doalarm-0.1.7/doalarm $sec \
    ~/src/bwa/latest/bwa/bwa bwasw -z 1000 -s 10 \
    reference_genome.fa.gz querysequences.fa > /tmp/ex.$sec.sam
done
0 /tmp/ex.100.sam
0 /tmp/ex.110.sam
0 /tmp/ex.120.sam
0 /tmp/ex.130.sam
0 /tmp/ex.140.sam
0 /tmp/ex.150.sam
0 /tmp/ex.160.sam
0 /tmp/ex.170.sam
0 /tmp/ex.180.sam
0 /tmp/ex.182.sam
0 /tmp/ex.184.sam
0 /tmp/ex.186.sam
142765 /tmp/ex.190.sam
142765 /tmp/ex.200.sam
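Presumably bwa had not yet written (or flushed) any SAM records when doalarm killed it, so everything computed up to that point was lost. One way around this all-or-nothing behaviour is to apply the limit per chunk instead of to the whole run. The following is a minimal bash sketch, assuming the query has already been split into several smaller FASTA files and that GNU coreutils `timeout` is available; `align_chunks_hard_limit` is a hypothetical helper, not part of bwa. Each chunk gets only the time left in the overall budget, and the SAM output of every chunk that completes is kept. Add bwasw options such as `-z 1000` to the `bwa bwasw` line as needed.

```shell
#!/usr/bin/env bash
# align_chunks_hard_limit BUDGET_SECONDS TARGET_PREFIX CHUNK...
# Hypothetical helper: aligns pre-split chunk files in order under one
# overall wall-clock limit. Each chunk is written to a temporary file
# first and only copied to stdout if bwa finished, so a chunk killed at
# the deadline loses only its own alignments, not the earlier ones.
align_chunks_hard_limit() {
    local budget=$1 target=$2
    shift 2
    local deadline=$(( SECONDS + budget )) remaining part first=1 chunk status
    part=$(mktemp) || return 1
    for chunk in "$@"; do
        remaining=$(( deadline - SECONDS ))
        (( remaining > 0 )) || break           # budget already used up
        # timeout exits with status 124 if it had to kill bwa
        timeout "$remaining" bwa bwasw "$target" "$chunk" > "$part"
        status=$?
        if (( status != 0 )); then
            echo "stopped at $chunk (exit status $status)" >&2
            break                              # discard the partial chunk
        fi
        if (( first )); then
            cat "$part"                        # keep the SAM header once
            first=0
        else
            grep -v '^@' "$part"               # drop repeated header lines
        fi
    done
    rm -f "$part"
}
```

For example, `align_chunks_hard_limit 7200 target.prefix chunk.*.fa > partial.sam` would stop after two hours at most and keep every chunk aligned before the deadline.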
I am confused: why did you put up a bounty? Did the alarm version by DK not work? Did you test it?
@Michael Dondrup: It doesn't seem to work for me; see the edit.
I think what we have seen so far is that bwa (unlike, e.g., samtools) is not Unix-pipe (|) friendly, in the sense that you cannot really process its input in chunks.
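Since bwa does not do the chunking itself, it can be done outside it. The following is a minimal bash sketch, assuming bash and awk are available; `split_fasta` and `align_with_budget` are hypothetical helper names. It splits the query FASTA into files of S records, aligns them one at a time with `bwa bwasw`, and stops starting new chunks once N minutes have passed, which approximates the `-s 10 -N 120` behaviour proposed in the question.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper names): emulate "process S sequences
# at a time for at most N minutes" by splitting the query outside bwa.

# split_fasta QUERY S OUTDIR
# Writes OUTDIR/chunk.000000.fa, OUTDIR/chunk.000001.fa, ...,
# each holding S FASTA records (the last one may hold fewer).
split_fasta() {
    local query=$1 size=$2 outdir=$3
    mkdir -p "$outdir" || return 1
    awk -v size="$size" -v dir="$outdir" '
        /^>/ {
            # start a new chunk file every "size" headers
            if (n % size == 0) {
                if (out != "") close(out)
                out = sprintf("%s/chunk.%06d.fa", dir, n / size)
            }
            n++
        }
        out != "" { print > out }   # lines before the first header are dropped
    ' "$query"
}

# align_with_budget TARGET_PREFIX QUERY S MINUTES [extra bwasw options...]
# Aligns the chunks in order and stops starting new ones once MINUTES of
# wall-clock time have passed. A chunk that is already running is allowed
# to finish, so the total time can overshoot by up to one chunk.
align_with_budget() {
    local target=$1 query=$2 size=$3 minutes=$4
    shift 4
    local tmp chunk first=1
    tmp=$(mktemp -d) || return 1
    split_fasta "$query" "$size" "$tmp"
    local deadline=$(( SECONDS + minutes * 60 ))
    for chunk in "$tmp"/chunk.*.fa; do
        [ -e "$chunk" ] || break              # query had no records
        if (( SECONDS >= deadline )); then
            echo "time budget used up; skipping $chunk and later chunks" >&2
            break
        fi
        if (( first )); then
            bwa bwasw "$@" "$target" "$chunk"   # keep the SAM header once
            first=0
        else
            bwa bwasw "$@" "$target" "$chunk" | grep -v '^@'
        fi
    done
    rm -rf "$tmp"
}
```

For example, `align_with_budget target.prefix query.fa 10 120 -z 1000 > partial.sam` would align query.fa ten sequences at a time for roughly two hours. Each chunk is a separate bwa run that loads the index again, so very small chunks add noticeable overhead on a large reference; larger chunks reduce that overhead but make the deadline coarser.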
avilella, how variable is the time needed to map a sequence? Could you build an empirical table telling you that, say, mapping x sequences takes less than y minutes 95% of the time?
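One rough way to build such a table is to time the alignment of several chunks of x sequences each and take the 95th percentile of the run times. A minimal bash sketch, assuming chunk files of x sequences have been prepared beforehand; `time_chunks` and `p95` are hypothetical helper names, timings use bash's whole-second `SECONDS` counter, and the percentile is the nearest-rank definition.

```shell
#!/usr/bin/env bash
# Minimal sketch for an empirical "x sequences take < y seconds 95% of
# the time" table. Helper names are hypothetical.

# time_chunks TARGET_PREFIX CHUNK...
# Prints the wall-clock seconds (whole seconds, from bash's SECONDS)
# that "bwa bwasw" took on each chunk file, one number per line.
time_chunks() {
    local target=$1 chunk start
    shift
    for chunk in "$@"; do
        start=$SECONDS
        bwa bwasw "$target" "$chunk" > /dev/null
        echo $(( SECONDS - start ))
    done
}

# p95: reads one number per line on stdin and prints the nearest-rank
# 95th percentile, i.e. the value at rank ceil(0.95 * count) after sorting.
p95() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            k = int((95 * NR + 99) / 100)   # integer ceil(95 * NR / 100)
            print v[k]
        }'
}
```

For a given chunk size x, `time_chunks target.prefix chunks_of_x/*.fa | p95` would print y in seconds; repeating this for a few values of x gives the table. Whole-second resolution should be adequate when each run takes minutes.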