Question

Is CASAVA Multithreaded?

1

12.6 years ago

Misha ▴ 20

I built a copy of CASAVA from source.

I am using CASAVA to perform a BCL to FASTQ conversion on a single lane (my favorite lane 4).

I am launching the program with "make -j 32", hoping to get 32 threads, one for each of my server's fancy CPU cores.

When I look at my CPU utilization, I see that the demultiplex command only uses one CPU thread and my total CPU utilization is a paltry 3%.

Can somebody tell me if this step in CASAVA is parallel?
From the user guide:

NOTE the -j <n> command line option is supported to indicate up to <n> processes in parallel. However, for Bcl conversion the maximum level of parallelization is 8.
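
One reading of this note is that -j only sets an upper bound: the number of processes actually running is also limited by how many independent targets make has ready at that moment. A minimal sketch of that arithmetic follows. The function name and the `ready_targets` count are hypothetical, and the cap of 8 is taken from the quoted note, not from inspecting CASAVA's Makefile.

```python
def effective_parallelism(jobs_requested, ready_targets, tool_cap=8):
    """Upper bound on concurrently running processes.

    jobs_requested: the <n> passed to `make -j <n>`
    ready_targets:  independent targets make can start right now (hypothetical input)
    tool_cap:       the limit of 8 quoted from the user guide for Bcl conversion
    """
    return max(0, min(jobs_requested, ready_targets, tool_cap))


# With a single ready target, asking for 32 jobs still yields one process.
print(effective_parallelism(32, 1))
# With many ready targets, the quoted cap of 8 becomes the limit.
print(effective_parallelism(32, 100))
```

If this reading is right, a run that shows one process under both -j 32 and -j 8 points at the number of ready targets, not at the -j value.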

Does this mean each lane can only have 1 thread?

What in CASAVA is parallel? My understanding is that CASAVA spends most of its time in BCL to FASTQ conversion, or are there other costly operations that CASAVA can perform?

==Answers to Questions==

I have a 16 core AMD cpu.
To check utilization I am using top and "ps -m". I see one demultiplexing process and ~3% CPU utilization (1/32?).
I am using the latest version of CASAVA, 1.8.2.
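
The "(1/32?)" guess is consistent with the arithmetic: top's summary line averages over all logical CPUs, so one process pinned at 100% of a single logical CPU shows up as roughly 3% overall. A small sketch, assuming the machine exposes 32 logical CPUs (the -j value used above; the actual count on this server is not confirmed):

```python
def overall_cpu_percent(per_process_percents, logical_cpus):
    """Average utilization across all logical CPUs, as in top's 'Cpu(s)' line.

    per_process_percents: %CPU per process, where 100 means one full logical CPU
    logical_cpus:         number of logical CPUs (assumed to be 32 here)
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    return sum(per_process_percents) / logical_cpus


# One demultiplexBcls process at 100% on an assumed 32 logical CPUs.
print(round(overall_cpu_percent([100.0], 32), 1))  # 3.1
```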

==Unaligned Folder==

-rw-------  1 sanger criemp 24845 2012-06-06 11:26 myOutput.out
-rw-------  1 sanger criemp   301 2012-06-06 11:18 node_name
drwxr-xr-x  5 sanger criemp  4096 2012-06-06 11:15 Basecall_Stats_C0806ACXX
drwx------  3 sanger criemp  4096 2012-06-06 11:15 Temp
-rw-r--r--  1 sanger criemp   773 2012-06-06 11:10 SampleSheet.mk
-rw-r--r--  1 sanger criemp 12528 2012-06-06 11:09 DemultiplexedBustardConfig.xml
-rw-r--r--  1 sanger criemp  4858 2012-06-06 11:08 Makefile
-rw-r--r--  1 sanger criemp  1897 2012-06-06 11:08 DemultiplexConfig.xml
drwxr-xr-x 10 sanger criemp  4096 2012-06-06 11:08 Project_C0806ACXX
-rw-r--r--  1 sanger criemp 25906 2012-06-06 11:08 support.txt
-rw-r--r--  1 sanger criemp   380 2012-06-05 16:04 CASAVA.sh

The configure command is:

configureBclToFastq.pl --input-dir XXX/Basecalls --output-dir XXX/Unaligned --force --ignore-missing-bcl --ignore-missing-stats --use-bases-mask=y51

==Thread Usage is==

Cpu(s):  3.1%us,  0.1%sy,  0.0%ni, 96.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:     32307M total,    10456M used,    21851M free,        0M buffers

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
27279 sanger   20   0 91672  43m 2892 R  100  0.1  18:44.37 demultiplexBcls

ADD COMMENT • link updated 12.6 years ago by Dan D 7.4k • written 12.6 years ago by Misha ▴ 20

0

Do you have an 8-core CPU? I think running more jobs than the number of CPU cores will not increase the speed. 32 threads could also saturate your I/O.

ADD REPLY • link 12.6 years ago by jingtao09 ▴ 110

0

Couple of questions:
- Are you running CASAVA 1.8.2?
- How are you checking utilization: are you using top, dstat, or something else?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

Please see responses to questions

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

What happens when you run make -j 8?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

The demultiplexing step (the first one) is not parallel. I see only 3% cpu utilization.

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

Are you running CASAVA on the same machine that has the data, or is CASAVA accessing the basecall data over a network?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

CASAVA is on the same machine as the BCL files. Also, I am using Linux.

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

Can you paste or describe the contents of the "unaligned" folder in the flowcell data directory?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

Thank you, please see the edits at the top of the question.

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

OK, two things: first, you're not specifying the "--use-bases-mask" parameter correctly. There should be two hyphens and no equals sign. The second thing is that you haven't specified a sample sheet location. Try correcting both of those and running it again. I'll stand by.

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

I appear to have missed a hyphen, but the equals sign should work. --use-bases-mask=y50 results in an error, as expected. Anyway, I do not have a sample sheet, and I edited SampleSheet.mk by commenting out the lines corresponding to the unused lanes.

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

OK, I think the lack of a sample sheet might be a problem. There's some information in there that CASAVA needs, especially an index, in order to do its thing. You can make a one-liner sample sheet for lane 4. If you're willing, give it a shot and let me know how it goes. It would be helpful to paste the sample sheet info here if you decide to go that route.

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator
C0806ACXX,4,H-X,human,,Cypress,Y,51,CB,mTest

... Fixed lane
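
A quick sanity check on a pasted sample sheet like the one above is to compare each row's field count with the header and flag an empty Index column. This is a minimal sketch using only the text posted here. `check_sample_sheet` is a hypothetical helper, not part of CASAVA, and it makes no claim about which columns CASAVA itself requires.

```python
import csv
import io


def check_sample_sheet(text):
    """Return a list of human-readable warnings for a comma-separated sample sheet."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    if not rows:
        return ["sample sheet is empty"]
    header, records = rows[0], rows[1:]
    warnings = []
    for line_no, rec in enumerate(records, start=2):
        if len(rec) != len(header):
            warnings.append(
                f"line {line_no}: {len(rec)} fields, header has {len(header)}"
            )
        # Map by position; zip stops at the shorter of the two sequences.
        named = dict(zip(header, rec))
        if named.get("Index", "") == "":
            warnings.append(f"line {line_no}: Index is empty")
    return warnings


sheet = """FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator
C0806ACXX,4,H-X,human,,Cypress,Y,51,CB,mTest
"""
for warning in check_sample_sheet(sheet):
    print(warning)
```

On the text as posted, this reports ten fields against a nine-column header plus an empty Index, so the header may have lost a column when it was pasted.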

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

OK, I'm a little concerned that you omitted an index, and that you specified lane 5 instead of your favorite lane 4, but what happened when you ran CASAVA and pointed it to that sample sheet?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

It runs, but only on a single CPU :-( There are also some errors about ImageMagick.

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

Well, that sounds like some progress, albeit incremental. Can you post the full text of the error? And did you specify the correct index?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

Thanks for all your help. The real trouble here is that the program runs and makes some FASTQ files, but it doesn't run in parallel.

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

On your latest CASAVA run, how many threads did you specify (-j argument)?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

I tried -j 32 and -j 8.

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

OK, and with both you saw that CASAVA was only running one process (i.e., not in parallel)?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

Exactly! It makes me rather sad :-(

ADD REPLY • link 12.6 years ago by Misha ▴ 20

0

Have you successfully run other processes in parallel using make -j?

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

Yes. Would you be able to show what top or a process monitor says on your system?

ADD REPLY • link 12.6 years ago by Misha ▴ 20

1

OK, we recently moved our HiSeqs, so we're running a simple PhiX validation. My sample sheet just has one line for each lane, like so:

BC0RT0ACXX,1,PhiX1,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,2,PhiX2,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,3,PhiX3,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,4,PhiX4,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,5,PhiX5,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,6,PhiX6,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,7,PhiX7,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,8,PhiX8,,,Control,Y,SR100index,CRB,Control

When I run the commands:

[PATHTOCASAVA]/configureBclToFastq.pl --input-dir [PATHTOFLOWCELLBASEFOLDER] --output-dir [PATHTOFLOWCELLBASEFOLDER]/unaligned --sample-sheet [PATHTOSAMPLESHEET] --fastq-cluster-count 0 --mismatches 1 --with-failed-reads
nohup make -j 16

I see, in top, after about 20 seconds:

46198 root 20 0 35220 11m 1880 R 100.0 0.0 1:51.69 demultiplexBcls
46170 root 20 0 35252 11m 1880 R 100.0 0.0 2:08.96 demultiplexBcls
46192 root 20 0 35256 11m 1880 R 100.0 0.0 1:57.27 demultiplexBcls
46159 root 20 0 34996 11m 1880 R 99.6 0.0 2:07.40 demultiplexBcls
46148 root 20 0 34660 11m 1880 R 98.7 0.0 2:05.87 demultiplexBcls
46179 root 20 0 35252 12m 1880 R 98.7 0.0 2:11.79 demultiplexBcls
46195 root 20 0 35328 11m 1880 R 98.0 0.0 1:56.12 demultiplexBcls
46238 root 20 0 35220 11m 1880 R 94.0 0.0 1:51.12 demultiplexBcls
46176 root 20 0 35252 11m 1880 D 55.9 0.0 2:03.77 demultiplexBcls
46183 root 20 0 35328 11m 1880 D 55.2 0.0 1:58.87 demultiplexBcls
46141 root 20 0 34660 11m 1880 D 52.9 0.0 2:06.30 demultiplexBcls
46144 root 20 0 34988 11m 1880 R 27.6 0.0 2:11.31 demultiplexBcls
46166 root 20 0 34988 11m 1880 R 16.1 0.0 2:08.96 demultiplexBcls
46175 root 20 0 35172 11m 1880 D 8.9 0.0 2:14.39 demultiplexBcls
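
To compare runs, the pasted top lines can be tallied instead of eyeballed. A minimal sketch, assuming the default top column order (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND) and using a few of the rows above as sample input; `summarize_top` is a hypothetical helper:

```python
from collections import Counter


def summarize_top(lines, command="demultiplexBcls"):
    """Count matching processes, tally their states, and sum their %CPU."""
    states = Counter()
    total_cpu = 0.0
    for line in lines:
        fields = line.split()
        # Skip blank lines, headers, and rows for other commands.
        if len(fields) < 12 or fields[11] != command:
            continue
        states[fields[7]] += 1         # S column: R = running, D = uninterruptible sleep
        total_cpu += float(fields[8])  # %CPU column, 100 = one full logical CPU
    return sum(states.values()), dict(states), total_cpu


sample = [
    "46198 root 20 0 35220 11m 1880 R 100.0 0.0 1:51.69 demultiplexBcls",
    "46170 root 20 0 35252 11m 1880 R 100.0 0.0 2:08.96 demultiplexBcls",
    "46176 root 20 0 35252 11m 1880 D 55.9 0.0 2:03.77 demultiplexBcls",
    "46175 root 20 0 35172 11m 1880 D 8.9 0.0 2:14.39 demultiplexBcls",
]
count, states, total_cpu = summarize_top(sample)
print(count, states, round(total_cpu, 1))
```

Processes in state D are usually waiting on I/O, which may be why some rows sit well below 100%.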

ADD REPLY • link 12.6 years ago by Dan D 7.4k

0

On my system, top will show one process for each thread. I will post an example tomorrow, as I'm going to start a CASAVA analysis.

ADD REPLY • link 12.6 years ago by Dan D 7.4k

score 1 · Answer 1 · 2012-06-06

To answer your question: yes, CASAVA runs in parallel in the BCL to FASTQ conversion, demultiplexing, and alignment steps. This is independent of lanes. It's not immediately apparent where your run is going wrong, but it's definitely not performing as it typically should. Hopefully we can narrow it down; see the comments.