I built a copy of CASAVA from source.
I am using CASAVA to perform a BCL to FASTQ conversion on a single lane (my favorite lane 4).
I am launching the program with "make -j 32", since I want 32 threads, one for each of my server's fancy CPU cores.
When I look at my CPU utilization, I see that the demultiplex command uses only one CPU thread and my total CPU utilization is a paltry 3%.
Can somebody tell me if this step in CASAVA is parallel?
From the user guide:
NOTE the -j <n> command line option is supported to indicate up to <n> processes in parallel. However, for Bcl conversion the maximum level of parallelization is 8.
Does this mean each lane can only have 1 thread?
- What in CASAVA is parallel? My understanding is that CASAVA spends most of its time on the BCL to FASTQ conversion. Or are there other costly operations that CASAVA can perform?
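Here is one way to think about why a bigger -j might not help. make -j only runs independent targets concurrently, and each target's recipe is still a single process. The sketch below is a simplified model, not CASAVA's actual Makefile. It assumes a set of independent jobs with hypothetical durations and starts each job on whichever slot frees up first, roughly how make -j N fills its job slots. If a single-lane run were to generate only one long demultiplexing job (an assumption, not something confirmed in this thread), the extra slots would simply sit idle.

```python
import heapq
from typing import List


def makespan(durations: List[float], slots: int) -> float:
    """Wall-clock time to finish independent jobs with `slots` parallel workers.

    Jobs start in list order, each on whichever slot frees up first
    (simple list scheduling, roughly how `make -j N` fills its job slots).
    """
    if slots < 1:
        raise ValueError("slots must be >= 1")
    # Min-heap of the times at which each slot becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    # Hypothetical durations: one long job vs. eight shorter ones of equal total work.
    one_job = [60.0]
    eight_jobs = [7.5] * 8
    for j in (1, 8, 32):
        print(j, makespan(one_job, j), makespan(eight_jobs, j))
```

With one job, the makespan stays at 60.0 for -j 1, 8 and 32. The eight-job case drops from 60.0 at -j 1 to 7.5 at -j 8 and gains nothing further at -j 32.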
==Answers to Questions==
- I have a 16-core AMD CPU.
- To check utilization I am using top and "ps -m". I see one demultiplexing process and ~3% CPU utilization (1/32?).
- I am using the latest version of CASAVA, 1.8.2.
==Unaligned Folder==
-rw------- 1 sanger criemp 24845 2012-06-06 11:26 myOutput.out
-rw------- 1 sanger criemp 301 2012-06-06 11:18 node_name
drwxr-xr-x 5 sanger criemp 4096 2012-06-06 11:15 Basecall_Stats_C0806ACXX
drwx------ 3 sanger criemp 4096 2012-06-06 11:15 Temp
-rw-r--r-- 1 sanger criemp 773 2012-06-06 11:10 SampleSheet.mk
-rw-r--r-- 1 sanger criemp 12528 2012-06-06 11:09 DemultiplexedBustardConfig.xml
-rw-r--r-- 1 sanger criemp 4858 2012-06-06 11:08 Makefile
-rw-r--r-- 1 sanger criemp 1897 2012-06-06 11:08 DemultiplexConfig.xml
drwxr-xr-x 10 sanger criemp 4096 2012-06-06 11:08 Project_C0806ACXX
-rw-r--r-- 1 sanger criemp 25906 2012-06-06 11:08 support.txt
-rw-r--r-- 1 sanger criemp 380 2012-06-05 16:04 CASAVA.sh
The configure command is:
configureBclToFastq.pl --input-dir XXX/Basecalls --output-dir XXX/Unaligned --force --ignore-missing-bcl --ignore-missing-stats --use-bases-mask=y51
==Thread Usage is==
Cpu(s): 3.1%us, 0.1%sy, 0.0%ni, 96.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32307M total, 10456M used, 21851M free, 0M buffers
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27279 sanger 20 0 91672 43m 2892 R 100 0.1 18:44.37 demultiplexBcls
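The 3.1%us figure in the Cpu(s) line is consistent with exactly one busy core, as guessed above. top's summary line averages over all logical CPUs, so one process at 100% on a 32-CPU system shows up as about 100/32 = 3.1%. Below is a small sketch of that arithmetic. It assumes the OS sees 32 logical CPUs (nproc will tell you) and that the summary line has the format shown here.

```python
import re


def user_cpu_percent(summary_line: str) -> float:
    """Extract the %us value from a top 'Cpu(s):' summary line."""
    match = re.search(r"([\d.]+)%us", summary_line)
    if match is None:
        raise ValueError("no %us field found")
    return float(match.group(1))


def busy_cores(aggregate_percent: float, logical_cpus: int) -> float:
    """Convert an all-CPU average percentage into an equivalent number of busy cores."""
    return aggregate_percent / 100.0 * logical_cpus


if __name__ == "__main__":
    line = "Cpu(s): 3.1%us, 0.1%sy, 0.0%ni, 96.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"
    us = user_cpu_percent(line)
    # Assumption: the OS reports 32 logical CPUs.
    print(us, round(busy_cores(us, 32), 2))
```

Here 3.1% of 32 CPUs works out to about 0.99 of a core, which matches the single demultiplexBcls process at 100%.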
Do you have an 8-core CPU? I think running more jobs than the number of CPU cores will not increase the speed. 32 threads may also saturate your I/O.
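The rule of thumb above (no more jobs than cores) can be applied automatically. Below is a minimal sketch using Python's os.cpu_count(). Note that this function counts logical CPUs (hardware threads), not physical cores. Whether exceeding that number actually hurts depends on the workload and the disks, so treat the cap as a starting point rather than a rule.

```python
import os
from typing import Optional


def capped_jobs(requested: int, cpus: Optional[int] = None) -> int:
    """Return a make -j value no larger than the number of logical CPUs.

    `cpus` defaults to what the OS reports via os.cpu_count().
    """
    if requested < 1:
        raise ValueError("requested must be >= 1")
    available = cpus if cpus is not None else (os.cpu_count() or 1)
    return min(requested, max(available, 1))


if __name__ == "__main__":
    print(f"make -j {capped_jobs(32)}")
```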
A couple of questions:
- Are you running CASAVA 1.8.2?
- How are you checking utilization? Are you using top, dstat, or something else?
Please see my responses to the questions above.
What happens when you run make -j 8?
The demultiplexing step (the first one) is not parallel. I see only 3% CPU utilization.
Are you running CASAVA on the same machine that has the data, or is CASAVA accessing the basecall data over a network?
CASAVA is on the same machine as the BCL files. Also, I am using Linux.
Can you paste or describe the contents of the "unaligned" folder in the flowcell data directory?
Thank you; please see the edits above.
OK, two things: first, you're not specifying the "--use-bases-mask" parameter correctly. There should be two hyphens and no equals sign. The second thing is that you haven't specified a sample sheet location. Try correcting both of those and running it again. I'll stand by.
I appear to have missed a hyphen, but the equals sign should work: --use-bases-mask=y50 results in an error, as expected. Anyway, I do not have a sample sheet; instead I edited SampleSheet.mk by commenting out the lines corresponding to the unused lanes.
OK, I think the lack of a sample sheet might be a problem. There's some information in there that CASAVA needs, especially an index, in order to do its thing. You can make a one-line sample sheet for lane 4. If you're willing, give it a shot and let me know how it goes. It would be helpful to paste the sample sheet info here if you decide to go that route.
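On the equals-sign question: GNU-style long-option parsers normally treat "--opt=value" and "--opt value" as the same thing. If configureBclToFastq.pl parses its options with Perl's Getopt::Long (an assumption, though common for Perl tools), both forms are accepted, which matches the observation above. The sketch below shows the same behaviour with Python's argparse purely as an illustration. It does not call CASAVA; only the option name is taken from the thread.

```python
import argparse


def parse(argv):
    parser = argparse.ArgumentParser(prog="configure-sketch")
    # Option name taken from the thread; this parser is only an illustration.
    parser.add_argument("--use-bases-mask", dest="use_bases_mask")
    return parser.parse_args(argv)


if __name__ == "__main__":
    joined = parse(["--use-bases-mask=y51"])
    split = parse(["--use-bases-mask", "y51"])
    print(joined.use_bases_mask, split.use_bases_mask)
```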
FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator
C0806ACXX,4,H-X,human,,Cypress,Y,51,CB,mTest
... Fixed lane
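To generate the one-line sample sheet instead of typing it, here is a sketch using Python's csv module. The column order assumes the CASAVA 1.8 layout, which ends with a SampleProject column; the data row above has ten fields, one more than the header as pasted. The field values are copied from that row. Index is left empty, mirroring the sheet above (the missing index was itself flagged as a concern in this thread).

```python
import csv
import io

# Assumed CASAVA 1.8 column order (SampleProject is the tenth column).
COLUMNS = ["FCID", "Lane", "SampleID", "SampleRef", "Index",
           "Description", "Control", "Recipe", "Operator", "SampleProject"]

# Values copied from the lane 4 row above.
LANE4 = {"FCID": "C0806ACXX", "Lane": 4, "SampleID": "H-X",
         "SampleRef": "human", "Index": "", "Description": "Cypress",
         "Control": "Y", "Recipe": "51", "Operator": "CB",
         "SampleProject": "mTest"}


def sample_sheet(rows):
    """Render sample-sheet rows (dicts keyed by COLUMNS) as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(sample_sheet([LANE4]), end="")
```

Redirect the output to a file and pass that file to configureBclToFastq.pl with --sample-sheet.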
OK, I'm a little concerned that you omitted an index, and that you specified lane 5 instead of your favorite lane 4, but what happened when you ran CASAVA and pointed it to that sample sheet?
It runs, but only on a single CPU :-( There are also some errors about ImageMagick.
Well, that sounds like some progress, albeit incremental. Can you post the full text of the error? And did you specify the correct index?
Thanks for all your help. The real trouble here is that the program runs and makes some FASTQ files, but it doesn't run in parallel.
On your latest CASAVA run, how many threads did you specify (-j argument)?
I tried -j 32 and -j 8.
OK, and with both you saw that CASAVA was only running one process (i.e., not in parallel)?
Exactly! It makes me rather sad :-(
Have you successfully run other processes in parallel using make -j ?
Yes. Would you be able to show what top or a process monitor says on your system?
OK, we recently moved our HiSeqs, so we're running a simple PhiX validation. My sample sheet just has one line for each lane, like so:
BC0RT0ACXX,1,PhiX1,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,2,PhiX2,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,3,PhiX3,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,4,PhiX4,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,5,PhiX5,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,6,PhiX6,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,7,PhiX7,,,Control,Y,SR100index,CRB,Control
BC0RT0ACXX,8,PhiX8,,,Control,Y,SR100index,CRB,Control
When I run the commands:
[PATHTOCASAVA]/configureBclToFastq.pl --input-dir [PATHTOFLOWCELLBASEFOLDER] --output-dir [PATHTOFLOWCELLBASEFOLDER]/unaligned --sample-sheet [PATHTOSAMPLESHEET] --fastq-cluster-count 0 --mismatches 1 --with-failed-reads
nohup make -j 16
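The same two-step workflow can be scripted: configure first, then run make in the output directory, which is where the generated Makefile lives according to the Unaligned listing above. This is a sketch. It only assembles argument lists from the options used above, and the paths are placeholders standing in for the bracketed ones. Nothing is executed unless run() is called.

```python
import subprocess
from pathlib import Path
from typing import List


def configure_cmd(casava_bin: str, input_dir: str, out_dir: str,
                  sample_sheet: str) -> List[str]:
    """configureBclToFastq.pl invocation using the options quoted above."""
    return [
        str(Path(casava_bin) / "configureBclToFastq.pl"),
        "--input-dir", input_dir,
        "--output-dir", out_dir,
        "--sample-sheet", sample_sheet,
        "--fastq-cluster-count", "0",
        "--mismatches", "1",
        "--with-failed-reads",
    ]


def make_cmd(jobs: int) -> List[str]:
    """Parallel make command; intended to run inside the output directory."""
    return ["make", "-j", str(jobs)]


def run(casava_bin: str, input_dir: str, out_dir: str,
        sample_sheet: str, jobs: int) -> None:
    subprocess.run(configure_cmd(casava_bin, input_dir, out_dir, sample_sheet),
                   check=True)
    # The Makefile is generated in the output directory, so make runs there.
    subprocess.run(make_cmd(jobs), cwd=out_dir, check=True)


if __name__ == "__main__":
    # Placeholder paths; substitute your own before calling run().
    print(" ".join(configure_cmd("/path/to/casava/bin", "/path/to/flowcell",
                                 "/path/to/flowcell/unaligned",
                                 "/path/to/SampleSheet.csv")))
    print(" ".join(make_cmd(16)))
```

Unlike "nohup make -j 16" above, subprocess.run waits for make to finish and does not survive a logout. Keep nohup (or a terminal multiplexer) for long runs.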
I see, in top, after about 20 seconds:
46198 root 20 0 35220 11m 1880 R 100.0 0.0 1:51.69 demultiplexBcls
46170 root 20 0 35252 11m 1880 R 100.0 0.0 2:08.96 demultiplexBcls
46192 root 20 0 35256 11m 1880 R 100.0 0.0 1:57.27 demultiplexBcls
46159 root 20 0 34996 11m 1880 R 99.6 0.0 2:07.40 demultiplexBcls
46148 root 20 0 34660 11m 1880 R 98.7 0.0 2:05.87 demultiplexBcls
46179 root 20 0 35252 12m 1880 R 98.7 0.0 2:11.79 demultiplexBcls
46195 root 20 0 35328 11m 1880 R 98.0 0.0 1:56.12 demultiplexBcls
46238 root 20 0 35220 11m 1880 R 94.0 0.0 1:51.12 demultiplexBcls
46176 root 20 0 35252 11m 1880 D 55.9 0.0 2:03.77 demultiplexBcls
46183 root 20 0 35328 11m 1880 D 55.2 0.0 1:58.87 demultiplexBcls
46141 root 20 0 34660 11m 1880 D 52.9 0.0 2:06.30 demultiplexBcls
46144 root 20 0 34988 11m 1880 R 27.6 0.0 2:11.31 demultiplexBcls
46166 root 20 0 34988 11m 1880 R 16.1 0.0 2:08.96 demultiplexBcls
46175 root 20 0 35172 11m 1880 D 8.9 0.0 2:14.39 demultiplexBcls
On my system, top shows one process for each thread. I will post an example tomorrow, as I'm going to start a CASAVA analysis.
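To read a snapshot like the one above, here is a sketch that tallies demultiplexBcls processes by state and sums their %CPU. In top's S column, R means running and D means uninterruptible sleep, which on Linux usually indicates waiting on disk I/O. Several D entries well below 100% CPU would suggest the jobs are partly I/O-bound, in line with the I/O point raised earlier. The parser assumes the default column layout shown in this thread.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally(lines: Iterable[str],
          command: str = "demultiplexBcls") -> Tuple[Dict[str, int], float]:
    """Count processes per state and sum their %CPU for one command name."""
    states: Counter = Counter()
    total_cpu = 0.0
    for line in lines:
        fields = line.split()
        # Default layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or fields[11] != command:
            continue
        states[fields[7]] += 1
        total_cpu += float(fields[8])
    return dict(states), total_cpu


if __name__ == "__main__":
    snapshot = [
        "46198 root 20 0 35220 11m 1880 R 100.0 0.0 1:51.69 demultiplexBcls",
        "46176 root 20 0 35252 11m 1880 D 55.9 0.0 2:03.77 demultiplexBcls",
        "46144 root 20 0 34988 11m 1880 R 27.6 0.0 2:11.31 demultiplexBcls",
    ]
    states, cpu = tally(snapshot)
    print(states, round(cpu, 1))
```

Applied to all fourteen lines above, this gives 10 processes in state R and 4 in state D, together using roughly ten cores' worth of CPU.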