Pindel: long run-time and low CPU usage
2.6 years ago
Lillian • 0

I'm trying to run Pindel on some 30x Illumina WGS data. I aligned the reads with BWA-MEM, then coordinate-sorted and indexed the BAM files with Samtools. I also tried filtering the BAM files with samtools view -F 0x800 (which excludes supplementary alignments), as suggested in another post. I tested Pindel on 2 samples, restricted to chr1, with 48 cores and 48 threads, but it takes ages and CPU efficiency is very low (13%).

pindel -T 48 -f ref.fa -i pindel_config_file_test.txt -e 0.03 -c chr1 -o test_chr1_2samples

In the Pindel paper, the authors report that it took 4.5 hours to analyse one WGS sample across all chromosomes on a single core, which makes me think I've done something wrong.

WGS BWA indels NGS Pindel • 944 views

What kind of machine is this, and what is the filesystem? Is it a local hard drive, a NAS that multiple people share, or a parallel high-performance filesystem? Heavy I/O can slow things down.


It's a shared high-performance computing facility with 1248 Intel Xeon Platinum 8260 (Cascade Lake) cores @ 2.40 GHz. My files are stored in the HPC scratch space. I'll try reducing the number of cores and see if that improves usage.
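A rough way to compare thread counts is sketched below. It assumes bash (it relies on the bash `time` keyword). The helper names `cpu_efficiency`, `run_pindel` and `sweep_threads` are made up for illustration, and the Pindel arguments are copied from the command in the question, with only the output prefix changed per run so that results do not overwrite each other.

```shell
#!/usr/bin/env bash
# Sketch only: relies on the bash `time` keyword, so run it with bash.

# CPU efficiency (%) = CPU seconds used / (wall-clock seconds * threads).
cpu_efficiency() {  # args: user_s sys_s wall_s threads
    awk -v u="$1" -v s="$2" -v w="$3" -v t="$4" 'BEGIN {
        if (w * t <= 0) { print "NA"; exit }
        printf "%.1f\n", 100 * (u + s) / (w * t)
    }'
}

# Hypothetical wrapper around the command from the question; $1 = thread count.
run_pindel() {
    pindel -T "$1" -f ref.fa -i pindel_config_file_test.txt \
           -e 0.03 -c chr1 -o "test_chr1_T$1"
}

# Call "runner N" once per thread count and print, tab-separated:
# threads, wall-clock seconds, CPU efficiency in percent.
sweep_threads() {  # args: runner thread_count...
    runner=$1; shift
    for n in "$@"; do
        # `time -p` writes "real", "user" and "sys" seconds to stderr;
        # the runner's own output is discarded.
        timing=$( { time -p "$runner" "$n" >/dev/null 2>&1; } 2>&1 )
        wall=$(printf '%s\n' "$timing" | awk '$1 == "real" { print $2 }')
        user=$(printf '%s\n' "$timing" | awk '$1 == "user" { print $2 }')
        sys=$(printf '%s\n' "$timing" | awk '$1 == "sys" { print $2 }')
        printf '%s\t%s\t%s\n' "$n" "$wall" \
            "$(cpu_efficiency "$user" "$sys" "$wall" "$n")"
    done
}

# Example (not run here): sweep_threads run_pindel 4 8 16 48
```

If efficiency stays low even at small thread counts, that would point away from thread scaling and towards something else, such as I/O.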


OK. If that does not help, report the issue to the admin. We had a similar issue years ago, and it turned out that some of the GPFS file servers were not working properly, which led to poor I/O. Compute nodes often have local hard drives attached, so you may be able to copy your BAM files there to bypass the shared filesystem. On our system that works via BeeOND/BeeGFS. Check your docs or ask the admin about that. But this is all hacky: the filesystem should work, so talk to the cluster admin first.
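For the copy-to-local-disk workaround, a minimal sketch could look like the one below. It assumes the Pindel config file has one BAM per line in the form "path insert_size label", and that the node-local directory is known (it is often exposed as `$TMPDIR`, but that is site-specific). The function name `stage_bams` and the file names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash

# Copy every BAM listed in a Pindel config file (plus its index, if found)
# into local_dir, and write a new config file pointing at the local copies.
# Assumes all BAM file names are unique, since they land in one directory.
stage_bams() {  # args: config_in local_dir config_out
    config_in=$1; local_dir=$2; config_out=$3
    mkdir -p "$local_dir" || return 1
    : > "$config_out"
    # Assumed line format: <bam path> <insert size> <sample label>
    while read -r bam insert label || [ -n "$bam" ]; do
        [ -n "$bam" ] || continue          # skip blank lines
        cp "$bam" "$local_dir/" || return 1
        # The index may be named sample.bam.bai or sample.bai.
        for idx in "$bam.bai" "${bam%.bam}.bai"; do
            if [ -f "$idx" ]; then cp "$idx" "$local_dir/"; fi
        done
        printf '%s\t%s\t%s\n' "$local_dir/$(basename "$bam")" \
            "$insert" "$label" >> "$config_out"
    done < "$config_in"
}

# Example (not run here):
#   stage_bams pindel_config_file_test.txt "$TMPDIR/pindel_bams" pindel_config_local.txt
#   pindel -T 48 -f ref.fa -i pindel_config_local.txt -e 0.03 -c chr1 -o test_chr1_2samples
```

Local scratch is usually wiped when the job ends, so the staging step would need to run inside the job script itself.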


I don't use your program, so this is an educated guess rather than an informed suggestion.

The cause of a bottleneck is not always obvious. Here you are seeing low CPU usage, so the CPU looks like the problem. However, the CPU itself is rarely the bottleneck on modern computers. If a program reads from and writes to disk a lot while calculating, and the disk is slow, your CPUs may have nothing to do while they wait for data to come in or for results to be written. In two words: slow disk.

You may want to monitor your disk activity with iotop.
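If iotop is not available to you (on shared clusters it often needs elevated privileges), a rough alternative on Linux is to sample the per-process counters in /proc/PID/io. The sketch below assumes a Linux kernel with I/O accounting enabled and that you own the process. The function names are made up for illustration.

```shell
#!/usr/bin/env bash

# Convert two byte counters taken `seconds` apart into MB/s.
mb_per_s() {  # args: start_bytes end_bytes seconds
    awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", (b - a) / t / 1000000 }'
}

# Bytes the process has caused to be fetched from storage so far.
read_bytes_of() {  # arg: pid
    awk '$1 == "read_bytes:" { print $2 }' "/proc/$1/io"
}

# Sample read_bytes twice and report the read rate and process state.
# State "D" (uninterruptible sleep) usually means the process is waiting on I/O.
io_rate() {  # args: pid [interval_s]
    pid=$1; interval=${2:-5}
    before=$(read_bytes_of "$pid") || return 1
    sleep "$interval"
    after=$(read_bytes_of "$pid") || return 1
    printf 'read rate: %s MB/s, state: %s\n' \
        "$(mb_per_s "$before" "$after" "$interval")" \
        "$(ps -o state= -p "$pid")"
}

# Example (not run here): io_rate "$(pgrep -n pindel)" 10
```

A process that sits mostly in state D with a low read rate would be consistent with the slow-disk explanation. Where iotop is usable, `iotop -o -b -n 3` prints three batches showing only the processes that are actually doing I/O.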
