GATK BwaSpark parameter optimisation
4 months ago
Joshua ▴ 20

Hi,

I'm running GATK version: 4.1.2.0

I'm trying to validate the performance of BwaSpark. The input uBAM file is 5.1 GB. BwaSpark takes around 65 minutes to complete, which is nearly the same as plain bwa-mem. Below is the command I used to run BwaSpark. Is there any way to make BwaSpark run faster with the Spark master set to local, or will performance only improve when running on a Spark cluster? Please let me know if I should modify or add any parameters in the command below.

time gatk BwaSpark --bwa-mem-index-image GRCh37.fasta.img --spark-master local[*] --bam-partition-size 4000000 --conf 'spark.executor.num=5' --conf 'spark.executor.cores=16' --conf 'spark.executor.memory=15G' --conf 'spark.driver.memory=30G' --conf 'spark.dynamicAllocation.enabled=true' -I unmapped_input.bam -O aligned.bam -R GRCh37.fasta 2> Log_file.log
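
For comparison, below is a sketch of what I would try against a Spark cluster, since my understanding is that with --spark-master local[*] everything runs in a single JVM and the executor-level settings are largely ignored. The master URL and executor sizes are placeholders, and I'm assuming spark.executor.instances is the correct property name (rather than spark.executor.num, which I used above):

# hypothetical cluster run; <cluster-master> is a placeholder for the Spark standalone master host
time gatk BwaSpark --bwa-mem-index-image GRCh37.fasta.img --spark-runner SPARK --spark-master spark://<cluster-master>:7077 --bam-partition-size 4000000 --conf 'spark.executor.instances=5' --conf 'spark.executor.cores=16' --conf 'spark.executor.memory=15G' --conf 'spark.driver.memory=30G' -I unmapped_input.bam -O aligned.bam -R GRCh37.fasta 2> Log_file.log

Would tuning along these lines be the expected way to get a speed-up, or is there something I can still do in local mode?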

Also, where can I find the complete list of --conf properties that BwaSpark accepts? I couldn't find these options in gatk BwaSpark --help and had to pick them by referring to various other forums.

parallel-computing gatk bwaspark