Hi,
I'm running GATK version: 4.1.2.0
I'm trying to validate the performance of BwaSpark. The input uBAM file is 5.1 GB. GATK's BwaSpark takes around 65 minutes to complete, which is nearly the same as bwa-mem. Below is the command that I used to run BwaSpark. Is there any way to make BwaSpark run faster with --spark-master set to local, or will performance only improve when running on a Spark cluster? Please let me know if I need to modify or add any parameters in the command below.
time gatk BwaSpark --bwa-mem-index-image GRCh37.fasta.img --spark-master local[*] --bam-partition-size 4000000 --conf 'spark.executor.num=5' --conf 'spark.executor.cores=16' --conf 'spark.executor.memory=15G' --conf 'spark.driver.memory=30G' --conf 'spark.dynamicAllocation.enabled=true' -I unmapped_input.bam -O aligned.bam -R GRCh37.fasta 2> Log_file.log
Also, please let me know where I can find the complete list of --conf parameters for BwaSpark. I couldn't find these options in gatk BwaSpark --help, so I had to pick them by referring to various other forums.