Hi everyone, I'm digging into the GATK best-practice WDL to speed it up a bit, and I'd like to scatter BQSR over even intervals. Currently the default interval-generation task splits the genome into uneven parts (basically chromosome-based intervals): https://github.com/broadinstitute/warp/blob/c6c8cbd947fe898310ad83072c1aeb757a72e182/tasks/broad/UnmappedBamToAlignedBam.wdl#L193:~:text=scatter%20(subgroup%20in%20CreateSequenceGroupingTSV.sequence_grouping)%20%7B
`chr1:1+
chr2:1+
chr3:1+
chr4:1+
chr5:1+
chr6:1+
chr7:1+
chr8:1+
chr9:1+
chr10:1+
chr11:1+
chr12:1+ chr13:1+
chr14:1+ chr15:1+
chr16:1+ chr17:1+
chr18:1+ chr19:1+ chr20:1+
chr21:1+ chr22:1+
chrX:1+ chrY:1+ chrM:1+ _and all unmapped here_ `
I'd like to change the task CreateSequenceGroupingTSV so that it produces even intervals of a given length (let's say the whole genome in 15 shards, plus an additional shard for unmapped sequences).
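To make the idea concrete, here is a rough sketch of the kind of splitting I have in mind. It is not the existing task's code: the function name `even_shards`, its arguments, and the `name:start-end` output format are my own assumptions for illustration, and the contig lengths would have to come from the reference sequence dictionary.

```python
def even_shards(contig_lengths, num_shards):
    """Split contigs into at most num_shards groups of roughly equal total length.

    contig_lengths: list of (name, length) tuples in reference order.
    Returns a list of shards; each shard is a list of "name:start-end"
    strings (1-based, inclusive). A contig may be cut across two shards.
    """
    total = sum(length for _, length in contig_lengths)
    target = -(-total // num_shards)  # ceiling division: bases per shard

    shards = [[]]
    room = target  # bases still available in the current shard
    for name, length in contig_lengths:
        start = 1
        while start <= length:
            if room == 0:
                # Current shard is full, open a new one.
                shards.append([])
                room = target
            end = min(length, start + room - 1)
            shards[-1].append(f"{name}:{start}-{end}")
            room -= end - start + 1
            start = end + 1
    return shards


if __name__ == "__main__":
    # Toy contig lengths, not real chromosome sizes.
    toy = [("chrA", 100), ("chrB", 50), ("chrC", 50)]
    for i, shard in enumerate(even_shards(toy, 4)):
        print(i, " ".join(shard))
```

The shard for unmapped reads would still have to be appended separately. One thing I have not verified is how reads that span a cut point inside a chromosome are handled when the per-shard outputs are gathered, so that would need checking before relying on this.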
Is this a reasonable approach? I don't see any specific reason to have uneven intervals, which eventually translate into different execution times between shards. To me it looks like I can optimize that step, but I assume someone has already thought about this and, for some reason, did not implement it in the GATK production version, which makes me suspicious. So I'd really appreciate a second opinion on this.
Best, Eugene