BQSR scatter intervals
2.6 years ago
Eugene A ▴ 190

Hi everyone, I'm digging into the GATK best-practice WDL to speed it up a little, and I would like to scatter BQSR over even intervals. Currently, the default task for interval generation splits the genome into uneven parts (basically chromosome-based intervals): https://github.com/broadinstitute/warp/blob/c6c8cbd947fe898310ad83072c1aeb757a72e182/tasks/broad/UnmappedBamToAlignedBam.wdl#L193:~:text=scatter%20(subgroup%20in%20CreateSequenceGroupingTSV.sequence_grouping)%20%7B

`chr1:1+

chr2:1+

chr3:1+

chr4:1+

chr5:1+

chr6:1+

chr7:1+

chr8:1+

chr9:1+

chr10:1+

chr11:1+

chr12:1+ chr13:1+

chr14:1+ chr15:1+

chr16:1+ chr17:1+

chr18:1+ chr19:1+ chr20:1+

chr21:1+ chr22:1+

chrX:1+ chrY:1+ chrM:1+ _and all unmapped here_ `

I'd like to change the CreateSequenceGroupingTSV task so that it produces even intervals of a given length (say, the whole genome split into 15 shards, plus an additional shard for unmapped sequences).
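To make the idea concrete, here is a minimal Python sketch of what I have in mind. It is not the WARP implementation: the function names `even_shards` and `to_tsv` are hypothetical, the contig lengths are assumed to come from the sequence dictionary in reference order, and I am assuming that `contig:start-end` intervals and a separate `unmapped` line are acceptable downstream. Unlike the current grouping, it splits chromosomes across shard boundaries.

```python
from typing import Dict, List


def even_shards(contig_lengths: Dict[str, int], n_shards: int) -> List[List[str]]:
    """Split contigs (in dict order) into at most n_shards groups of ~equal size."""
    total = sum(contig_lengths.values())
    target = -(-total // n_shards)  # ceiling division: bases per shard
    shards: List[List[str]] = [[]]
    room = target  # bases still free in the current shard
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            if room == 0:  # current shard is full, open a new one
                shards.append([])
                room = target
            end = min(length, start + room - 1)
            shards[-1].append(f"{contig}:{start}-{end}")
            room -= end - start + 1
            start = end + 1
    return shards


def to_tsv(shards: List[List[str]], with_unmapped: bool = True) -> str:
    """One shard per line, intervals tab-separated; optional extra unmapped shard."""
    lines = ["\t".join(shard) for shard in shards]
    if with_unmapped:
        lines.append("unmapped")  # assumption: unmapped reads get their own shard
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy lengths for illustration only, not real chromosome sizes.
    toy = {"chrA": 100, "chrB": 50, "chrC": 30}
    print(to_tsv(even_shards(toy, 3)))
```

The shards stay in genomic order, which I assume matters for gathering the recalibrated BAMs afterwards. Equal length does not guarantee equal runtime, since read depth varies along the genome.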

Is this a reasonable approach? I don't see any specific reason to have uneven intervals, which eventually translate into different execution times between shards. To me it looks like I can optimize that step, but I assume someone has already thought about it and, for some reason, did not implement it in the GATK production version, which makes me suspicious. So I would really appreciate a second opinion on this.

Best, Eugene

gatk bqsr intervals scatter • 483 views