Hello, I am currently using GATK's HaplotypeCaller tool to do variant discovery for some RNA-seq data. This is a very long-running process, so I have been looking at ways to optimize speed. The documentation mentions that you can pass an interval list to HaplotypeCaller to speed up performance, and that a VCF can be used as the interval list. I am wondering whether it is appropriate to use a reference VCF such as Ensembl's GRCH38.vcf file, as this will be intervals for genes and my variant discovery will only be looking within genes, since that is the nature of RNA-seq data.
Does this make sense to use? I can't find much in the docs about what kind of interval list to use for RNA-seq data; it is mostly about whole genome or targeted exome. If that VCF is not proper to use, what interval list should I use, or how do I create an interval list for HaplotypeCaller to speed up processing for this RNA-seq data?
Yes, you can specify an interval list: just use the -L option and point to a BED file containing your intervals!
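If you don't already have a BED file, one possible way to build it is from the exon records of a gene annotation. The following is only a rough sketch, assuming you have a GTF annotation for the same reference build; the function names and the file names (annotation.gtf, exons.bed) are made up for illustration and are not part of GATK. It collects exon records, converts GTF's 1-based inclusive coordinates to BED's 0-based half-open coordinates, and merges overlapping exons per contig.

```python
import sys
from collections import defaultdict


def gtf_exons_to_bed_intervals(gtf_lines):
    """Return merged (contig, start, end) BED-style intervals for exon records."""
    by_contig = defaultdict(list)  # keeps contigs in order of first appearance
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[2] != "exon":
            continue  # keep only exon features
        # GTF is 1-based inclusive; BED is 0-based half-open.
        by_contig[fields[0]].append((int(fields[3]) - 1, int(fields[4])))

    merged = []
    for contig, spans in by_contig.items():
        current_start, current_end = None, None
        for start, end in sorted(spans):
            if current_end is not None and start <= current_end:
                # Overlapping or abutting exon: extend the current interval.
                current_end = max(current_end, end)
            else:
                if current_start is not None:
                    merged.append((contig, current_start, current_end))
                current_start, current_end = start, end
        if current_start is not None:
            merged.append((contig, current_start, current_end))
    return merged


def write_bed(intervals, path):
    """Write intervals as tab-separated BED lines."""
    with open(path, "w") as out:
        for contig, start, end in intervals:
            out.write(f"{contig}\t{start}\t{end}\n")


if __name__ == "__main__":
    # Hypothetical usage: python make_bed.py annotation.gtf exons.bed
    gtf_path, bed_path = sys.argv[1], sys.argv[2]
    with open(gtf_path) as handle:
        write_bed(gtf_exons_to_bed_intervals(handle), bed_path)
```

The resulting file (here the hypothetical exons.bed) would then be the one you point -L at. Note that the contig names in the BED file need to match those in your reference (for example "1" versus "chr1"), so check that the annotation and the reference use the same naming.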