Hi all! I'm building my first pipeline for human exome variant calling, and I'm starting to learn the basic working principles of genome/exome data analysis.
Now, for exome data the HaplotypeCaller tool from GATK is usually run with a .bed file (passed via -L) listing the regions the sequencing platform targeted. However, I am unsure which of the files provided by the company (Agilent, in this case) I should use, and why: Regions? Padded? AllTracks? Covered?
What's the difference between these?
Side note: I first ran into this issue when trying to compute per-base coverage. I then realised that my reference genome (hg38) contains alternate haplotypes and random contigs, onto which some of my BAM reads aligned (with poor mapping quality and more SNPs than the same gene shows on the canonical chromosome), and that the Agilent .bed files contain no targeted regions for those sequences. Wouldn't it be a problem if reads aligned to coding regions in these sequences carry variants? Or should I use my judgement and restrict coverage and variant calling to the canonical chromosomes?
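To make the mismatch concrete, here is a minimal Python sketch that lists which reference contigs have no interval in a BED file. It is not part of GATK or any Agilent tool: the function names are made up for illustration, the BED text and contig list are toy examples, and it assumes UCSC-style hg38 naming (chr1, ..._alt, ..._random, chrUn_...).

```python
import re

# Canonical hg38 chromosomes in UCSC naming: chr1..chr22, chrX, chrY, chrM.
# Assumption: the reference uses "chr"-prefixed names.
CANONICAL = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")


def is_canonical(contig):
    """True for primary chromosomes, False for alt/random/unplaced contigs."""
    return CANONICAL.match(contig) is not None


def bed_contigs(bed_text):
    """Return the set of contig names with at least one interval in a BED text."""
    contigs = set()
    for line in bed_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and the browser/track header lines
        # that vendor BED files often start with.
        if not fields or line.startswith("#") or fields[0] in ("browser", "track"):
            continue
        contigs.add(fields[0])
    return contigs


def untargeted_contigs(bed_text, reference_contigs):
    """Reference contigs (in input order) that have no interval in the BED."""
    targeted = bed_contigs(bed_text)
    return [c for c in reference_contigs if c not in targeted]


# Toy inputs (hypothetical, for illustration only).
example_bed = (
    "browser position chr1:1-1000\n"
    'track name="Example targets"\n'
    "chr1\t100\t200\n"
    "chr2\t300\t400\n"
)
example_reference = [
    "chr1",
    "chr2",
    "chr1_KI270706v1_random",
    "chr1_KI270762v1_alt",
    "chrUn_KI270302v1",
]

missing = untargeted_contigs(example_bed, example_reference)
for contig in missing:
    kind = "canonical" if is_canonical(contig) else "non-canonical"
    print(contig, kind)
```

In practice the contig list would come from the reference's .fai or .dict file rather than being typed by hand; reads mapped to any contig reported here are ignored once calling is restricted to the BED intervals.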
Thanks for your help!
Duplicate of: Which of the 4 SureSelect Agilent BED files to use with GATK haplotype caller?
I just checked it. It was created some time ago; is it still applicable? Also, it has no information on the alt and random (non-canonical) sequences present in the hg38 reference genome. Do you have any insight on this?
Pierre Lindenbaum, any feedback please?