I have some BAMs from whole exome sequencing.
I want to run GATK HaplotypeCaller, which requires one BED file as input.
The SureSelect kit used for these BAMs comes with 4 different .bed files:
*_Covered.bed
*_AllTracks.bed
*_Padded.bed
*_Regions.bed
Googling shows this question has been asked multiple times: What Agilent Interval Files (.Bed) Should I Use For Exome Variant Calling With Gatk?
I still don't know, but my gut instinct is to use the *_Padded.bed file because, according to Agilent, it shows:
"the genomic regions that you can expect to sequence when using the design for target enrichment. To determine these regions, the program extends the regions in the Covered BED file by 100 bp on each side."
Has anyone done this before who knows which file to use?
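To make Agilent's description of the Padded file concrete, here is a minimal sketch of what "extend each Covered region by 100 bp on each side" means for BED coordinates. The function name and the example intervals are made up for illustration, and this is not Agilent's actual procedure; it also does not clamp at chromosome ends, which a real tool would need the chromosome lengths for.

```python
def pad_intervals(intervals, padding=100):
    """Extend each (chrom, start, end) BED interval by `padding` bp per side."""
    padded = []
    for chrom, start, end in intervals:
        # BED starts are 0-based, so clamp at 0 rather than going negative.
        padded.append((chrom, max(0, start - padding), end + padding))
    return padded


# Hypothetical "Covered" regions, not taken from any real kit.
covered = [("chr1", 50, 200), ("chr1", 1000, 1200)]
print(pad_intervals(covered))
```

Padding neighbouring targets like this is one way overlapping regions can arise in a padded file.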
Just my 2 cents: I'm using the *_Padded file to subset my VCF file. Be aware that the regions can overlap.
I wonder if that even matters for my application. As far as I can tell, the BED file is supplied as an argument to GATK HaplotypeCaller just to cut down on search time by pointing it at specific intervals. I hate making assumptions though, so I'll be on the lookout.
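On the overlap point: if overlapping regions turn out to matter for a downstream step (for example when counting targeted bases), they can be collapsed first. Below is a rough sketch of merging overlapping BED intervals; the function name and example data are hypothetical, and for real files a dedicated tool such as `bedtools merge` is probably the safer choice.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (chrom, start, end) intervals."""
    merged = []
    # Sorting groups intervals by chromosome, then by start position.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical padded regions where the first two overlap.
padded = [("chr1", 0, 300), ("chr1", 250, 500), ("chr2", 10, 20)]
print(merge_intervals(padded))
```

Because BED intervals are half-open, `start <= previous end` also joins intervals that merely touch; use `<` instead if those should stay separate.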
As an aside, are you sure a BED file would even work? I recall running into an issue a few years ago where GATK needed an interval_list file, which was similar but not identical to the BED format.
Well, not anymore! I'll take a look, thanks again. GATK is an amazing resource, kind of complicated though.
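For anyone who does end up needing an interval_list: the main coordinate difference is that BED is 0-based and half-open, while the Picard-style interval_list is 1-based and closed, with five tab-separated columns (contig, start, end, strand, name) below a SAM-style header. A minimal sketch of the per-line conversion follows; the function name is hypothetical, and it does not write the required header, which needs the reference sequence dictionary. Picard's BedToIntervalList tool handles the whole file properly.

```python
def bed_to_interval_list_line(bed_line):
    """Convert one BED record to one interval_list body line (no header)."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Optional BED columns: name is column 4, strand is column 6.
    name = fields[3] if len(fields) > 3 else "."
    strand = fields[5] if len(fields) > 5 and fields[5] in ("+", "-") else "+"
    # 0-based half-open [start, end) becomes 1-based closed [start + 1, end].
    return f"{chrom}\t{start + 1}\t{end}\t{strand}\t{name}"


# Hypothetical BED record for illustration.
print(bed_to_interval_list_line("chr1\t99\t200\ttarget_1"))
```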
I had the same issue. Unfortunately, the BED files from this company have coordinates that do not match the reference files online. Liftover is needed because these files are usually for an older version of the genome, i.e. hg38 vs 39 etc.
Do you mean 19? There is no 39
I think you should use the *_Regions.bed file.
Hi! Could I ask which argument you used in the HaplotypeCaller command to include the .bed file?
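As far as I know, intervals are passed to HaplotypeCaller with `-L` (long form `--intervals`), and GATK4 also has `--interval-padding` if you want to pad an unpadded file at run time. Here is a small sketch that assembles such a command; the helper name and all file names are placeholders I made up, and the exact flags may differ in older GATK versions, so check them against the docs for yours.

```python
import shlex


def haplotypecaller_command(reference, bam, intervals, output, padding=0):
    """Build a GATK4 HaplotypeCaller command restricted to given intervals."""
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", reference,   # reference FASTA
        "-I", bam,         # input BAM
        "-L", intervals,   # BED or interval_list file
        "-O", output,      # output VCF
    ]
    if padding > 0:
        # Optional: pad each interval by this many bp on both sides.
        cmd += ["--interval-padding", str(padding)]
    return cmd


# Placeholder file names; substitute your own paths.
cmd = haplotypecaller_command("ref.fasta", "sample.bam", "targets.bed", "out.vcf.gz")
print(shlex.join(cmd))
```

If you pass the Padded file, leave `padding` at 0 so the regions are not padded twice.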