Hi everyone,
I am new to bioinformatics and am struggling with GATK's somatic variant calling pipeline.
I have completed most of the preprocessing steps: CreateSequenceDictionary, bwa index, bwa mem, and MarkDuplicatesSpark.
However, BaseRecalibrator keeps failing with a USER ERROR.
For the known sites, I have been using a C57BL/6 known-sites VCF file that I found on the Mouse Genomes Project website.
For the reference genome, I used the latest GRCm39 release.
My first BaseRecalibrator error said that the contigs in the reference and the VCF file were incompatible. I tried to fix this by renaming the chromosomes in the VCF files with bcftools annotate --rename-chrs.
Now I am getting a different error:
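For reference, bcftools annotate --rename-chrs takes a plain-text map with one "old_name new_name" pair per line. A minimal Python sketch for generating that file, assuming the original VCF names chromosome 1 as "1" (an assumption, check your own header) and using hypothetical helper names format_rename_map and write_rename_map:

```python
from pathlib import Path


def format_rename_map(mapping: dict[str, str]) -> str:
    """Render a bcftools --rename-chrs map: one 'old_name new_name' pair per line."""
    return "".join(f"{old}\t{new}\n" for old, new in mapping.items())


def write_rename_map(mapping: dict[str, str], path: str) -> None:
    """Write the map to disk so it can be passed to bcftools annotate --rename-chrs."""
    Path(path).write_text(format_rename_map(mapping))


if __name__ == "__main__":
    # Hypothetical one-entry map: assumes the original VCF calls chromosome 1 "1".
    # A real map needs a line for every contig that should be renamed.
    print(format_rename_map({"1": "NC_000067.7"}), end="")
```

The file would then be used as, for example, bcftools annotate --rename-chrs chr_rename.txt in.vcf.gz -Oz -o renamed.vcf.gz. Renaming changes only the contig names. The positions and the ##contig lengths still belong to whatever assembly the VCF was built on, so a VCF from a different assembly will still fail against the reference, just with a length error instead of a name error.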
A USER ERROR has occurred: Input files reference and features have incompatible contigs: Found contigs with the same name but different lengths: contig reference = NC_000067.7 / 195154279 contig features = NC_000067.7 / 195471971.
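To see which contigs disagree before rerunning BaseRecalibrator, one option is to compare the lengths in the reference's sequence dictionary (the .dict file from CreateSequenceDictionary, which stores @SQ lines with SN and LN tags) against the ##contig lines in the VCF header. A minimal sketch, assuming the header lines are already read into memory (for a bgzipped VCF, gzip.open(path, "rt") can read them) and that only the ID and length keys of ##contig lines matter; dict_contigs, vcf_contigs and length_mismatches are hypothetical names:

```python
import re
from typing import Iterable


def dict_contigs(lines: Iterable[str]) -> dict[str, int]:
    """Map contig name -> length from the @SQ lines of a .dict (SAM header) file."""
    contigs = {}
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        # Tags are TAB-separated "KEY:value" pairs, e.g. SN:NC_000067.7 and LN:195154279.
        tags = dict(t.split(":", 1) for t in line.rstrip("\n").split("\t")[1:] if ":" in t)
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def vcf_contigs(lines: Iterable[str]) -> dict[str, int]:
    """Map contig name -> length from ##contig=<ID=...,length=...> VCF header lines."""
    contigs = {}
    for line in lines:
        if not line.startswith("##contig="):
            continue
        name = re.search(r"[<,]ID=([^,>]+)", line)
        length = re.search(r"[<,]length=(\d+)", line)
        if name and length:  # contig lines without a length are skipped
            contigs[name.group(1)] = int(length.group(1))
    return contigs


def length_mismatches(ref: dict[str, int], vcf: dict[str, int]) -> list[tuple[str, int, int]]:
    """Contigs with the same name in both inputs but different lengths."""
    return [(n, ref[n], vcf[n]) for n in ref if n in vcf and ref[n] != vcf[n]]


if __name__ == "__main__":
    # Toy inputs rebuilt from the lengths quoted in the error message.
    ref_dict = ["@HD\tVN:1.6", "@SQ\tSN:NC_000067.7\tLN:195154279"]
    vcf_header = ["##fileformat=VCFv4.2", "##contig=<ID=NC_000067.7,length=195471971>"]
    for name, ref_len, vcf_len in length_mismatches(dict_contigs(ref_dict), vcf_contigs(vcf_header)):
        print(f"{name}: reference={ref_len} vcf={vcf_len}")
    # prints: NC_000067.7: reference=195154279 vcf=195471971
```

Any contig this reports with a different length is a sign that the VCF and the reference come from different assemblies, which renaming alone cannot fix.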
At this point, I am not sure whether I should redo the analysis with an older version of the mouse reference genome or whether this error can be fixed. Any pointers?
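I'm an idiot... I just checked, and yes, the VCF file was made for GRCm38 (Sanger's GRCm38_68 release), not GRCm39. That makes total sense: chromosome 1 is 195471971 bp in the GRCm38-based VCF but 195154279 bp in my GRCm39 reference, and renaming the contigs changed only their names, not their lengths. I think this was the issue. Thanks a lot!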
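Because chromosome 1 has a different length in the two assemblies, a VCF header gives a quick hint about which assembly the file was built on. A minimal sketch, assuming the only two lengths known are the ones quoted in the error above (195154279 for the GRCm39 reference, 195471971 for the GRCm38-based VCF) and that chromosome 1 is named "1", "chr1", or "NC_000067.7" (a naming assumption, adjust as needed); guess_assembly is a hypothetical helper, not part of GATK or bcftools:

```python
import re
from typing import Iterable

# Chromosome 1 lengths taken from the error message in this thread:
# the GRCm39 reference reported 195154279, the GRCm38-based Sanger VCF 195471971.
CHR1_LENGTH_TO_ASSEMBLY = {195154279: "GRCm39", 195471971: "GRCm38"}

# Assumed names for chromosome 1; extend this if your files use another convention.
DEFAULT_CHR1_NAMES = ("1", "chr1", "NC_000067.7")


def guess_assembly(header_lines: Iterable[str], chr1_names=DEFAULT_CHR1_NAMES) -> str:
    """Guess the mouse assembly of a VCF from its chromosome 1 ##contig length."""
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        name = re.search(r"[<,]ID=([^,>]+)", line)
        length = re.search(r"[<,]length=(\d+)", line)
        if name and length and name.group(1) in chr1_names:
            return CHR1_LENGTH_TO_ASSEMBLY.get(int(length.group(1)), "unknown")
    return "unknown"


if __name__ == "__main__":
    header = ["##fileformat=VCFv4.2", "##contig=<ID=1,length=195471971>"]
    print(guess_assembly(header))  # GRCm38
```

Any other length comes back as "unknown"; the lookup table would need more entries to recognize other assemblies.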