- I am performing exome analysis
- I did the alignment with bwa-mem
- Reference: GRCh38.p12 (GENCODE)
- I performed deduplication using Picard
- I am using gatk4-4.1.4.1-0
Now I am at the step where I need to use Mutect2 to build my PoN (panel of normals). I downloaded the GATK resource file somatic-hg38_af-only-gnomad.hg38.vcf.gz and its .tbi index.
When I run Mutect2, it shows this error:
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found
I checked further and found
reference contigs = [NC_000001.11, NT_187361.1, NT_187362.1, NT_187363.1
and
features contigs = [chr1, chr2, chr3
Does this mean that I used a different reference for the alignment? If so, which reference should I use for GATK? Or is there another way to make my BAM file compatible? Please help.
Yes, it means the contig names in the alignment don't match those in the VCF resource. Can you check your alignment and ensure they have the right naming conventions? The NC_ names are NCBI reference IDs that should not exist in any proper reference file.
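One way to make that check concrete is to compare the two sets of names directly. The following is a minimal sketch, assuming the reference FASTA has a `.fai` index (contig name in the first tab-separated column) and the VCF header carries `##contig=<ID=...>` lines. The function names are hypothetical, and the example lines only mimic the two naming styles seen in this thread:

```python
import re

def fai_contigs(fai_lines):
    """Contig names from .fai index lines (first tab-separated column)."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]

def vcf_header_contigs(header_lines):
    """Contig IDs from '##contig=<ID=...>' VCF header lines."""
    pattern = re.compile(r"^##contig=<ID=([^,>]+)")
    names = []
    for line in header_lines:
        match = pattern.match(line)
        if match:
            names.append(match.group(1))
    return names

def shared_contigs(ref_names, vcf_names):
    """Names present in both lists, kept in reference order."""
    vcf_set = set(vcf_names)
    return [name for name in ref_names if name in vcf_set]

# Illustrative input only: one RefSeq-style and one UCSC-style name.
fai = ["NC_000001.11\t248956422\t14\t80\t81\n"]
header = ["##fileformat=VCFv4.2\n", "##contig=<ID=chr1,length=248956422>\n"]
print(shared_contigs(fai_contigs(fai), vcf_header_contigs(header)))  # prints []
```

An empty result corresponds to the "No overlapping contigs found" condition in the error message.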
Yes, I checked my reference FASTA file and it has NC_000001.11 and similar contigs, whereas the UCSC hg38.fa has contig names like "chr1_KI270709v1_random". Now I either need to start over with the UCSC genome or find a VCF that supports the UCSC genome. Any recommendation?
Please do not add an answer unless you're answering the top-level question. Use Add Comment/Add Reply instead as appropriate.
I am unable to understand what you're saying. If the two contig sets are different, you'll either need to edit your VCF using bcftools or awk/sed, or redo the entire process, from alignment to calling, using a stable reference across the board. I'd recommend you use GRCh38.p13 from GENCODE and avoid all other "reference" resources.
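The VCF-editing route amounts to translating the contig name in the `##contig` header lines and in the first column of every record. Below is a minimal Python sketch of that idea, not a replacement for `bcftools annotate --rename-chrs`, which does the same job from a two-column mapping file. The names `RENAME`, `rename_vcf_line` and `rename_vcf` are hypothetical, and only the chr1 pair is taken from this thread; the rest of the mapping would have to come from the assembly report of the reference actually used.

```python
import re

# Only the chr1 pair appears in this thread; every other pair is an
# assumption to be filled in from your reference's assembly report.
RENAME = {"chr1": "NC_000001.11"}

_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")

def rename_vcf_line(line, mapping):
    """Translate the contig name on one VCF line.

    Returns None for contig header lines and data records whose contig
    is not in the mapping, so the caller can drop them.
    """
    if line.startswith("##contig="):
        match = _CONTIG_ID.match(line)
        if match is None:
            return line  # unexpected layout: leave untouched
        old = match.group(2)
        if old not in mapping:
            return None
        return match.group(1) + mapping[old] + line[match.end():]
    if line.startswith("#"):
        return line  # other header lines are left unchanged
    chrom, sep, rest = line.partition("\t")
    if chrom not in mapping:
        return None
    return mapping[chrom] + sep + rest

def rename_vcf(lines, mapping):
    """Yield renamed lines, skipping those on unmapped contigs."""
    for line in lines:
        renamed = rename_vcf_line(line, mapping)
        if renamed is not None:
            yield renamed
```

The output would still need to be bgzip-compressed and re-indexed before GATK can use it, and the contig lengths must genuinely match the reference; renaming only helps when both files describe the same sequences under different names.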
Thanks for the comment; it is resolved.