16 months ago
Manuel Sokolov Ravasqueira
Hi,
I am following the GATK Best Practices pipeline for variant calling, starting from targeted-sequencing BAM and BAI files aligned to the hg19 reference. When running GATK Mutect2, I got the following error:
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No
overlapping contigs found.
reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13,
chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5,
chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1,
chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random,
chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212,
chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random,
chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211,
chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221,
chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random,
chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230,
chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240,
chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238,
chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246,
chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247,
chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231,
chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random]
features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT,
GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1,
GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1,
GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1,
GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1,
GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1,
GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1,
GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1,
GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1,
GL000194.1, GL000225.1, GL000192.1, NC_007605]
And this is the code I have:
export GENOME="/PATH/Manuel/FILES/HUMAN_REFERENCES/hg19.fa"
export GERM="/PATH/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad.raw.sites.vcf"
export PON="/PATH/Manuel/FILES/HUMAN_REFERENCES/Mutect2-WGS-panel-b37.vcf"
export VCF="${RECALIBRATED%.bam}.vcf"
srun /mnt/beegfs/apptainer/images/gatk4.sif gatk Mutect2 \
-R $GENOME \
-I $RECALIBRATED \
--germline-resource $GERM \
--panel-of-normals $PON \
-O $VCF
Is there a better PON or GERM that I can use? And if I have to make the names match between the reference contigs and the feature contigs, how can I do that?
Best Regards,
Manuel
Classic problem. You're using two different reference dictionaries: https://www.google.com/search?q=%22No+overlapping+contigs+found.%22+site%3Abiostars.org
I am sorry, which reference dictionaries are you referring to? The sample reference file has a .dict and a .fai file, which are required for this step (one generated with Picard and the other with samtools faidx).
Dictionaries define the chromosomes: their names, sizes, and order. Here you have a mix of "1" and "chr1".
Your reference, indexes, aligned data have to refer to the same set of data. You can't mix and match.
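One way to see the mismatch before running Mutect2 is to compare the contig names in the reference index with those declared in each VCF header. The sketch below is only an illustration: `shared_contigs` is a hypothetical helper (not part of GATK or samtools), and it assumes a .fai file produced by samtools faidx and an uncompressed VCF whose header contains `##contig` lines.

```shell
# Print every contig name that appears both in a FASTA index (.fai)
# and in the ##contig header lines of an uncompressed VCF.
# Usage: shared_contigs reference.fa.fai features.vcf
# (shared_contigs is a hypothetical helper name used for illustration.)
shared_contigs() {
    awk -F'\t' '
        # First file (.fai): column 1 holds the contig name.
        NR == FNR { ref[$1] = 1; next }
        # Second file (VCF): extract the ID from each ##contig header line.
        /^##contig=<ID=/ {
            name = $0
            sub(/^##contig=<ID=/, "", name)
            sub(/[,>].*$/, "", name)
            if (name in ref) print name
        }
        # Stop at the column header; no need to scan the records.
        /^#CHROM/ { exit }
    ' "$1" "$2"
}
```

Called as `shared_contigs hg19.fa.fai Mutect2-WGS-panel-b37.vcf`, empty output would indicate that the two files share no contig names, which is consistent with the "chr1" versus "1" lists in the error message above.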
Thank you for your responses. I do not have a matched normal for this patient, so I am using "Mutect2-WGS-panel-b37.vcf", the panel of normals associated with the hg19 reference, and the germline resource is a list of known germline alterations also associated with hg19. They should match. How can I change the names of the reference contigs and feature contigs so that both use either chr1 or 1, and how can I make the lists the same size by removing some reference contigs?
VCF files: Change Chromosome Notation
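As a rough sketch of what such a renaming can look like for the b37-style VCFs in this thread: the hypothetical function below (`add_chr_prefix` is an illustrative name, not an existing tool) adds a "chr" prefix to contigs 1-22, X and Y in an uncompressed VCF and drops every other contig, both in the header and in the records. Dropping MT rather than renaming it to chrM is a deliberate assumption here, because the hg19 chrM and b37 MT sequences are generally reported to differ; the GL000xxx contigs are dropped as well instead of being mapped to their hg19 names. It also assumes the primary chromosomes are otherwise identical between the two builds, which is worth confirming for your files. `bcftools annotate --rename-chrs` with a two-column mapping file is an existing alternative for the renaming part.

```shell
# Read an uncompressed VCF on stdin, write a renamed VCF to stdout.
# Keeps only contigs 1-22, X, Y and renames them to chr1-chr22, chrX, chrY.
# (add_chr_prefix is a hypothetical helper name used for illustration.)
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
        # True for the primary contigs we are willing to rename.
        function primary(name) { return name ~ /^([1-9]|1[0-9]|2[0-2]|X|Y)$/ }
        # Header contig declarations: rename primary ones, drop the rest.
        /^##contig=<ID=/ {
            name = $0
            sub(/^##contig=<ID=/, "", name)
            sub(/[,>].*$/, "", name)
            if (primary(name)) { sub(/^##contig=<ID=/, "##contig=<ID=chr"); print }
            next
        }
        # All other header lines pass through unchanged.
        /^#/ { print; next }
        # Records: rename primary contigs; anything else is silently dropped.
        primary($1) { $1 = "chr" $1; print }
    '
}
```

For example, `add_chr_prefix < Mutect2-WGS-panel-b37.vcf > pon.chr.vcf` (the output name is made up). The result would need to be indexed again before GATK can read it. Note that this makes the names agree for the shared contigs only; the two contig lists still differ in length afterwards.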
Thank you. That should make the names the same. What about the different sizes of the two lists?