Hello!
I ran gatk toolchain including CalculateContamination in galaxy on human exome sequencing data, and it worked fine. However when I try feeding it with murine data (and murine reference files), CalculateContamination gives me this contamination table:
sample contamination error
mouse1_tumor NaN 1.0
And being fed with this result, FilterMutectCalls says:
java.lang.IllegalArgumentException: log10p: Log10-probability must be 0 or less
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)
at org.broadinstitute.hellbender.utils.MathUtils.log10BinomialProbability(MathUtils.java:934)
I suspect there is a connection between these results.
I tried two vcf files - only for one mouse strain C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz and one for all mice mgp.v5.merged.snps_all.dbSNP142.vcf.gz . I processed them to be biallelic like this:
bcftools plugin fill-tags C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf -Oz -o with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz -- -t AF
bcftools view -Oz -m2 -M2 with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf > biallelic.with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz
/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk IndexFeatureFile -F biallelic.with_af.C57BL_6NJ.mgp.v5.snps.dbSNP142.vcf.gz
With one strain file GetPileupSummaries gives no data at all:
#<METADATA>SAMPLE=mouse1_normal
contig position ref_count alt_count other_alt_count allele_frequency
And with panmurine vcf file it gives about 19,538,774 lines:
#SAMPLE=mouse1_tumor
#<METADATA>SAMPLE=mouse1_tumor
contig position ref_count alt_count other_alt_count allele_frequency
1 3001278 1 0 0 0.111111
1 3001309 1 0 0 0.0277778
1 3001310 1 0 0 0.0277778
..........
........
............
However regardless of the vcf file the CalculateContamination contamination table contains only NaN and 1.0 . But segmentation tables are different - with strain specific file it contains no data, and with panmurine file it looks similar to normal result (but not quite):
#SAMPLE=mouse1_tumor
#<METADATA>SAMPLE=mouse1_tumor
contig start end minor_allele_fraction
11 3133205 120186319 0.11578047329359388
12 9024112 117728420 0.11578047329359388
13 5864694 119389285 0.14709163137980344
14 8368568 122944938 0.11578047329359388
15 4153960 103503606 0.11578047329359388
16 5009839 96161963 0.11578047329359388
17 5928381 94835879 0.11578047329359388
18 3015925 89073490 0.11578047329359388
19 3996832 61225661 0.11578047329359388
1 4611421 193194496 0.12379274849627589
2 4925592 181917652 0.11578047329359388
3 8936403 154765979 0.11578047329359388
4 3212345 156357539 0.1547816247108552
5 3236897 149625240 0.11578047329359388
6 4754265 149353876 0.11578047329359388
7 3841411 144450962 0.11578047329359388
X 7946841 151850725 0.11578047329359388
8 3621134 127592996 0.11578047329359388
9 7131733 123664738 0.11578047329359388
10 5035173 128723359 0.11578047329359388
and takes about 2.5 hours to execute, which however does not help getting right contamination and FilterMutectCalls results.
Murine commands are like this:
'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'GetPileupSummaries' '-I' 'input.bam' '-V' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102/snp/biallelic.with_af.mgp.v5.merged.snps_all.dbSNP142.vcf.gz' '-L' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102/snp/biallelic.with_af.mgp.v5.merged.snps_all.dbSNP142.vcf.gz' -O '/home/transgeneprep/galaxy/database/objects/4/f/e/dataset_4fe4af55-a216-498d-bcc5-a26cbbd39d0d.dat' '--disable-sequence-dictionary-validation'
'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'CalculateContamination' '-I' '/home/transgeneprep/galaxy/database/objects/0/4/f/dataset_04f08621-9adb-49c5-bec0-ecb21d3f87a2.dat' '-matched' '/home/transgeneprep/galaxy/database/objects/4/f/e/dataset_4fe4af55-a216-498d-bcc5-a26cbbd39d0d.dat' '--tumor-segmentation' '/home/transgeneprep/galaxy/database/objects/a/1/f/dataset_a1f596be-bd1c-4ee3-8a81-06ef6577e267.dat' '-O' '/home/transgeneprep/galaxy/database/objects/e/2/b/dataset_e2bdecb6-2c1b-454a-b81b-3d3f2d9d5269.dat'
'/home/transgeneprep/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'FilterMutectCalls' '-R' '/media/transgeneprep/DataStore/References/mouse_GRCm38_ensembl_102//bwa/Mus_musculus.GRCm38.dna.primary_assembly.fa' '-V' 'input.vcf' '--contamination-table' '/home/transgeneprep/galaxy/database/objects/e/2/b/dataset_e2bdecb6-2c1b-454a-b81b-3d3f2d9d5269.dat' '--tumor-segmentation' '/home/transgeneprep/galaxy/database/objects/a/1/f/dataset_a1f596be-bd1c-4ee3-8a81-06ef6577e267.dat' '--orientation-bias-artifact-priors' '/home/transgeneprep/galaxy/database/objects/0/7/f/dataset_07f1c9fd-bd59-4c69-8ff5-75aad7abf42a.dat' '-O' '/home/transgeneprep/galaxy/database/objects/3/0/5/dataset_305ecc93-51f2-4168-9488-303d05dd9731.dat'
Hominine commands are like this:
'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'GetPileupSummaries' '-I' 'input.bam' '-V' '/home/transgen/galaxy/tools/melanoma_tools/genome/ALL.wgs.biallelic.exons.af.vcf.gz' '-L' '/home/transgen/galaxy/tools/melanoma_tools/genome/ALL.wgs.biallelic.exons.af.vcf.gz' -O '/home/transgen/galaxy/database/objects/c/9/c/dataset_c9c268ae-784b-469a-92f4-96bf2c0d8a5c.dat' '--disable-sequence-dictionary-validation'
'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'CalculateContamination' '-I' '/home/transgen/galaxy/database/objects/c/1/d/dataset_c1d59b22-ab10-450c-adbf-2e26dcf59c29.dat' '-matched' '/home/transgen/galaxy/database/objects/c/9/c/dataset_c9c268ae-784b-469a-92f4-96bf2c0d8a5c.dat' '--tumor-segmentation' '/home/transgen/galaxy/database/objects/3/e/d/dataset_3edec44a-5833-45ca-a269-d00daf80a7b3.dat' '-O' '/home/transgen/galaxy/database/objects/4/1/0/dataset_41054645-14bf-460b-91a1-4a32196e1454.dat'
'/home/transgen/galaxy/tools/melanoma_tools/gatk-4.1.3.0/gatk' 'FilterMutectCalls' '-R' '/home/transgen/galaxy/tools/melanoma_tools/genome/hg38.analysisSet.fa' '-V' 'input.vcf' '--contamination-table' '/home/transgen/galaxy/database/objects/4/1/0/dataset_41054645-14bf-460b-91a1-4a32196e1454.dat' '--tumor-segmentation' '/home/transgen/galaxy/database/objects/3/e/d/dataset_3edec44a-5833-45ca-a269-d00daf80a7b3.dat' '--orientation-bias-artifact-priors' '/home/transgen/galaxy/database/objects/e/f/7/dataset_ef7eba53-add9-49e3-b95c-86a57e08b028.dat' '-O' '/home/transgen/galaxy/database/objects/2/7/c/dataset_27ce06ec-f75b-4d70-8f22-259e27ed881d.dat'
What am I doing wrong?
Thanks in advance.