Question

Funcotator gnomAD incoherent number of output fields

0

Entering edit mode

7 months ago

jaime alvarez ▴ 30

Hello

Funcotator germline with gnomAD is producing incoherent number of output fields. I am showing an example for a variant at chr1:598934

GATK version used: 4.5.0.0

I have called germline variants and performed Variant Quality Score Recalibration using GATK as por best practices, I then filter only allowing PASS variants and indexing it with:

bcftools view \
  -f 'PASS,.' \
  -O z \
  --threads 8 \
  -o output_filtered.vcf.gz INDEL_SNPS.applyModel.vcf.gz\
  && tabix -p vcf output_filtered.vcf.gz

Generating a file output_filtered.vcf.gz containing for example this line:

chr1    598934  .       CGGG    CG,C,*  374.02  PASS    AC=2,3,5;AF=0.083,0.125,0.208;AN=24;BaseQRankSum=0.674;DP=45;ExcessHet=0.0031;FS=0;InbreedingCoeff=0.3671;MLEAC=3,6,9;MLEAF=0.125,0.25,0.375;MQ=48.38;MQRankSum=-1.96;NEGATIVE_TRAIN_SITE;QD=17;ReadPosRankSum=-0.126;SOR=0.724;VQSLOD=-4.486;culprit=DP       GT:AD:DP:GQ:PGT:PID:PL:PS       ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.       0/0:5,0,0,0:5:9:.:.:0,9,135,9,135,135,9,135,135,135:.        ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.       0/0:1,0,0,0:1:3:.:.:0,3,11,3,11,11,3,11,11,11:.      2|2:0,0,2,0:2:6:1|1:598934_CGGG_C:90,90,90,6,6,0,90,90,6,90:598934   ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.       ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.    ./.:1,0,0,0:1:0:.:.:0,0,0,0,0,0,0,0,0,0:.       ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.   ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.        ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.       0/0:3,0,0,0:3:6:.:.:0,6,90,6,90,90,6,90,90,90:.      ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.       0/0:4,0,0,0:4:0:.:.:0,0,29,0,29,29,0,29,29,29:.      ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.       3|3:0,0,0,2:2:6:1|1:598932_GGC_G:70,70,70,70,70,70,6,6,6,0:598932    1|1:0,4,0,0:4:12:1|1:598934_CGG_C:180,12,0,180,12,180,180,12,180,180:598934  0/0:3,0,0,0:3:3:.:.:0,3,45,3,45,45,3,45,45,45:. 0|2:3,0,3,0:6:99:0|1:598915_C_T:117,126,252,0,126,117,126,252,126,252:598915 0/0:5,0,0,0:5:12:.:.:0,12,180,12,180,180,12,180,180,180:.    0|3:2,0,0,4:6:15:0|1:598932_GGC_G:115,120,147,120,147,147,0,27,27,15:598932 3|3:0,0,0,2:2:6:1|1:598932_GGC_G:90,90,90,90,90,90,6,6,6,0:598932        ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.    ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.       ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.    ./.:1,0,0,0:1:0:.:.:0,0,0,0,0,0,0,0,0,0:.       ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.   ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.        ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.       ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0:.

I use FUNCOTATOR germline (enabling gnomAD) data sources as specified here (https://gatk.broadinstitute.org/hc/en-us/articles/360035889931-Funcotator-Information-and-Tutorial#1.1.3)

I annotate it using FUNCOTATOR and index the file using this command:

gatk --java-options "-Xmx30g -XX:ParallelGCThreads=8" Funcotator \
  --variant output_filtered.vcf.gz \
  -R /data/genome/hg38ucsc/hg38_no_alt.fa \
  --ref-version hg38 \
  --data-sources-path /data/genome/hg38ucsc/annotations/GATK_Funcotator_db/germline/funcotator_dataSources.v1.8.hg38.20230908g/ \
  -O final_annotated.vcf.gz \
  --output-file-format VCF \
&& tabix -p vcf final_annotated.vcf.gz

Which gives me the following line for the variant:

chr1    598934  .       CGGG    CG,C,*  374.02  PASS    AC=2,3,5;AF=0.083,0.125,0.208;AN=24;BaseQRankSum=0.674;DP=45;ExcessHet=0.0031;FS=0;FUNCOTATION=[ENSG00000230021|hg38|chr1|598935|598937|RNA||DEL|GG|GG|-|g.chr1:598935_598937delGG|ENST00000634833.2|-|||c.e3-2464CCCG>CG|||0.5781637717121588|AGAACACGGCGGGGGGGGGGGCG|ENSG00000230021_ENST00000634337.2_RNA||||||||||||||||||||||||||||false|||||||||||||||||||||||||||||||||||||||||||1.77576e-01|2.34347e-01|2.30930e-01|2.36943e-01|2.06667e-01|2.04082e-01|2.09150e-01|1.06742e-01|6.25000e-02|1.16438e-01|2.62480e-01|2.75463e-01|2.55556e-01|1.74644e-01|1.35786e-01|1.40000e-01|1.31179e-01|1.79873e-01|1.46222e-01|1.51056e-01|1.44244e-01|1.47727e-01|1.44092e-01|1.45833e-01|0.00000e+00|1.38037e-01|1.31579e-01|1.45161e-01|2.62480e-01|1.62877e-01||1|534314|false|false|rs568786854|]#[|||||||||||||||||||||||||||false|||||||||||||||||||||||||||||||||||||||||||1.77576e-01|2.34347e-01|2.30930e-01|2.36943e-01|2.06667e-01|2.04082e-01|2.09150e-01|1.06742e-01|6.25000e-02|1.16438e-01|2.62480e-01|2.75463e-01|2.55556e-01|1.74644e-01|1.35786e-01|1.40000e-01|1.31179e-01|1.79873e-01|1.46222e-01|1.51056e-01|1.44244e-01|1.47727e-01|1.44092e-01|1.45833e-01|0.00000e+00|1.38037e-01|1.31579e-01|1.45161e-01|2.62480e-01|1.62877e-01||1|534314|false|false|rs568786854|],[ENSG00000230021|hg38|chr1|598935|598937|RNA||DEL|GGG|GGG|-|g.chr1:598935_598937delGGG|ENST00000634833.2|-|||c.e3-2464CCCG>G|||0.5781637717121588|AGAACACGGCGGGGGGGGGGGCG|ENSG00000230021_ENST00000634337.2_RNA||||||||||||||||||||||||||||false|||||||||||||||||||||||||||||||||||||||||||2.84588e-01|4.09338e-01|4.08215e-01|4.10191e-01|2.73333e-01|2.61905e-01|2.84314e-01|1.96629e-01|1.25000e-01|2.12329e-01|2.55233e-01|2.40741e-01|2.62963e-01|2.82344e-01|2.48865e-01|2.43478e-01|2.54753e-01|2.86346e-01|2.30391e-01|2.43662e-01|2.32354e-01|2.28896e-01|2.27866e-01|2.13141e-01|0.00000e+00|2.50000e-01|2.36842e-01|2.64516e-01|4.09338e-01|2.63374e-01||1|534314|false|false|rs568786854|]#[|||||||||||||||||||||||||||false|||||||||||||||||||||||||||||||||||||||||||2.84588e-01|4.09338e-01|4.08215e-01|4.10191e-01|2.73333e-01|2.61905e-01|2.84314e-01|1.96629e-01|1.25000e-01|2.12329e-01|2.55233e-01|2.40741e-01|2.62963e-01|2.82344e-01|2.48865e-01|2.43478e-01|2.54753e-01|2.86346e-01|2.30391e-01|2.43662e-01|2.32354e-01|2.28896e-01|2.27866e-01|2.13141e-01|0.00000e+00|2.50000e-01|2.36842e-01|2.64516e-01|4.09338e-01|2.63374e-01||1|534314|false|false|rs568786854|],[|||||||||||||||||||||||||||false||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||false|false||]#[ENSG00000230021|hg38|chr1|598934|598937|COULD_NOT_DETERMINE||NA|CGGG|CGGG|*||ENST00000634337.2|+||||||0.5781637717121588|AAGAACACGGCGGGGGGGGGGGCG|ENSG00000230021_ENST00000634833.2_COULD_NOT_DETERMINE||||||||||||||||||||||||||||false||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||false|false||];InbreedingCoeff=0.3671;MLEAC=3,6,9;MLEAF=0.125,0.25,0.375;MQ=48.38;MQRankSum=-1.96;NEGATIVE_TRAIN_SITE;QD=17;ReadPosRankSum=-0.126;SOR=0.724;VQSLOD=-4.486;culprit=DP      GT:AD:DP:GQ:PGT:PID:PL:PS       ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0      0/0:5,0,0,0:5:9:.:.:0,9,135,9,135,135,9,135,135,135     ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0      0/0:1,0,0,0:1:3:.:.:0,3,11,3,11,11,3,11,11,11   2|2:0,0,2,0:2:6:1|1:598934_CGGG_C:90,90,90,6,6,0,90,90,6,90:598934   ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0 ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0      ./.:1,0,0,0:1:0:.:.:0,0,0,0,0,0,0,0,0,0 ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0 ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0      ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0 0/0:3,0,0,0:3:6:.:.:0,6,90,6,90,90,6,90,90,90        ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0 0/0:4,0,0,0:4:0:.:.:0,0,29,0,29,29,0,29,29,29        ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0 3|3:0,0,0,2:2:6:1|1:598932_GGC_G:70,70,70,70,70,70,6,6,6,0:598932    1|1:0,4,0,0:4:12:1|1:598934_CGG_C:180,12,0,180,12,180,180,12,180,180:598934  0/0:3,0,0,0:3:3:.:.:0,3,45,3,45,45,3,45,45,45   0|2:3,0,3,0:6:99:0|1:598915_C_T:117,126,252,0,126,117,126,252,126,252:598915 0/0:5,0,0,0:5:12:.:.:0,12,180,12,180,180,12,180,180,180 0|3:2,0,0,4:6:15:0|1:598932_GGC_G:115,120,147,120,147,147,0,27,27,15:598932  3|3:0,0,0,2:2:6:1|1:598932_GGC_G:90,90,90,90,90,90,6,6,6,0:598932    ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0 ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0      ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0 ./.:1,0,0,0:1:0:.:.:0,0,0,0,0,0,0,0,0,0 ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0      ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0 ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0      ./.:0,0,0,0:0:0:.:.:0,0,0,0,0,0,0,0,0,0

When I convert it to table using:

gatk --java-options "-Xmx30g -XX:ParallelGCThreads=8" VariantsToTable \
  -R /data/genome/hg38ucsc/hg38_no_alt.fa \
  -V final_annotated.vcf.gz \
  -F CHROM \
  -F POS \
  -F ID \
  -F TYPE \
  -F REF \
  -F ALT \
  -GF GT \
  -GF AD \
  -GF DP \
  -GF AC \
  -GF AN \
  -GF AF \
  -F FILTER \
  -F FUNCOTATION \
  -O final_table.tsv

The annotation fields, I look them up in the previous step result (Funcotator) in the header of the file final_annotated.vcf.gz after Funcotation fields are:

##INFO=<ID=FUNCOTATION,Number=A,Type=String,Description="Functional annotation from the Funcotator tool.  Funcotation fields are: Gencode_43_hugoSymbol|Gencode_43_ncbiBuild|Gencode_43_chromosome|Gencode_43_start|Gencode_43_end|Gencode_43_variantClassification|Gencode_43_secondaryVariantClassification|Gencode_43_variantType|Gencode_43_refAllele|Gencode_43_tumorSeqAllele1|...

I count the number of "|" separated fields in this part of the header, totalling 129 fields.

So when I look into the FUNCOTATION column in my variants table (final_table.tsv) I should see for each line, integer multiples of 129 which I can map to the annotation field descriptors. However, some lines do not obey this and I have no way of mapping the values to the annotation field descriptors. For example, line 153 (just showing the FUNCOTATION column:

[ENSG00000230021|hg38|chr1|598935|598937|RNA||DEL|GG|GG|-|g.chr1:598935_598937delGG|ENST00000634833.2|-|||c.e3-2464CCCG>CG|||0.5781637717121588|AGAACACGGCGGGGGGGGGGGCG|ENSG00000230021_ENST00000634337.2_RNA||||||||||||||||||||||||||||false|||||||||||||||||||||||||||||||||||||||||||1.77576e-01|2.34347e-01|2.30930e-01|2.36943e-01|2.06667e-01|2.04082e-01|2.09150e-01|1.06742e-01|6.25000e-02|1.16438e-01|2.62480e-01|2.75463e-01|2.55556e-01|1.74644e-01|1.35786e-01|1.40000e-01|1.31179e-01|1.79873e-01|1.46222e-01|1.51056e-01|1.44244e-01|1.47727e-01|1.44092e-01|1.45833e-01|0.00000e+00|1.38037e-01|1.31579e-01|1.45161e-01|2.62480e-01|1.62877e-01||1|534314|false|false|rs568786854|]#[|||||||||||||||||||||||||||false|||||||||||||||||||||||||||||||||||||||||||1.77576e-01|2.34347e-01|2.30930e-01|2.36943e-01|2.06667e-01|2.04082e-01|2.09150e-01|1.06742e-01|6.25000e-02|1.16438e-01|2.62480e-01|2.75463e-01|2.55556e-01|1.74644e-01|1.35786e-01|1.40000e-01|1.31179e-01|1.79873e-01|1.46222e-01|1.51056e-01|1.44244e-01|1.47727e-01|1.44092e-01|1.45833e-01|0.00000e+00|1.38037e-01|1.31579e-01|1.45161e-01|2.62480e-01|1.62877e-01||1|534314|false|false|rs568786854|],[ENSG00000230021|hg38|chr1|598935|598937|RNA||DEL|GGG|GGG|-|g.chr1:598935_598937delGGG|ENST00000634833.2|-|||c.e3-2464CCCG>G|||0.5781637717121588|AGAACACGGCGGGGGGGGGGGCG|ENSG00000230021_ENST00000634337.2_RNA||||||||||||||||||||||||||||false|||||||||||||||||||||||||||||||||||||||||||2.84588e-01|4.09338e-01|4.08215e-01|4.10191e-01|2.73333e-01|2.61905e-01|2.84314e-01|1.96629e-01|1.25000e-01|2.12329e-01|2.55233e-01|2.40741e-01|2.62963e-01|2.82344e-01|2.48865e-01|2.43478e-01|2.54753e-01|2.86346e-01|2.30391e-01|2.43662e-01|2.32354e-01|2.28896e-01|2.27866e-01|2.13141e-01|0.00000e+00|2.50000e-01|2.36842e-01|2.64516e-01|4.09338e-01|2.63374e-01||1|534314|false|false|rs568786854|]#[|||||||||||||||||||||||||||false|||||||||||||||||||||||||||||||||||||||||||2.84588e-01|4.09338e-01|4.08215e-01|4.10191e-01|2.73333e-01|2.61905e-01|2.84314e-01|1.96629e-01|1.25000e-01|2.12329e-01|2.55233e-01|2.40741e-01|2.62963e-01|2.82344e-01|2.48865e-01|2.43478e-01|2.54753e-01|2.86346e-01|2.30391e-01|2.43662e-01|2.32354e-01|2.28896e-01|2.27866e-01|2.13141e-01|0.00000e+00|2.50000e-01|2.36842e-01|2.64516e-01|4.09338e-01|2.63374e-01||1|534314|false|false|rs568786854|],[|||||||||||||||||||||||||||false||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||false|false||]#[ENSG00000230021|hg38|chr1|598934|598937|COULD_NOT_DETERMINE||NA|CGGG|CGGG|*||ENST00000634337.2|+||||||0.5781637717121588|AAGAACACGGCGGGGGGGGGGGCG|ENSG00000230021_ENST00000634833.2_COULD_NOT_DETERMINE||||||||||||||||||||||||||||false||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||false|false||]

The structure is:

[]#[],[]#[],[]#[]

So first, this seems to also use the separator "#" as well as "," but the problem is that the number of fields in each [] is:

[129]#[107],[129]#[107],[107]#[129]

How can I map the values to the annotation field descriptors??

gatk gnomAD Funcotator • 242 views

ADD COMMENT • link updated 7 months ago by Ram 44k • written 7 months ago by jaime alvarez ▴ 30