GATK AnnotateVcfWithBamDepth returns zero DP for all variants in VCF
1
0
Entering edit mode
15 months ago
ClkElf ▴ 50

Dear all,

I am using GATK (v4.1.9.0) AnnotateVcfWithBamDepth to get the DP for all variants in ClinVar VCF in a retina RNA-seq BAM file. However, the tool returns zero depth for all variants in the VCF, even though I checked multiple variants in IGV and I saw that they are covered pretty well.

This is the code:

java -Xms128g -Xmx128g -jar gatk.jar AnnotateVcfWithBamDepth -V clinvar_20230717_PLP_IRD_withchr_adj.vcf -I Retina1.bam -O clinvar_adj_Retina1.vcf

Below, you can see the headers of input BAM, input vcf and output vcf files: BAM header (I didn't put chromosomes on purpose to not give too long information):

@PG ID:TopHat   VN:2.0.10   CL:/opt/software/ngs/bin/tophat -p 24 -G /home/ngsworkspace//references/RNASeq/gencode_hg19_v19_final.gtf -r 170 --mate-std-dev 50 --segment-length 25 --transcriptome-index /home/ngsworkspace//references/RNASeq/bowtie2/transIndex/hg19_gencode_v19_final_tophat2 --library-type fr-unstranded -N 5 --read-gap-length 2 --read-edit-dist 5 --max-insertion-length 3 --max-deletion-length 3 -o /home/ngsworkspace//RNASeqAnalysis/Project_HumanRetina_Final/Tuxedo_analysis/tophat2_RNA1 /home/ngsworkspace//references/RNASeq/bowtie2/genomeIndex/hg19_gencode_v19_final /home/ngsworkspace//RNASeqAnalysis/Project_HumanRetina_Final/trim/RNA1_R1_val_1.fq.gz /home/ngsworkspace//RNASeqAnalysis/Project_HumanRetina_Final/trim/RNA1_R2_val_2.fq.gz
@PG ID:samtools PN:samtools PP:TopHat   VN:1.12-2-gd515e1b  CL:samtools view -H Retina1.bam

ClinVar input vcf file header and the first variant - Here, I added chr notations otherwise tool cannot find chromosomes:

##fileformat=VCFv4.1
##fileDate=2023-07-17
##source=ClinVar
##reference=GRCh37
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant; multiple values are separated by a vertical bar">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant; multiple values are separated by a vertical bar">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance; multiple values are separated by a vertical bar">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    879375  950448  C   T   .   .   ALLELEID=929884;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.10:g.879375C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=SAMD11:148398;MC=SO:0001587|nonsense;ORIGIN=1;RS=761448939

And this is the header and the first variant of output vcf:

##fileformat=VCFv4.2
##GATKCommandLine=<ID=AnnotateVcfWithBamDepth,CommandLine="AnnotateVcfWithBamDepth  --output /home/user/clinvar_adj_Retina1.vcf --variant /home/user/clinvar_20230717_PLP_IRD_withchr_adj.vcf --input /home/user/Retina1.bam  --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false",Version="4.1.9.0",Date="August 27, 2023 7:06:15 PM CEST">
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=BAM_DEPTH,Number=1,Type=Integer,Description="pooled bam depth">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant; multiple values are separated by a vertical bar">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant; multiple values are separated by a vertical bar">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance; multiple values are separated by a vertical bar">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
##fileDate=2023-07-17
##reference=GRCh37
##source=AnnotateVcfWithBamDepth
##source=ClinVar
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    879375  950448  C   T   .   .   ALLELEID=929884;BAM_DEPTH=0;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.10:g.879375C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=SAMD11:148398;MC=SO:0001587|nonsense;ORIGIN=1;RS=761448939

Unfortunately, the BAM file was downloaded from a study and I can't have any further information about the actual data. It would be really great if there is something I can do on the code, VCF and/or BAM file to fix the issue.

Thank you very much in advance for your help.

GATK AnnotateVcfWithBamDepth • 1.2k views
ADD COMMENT
0
Entering edit mode

what is the output of

samtools view Retina1.bam 'chr1:879375-879375'

?

ADD REPLY
0
Entering edit mode

Dear Pierre,

Thank you very much for your reply. The output read count is 999, and you can see the output here

samtools view Retina1.bam 'chr1:879375-879375' | wc -l
ADD REPLY
0
Entering edit mode

also, do you use the same chromosome notation (chr1 vs 1 ) between the BAMs and the VCFs ?

ADD REPLY
0
Entering edit mode

Yes, the notations are the same, which is with chr. For that, I added "chr" to the VCF with awk

ADD REPLY
0
Entering edit mode

UPDATE: I added --disable-tool-default-read-filters true argument, and the tool returned all the read counts without any filter. I am still trying to figure out why the default filtering returns a zero read count. Now, I am looking into the output of samtools view Retina1.bam 'chr1:879375-879375', where the read count is 999. The full output is here.

Below, you can see the new VCF header and the variant mentioned above :

##fileformat=VCFv4.2
##GATKCommandLine=<ID=AnnotateVcfWithBamDepth,CommandLine="AnnotateVcfWithBamDepth --output /home/user/clinvar_adj_Retina1.vcf --variant /home/user/clinvar_20230717_PLP_IRD_withchr.vcf --input /home/user/Retina1.bam --disable-tool-default-read-filters true --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays ",Version="4.1.9.0",Date="August 28, 2023 12:09:22 PM CEST">
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=BAM_DEPTH,Number=1,Type=Integer,Description="pooled bam depth">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant; multiple values are separated by a vertical bar">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant; multiple values are separated by a vertical bar">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance; multiple values are separated by a vertical bar">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
##fileDate=2023-07-17
##reference=GRCh37
##source=AnnotateVcfWithBamDepth
##source=ClinVar
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    879375  950448  C   T   .   .   ALLELEID=929884;BAM_DEPTH=999;CLNDISDB=MedGen:CN517202;CLNDN=not_provided;CLNHGVS=NC_000001.10:g.879375C>T;CLNREVSTAT=criteria_provided,_single_submitter;CLNSIG=Pathogenic;CLNVC=single_nucleotide_variant;CLNVCSO=SO:0001483;GENEINFO=SAMD11:148398;MC=SO:0001587|nonsense;ORIGIN=1;RS=761448939

I would be still happy if anybody could figure it out.

All the best.

ADD REPLY
3
Entering edit mode
15 months ago

AnnotateVcfWithBamDepth https://gatk.broadinstitute.org/hc/en-us/articles/360040507771-AnnotateVcfWithBamDepth use the filter WellformedReadFilter : https://gatk.broadinstitute.org/hc/en-us/articles/360040095672-WellformedReadFilter This filters requires Read Group and Sequence are present while there is no read-group in your reads.

 wget -O - "https://raw.githubusercontent.com/ClkElf/Biostars_posts/main/Retina1-chr-879375-879375.txt"  | grep RG
ADD COMMENT
0
Entering edit mode

Thank you very much!

ADD REPLY

Login before adding your answer.

Traffic: 1995 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6