HaplotypeCaller with --dbsnp does not populate ID column
10.1 years ago

I want to obtain a VCF file containing genotype calls and their scores for every rsID, whether or not a variant was called. I was planning to use the following steps:

  1. HaplotypeCaller --genotyping_mode DISCOVERY --output_mode EMIT_VARIANTS_ONLY --emitRefConfidence BP_RESOLUTION as shown below
  2. awk '{ if ( $3 != "." ) { print $0; } }' variants.vcf > variants.filtered.vcf
  3. GenotypeGVCFs --includeNonVariantSites
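
For anyone who wants step 2 outside of awk, here is a minimal Python sketch of the same filter. It assumes a plain-text, tab-delimited VCF and keeps header lines explicitly. The function name `keep_records_with_id` and the example record `rs0000001` are made up for illustration and are not taken from any real output.

```python
import io


def keep_records_with_id(lines):
    """Yield VCF header lines plus records whose ID column (3rd field) is not '.'."""
    for line in lines:
        if line.startswith("#"):
            # Header and column-name lines are always kept.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] != ".":
            yield line


# Hypothetical four-line VCF: two header lines, one record without an ID,
# and one record with a made-up rsID.
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\t.\t.\t.\t.\n"
    "chr1\t101\trs0000001\tG\tT\t50\t.\t.\n"
)

filtered = list(keep_records_with_id(io.StringIO(example)))
print("".join(filtered), end="")
```

On a real file the same generator could be fed an open file handle instead of the `io.StringIO` example.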

Using the most recent dbSNP download here: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/All.vcf, I ran this:

GATK -T HaplotypeCaller \
  --reference_sequence GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
  --input_file recalibrated.bam \
  --dbsnp current_dbsnp/All.vcf.gz \
  --genotyping_mode DISCOVERY \
  --output_mode EMIT_VARIANTS_ONLY \
  --emitRefConfidence BP_RESOLUTION \
  --out variants.vcf

However, the ID column only contains ".". How can I get HaplotypeCaller to populate the ID column with rsIDs? Also, is there a better way to get variant and non-variant genotype calls with HaplotypeCaller?

HaplotypeCaller next-gen variant-calling GATK • 4.3k views
10.1 years ago
Jordan ★ 1.3k

It looks like your reference genome is from Ensembl (GRCh38), which uses a 1-based coordinate system. The dbSNP file you used is from NCBI, which uses a 0-based coordinate system, just like UCSC.

Could that be why no variants match dbSNP and the IDs only show "."?


Thanks.

That's concerning, then. I thought VCF was always 1-based.

However, I don't think that's the issue, since, with BP_RESOLUTION, literally every position is called (chr1:1, chr1:2, chr1:3, ...).
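
To make that reasoning concrete, here is a toy Python sketch. It assumes IDs are assigned purely by matching positions, which is a simplification and not a description of how GATK actually annotates. The positions and the IDs `rsA` and `rsB` are invented. If every position is emitted, a dbSNP track shifted by one base would attach IDs to the wrong sites, but it would not leave every ID as ".".

```python
def annotate_ids(called_positions, dbsnp):
    """Map each called position to an ID when dbsnp has one there, else '.'."""
    return {pos: dbsnp.get(pos, ".") for pos in called_positions}


# Hypothetical data: every position 1..10 is emitted (as with BP_RESOLUTION),
# and two invented dbSNP entries sit at 1-based positions 3 and 7.
called = range(1, 11)
dbsnp_1based = {3: "rsA", 7: "rsB"}

# The same entries as they would look if the coordinates were off by one.
dbsnp_shifted = {pos - 1: rsid for pos, rsid in dbsnp_1based.items()}

correct = annotate_ids(called, dbsnp_1based)
shifted = annotate_ids(called, dbsnp_shifted)

n_correct = sum(v != "." for v in correct.values())
n_shifted = sum(v != "." for v in shifted.values())
print(n_correct, n_shifted)
```

Under these assumptions both counts are non-zero, so an off-by-one shift alone would not explain an ID column that is "." everywhere.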
