VariantRecalibrator Error message
10.4 years ago
cvu ▴ 180

Hi,

I'm using VariantRecalibrator from GATK. I generated my VCF files with samtools mpileup/bcftools.

When I run VariantRecalibrator with these arguments,

java -Xmx4g \
  -jar GenomeAnalysisTK-3.1-1/GenomeAnalysisTK.jar \
  -T VariantRecalibrator \
  -R GRch38.fasta \
  -input filtered_cano.vcf \
  -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 00-All.vcf \
  -an QD \
  -an HaplotypeScore \
  -an MQRankSum \
  -an ReadPosRankSum \
  -an FS \
  -an MQ \
  -mode BOTH \
  -recalFile cano.recal \
  -tranchesFile cano.tranches \
  -rscriptFile cano.plots.R

it throws this error message:

##### ERROR A USER ERROR has occurred (version 3.1-1-g07a4bf8):
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 10161196: unparsable vcf record with allele B

Am I missing something in the arguments?

I also suspect that GATK does not accept VCF files generated by samtools.

Thanks!!!!

genome alignment snp Assembly next-gen • 4.3k views

GATK can take VCF files; VCF may well be the only format it accepts for recalibration. The message clearly says that the error is in a VCF file, not in the arguments.

##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 10161196: unparsable vcf record with allele B

Please paste lines 10161195, 10161196 and 10161197 here.
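One way to pull out those lines is sketched below. It assumes the VCF is uncompressed and that you run it on whichever VCF GATK is complaining about (the error message does not say which one). show_lines is a made-up helper name, not part of GATK or samtools.

```shell
# Print an inclusive range of lines from a file, e.g. the records around
# the line number reported in the GATK error message.
# show_lines is a hypothetical helper name used only for this sketch.
show_lines() {
    file=$1; first=$2; last=$3
    # -n suppresses default output, "p" prints the wanted range, and "q"
    # stops reading after the last wanted line so that a large VCF is not
    # scanned to the end.
    sed -n "${first},${last}p;${last}q" "$file"
}

# Example call (file name taken from the command in the question):
#   show_lines 00-All.vcf 10161195 10161197
```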


Hi,

3    16902883    rs56708014    B    BGC    .    .    RS=56708014;RSPOS=16902887;dbSNPBuildID=129;SSR=0;SAO=0;VP=0x050000080005000002000200;WGT=1;VC=DIV;INT;ASP;OTHERKG

I found the error. There was a "B" in the REF and ALT columns of this record in my dbSNP VCF file.
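To check whether other records have the same problem, something like the following could list every record whose REF or ALT contains anything other than A, C, G, T or N. This is only a sketch: it assumes an uncompressed, tab-delimited VCF, and find_bad_alleles is a made-up name.

```shell
# List records whose REF or ALT allele contains characters other than
# A, C, G, T or N. find_bad_alleles is a hypothetical helper name.
find_bad_alleles() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            # REF plus the comma-separated ALT list
            n = split($4 "," $5, a, ",")
            for (i = 1; i <= n; i++) {
                # Accept plain bases, the missing value "." and "*".
                if (a[i] ~ /^[ACGTNacgtn]+$/ || a[i] == "." || a[i] == "*")
                    continue
                # Accept symbolic (<...>) and breakend ([ or ]) alleles.
                if (index(a[i], "<") || index(a[i], "[") || index(a[i], "]"))
                    continue
                # NR counts header lines too, so it is the line number in the file.
                print NR ": " $0
                next
            }
        }
    ' "$1"
}

# Example call: find_bad_alleles 00-All.vcf
```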

Thanks

Is it possible to use only dbSNP as a resource, or do I need to give all three resources (hapmap, omni and dbsnp)?


Using only dbSNP will work too.


I tried with only dbSNP, but it asks for a dataset with training=true!
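That message appears because the dbSNP resource in the command above is tagged training=false, so the model has nothing to train on. A rough sketch of what a resource list with training sets might look like is below. The hapmap and omni file names are placeholders, the tags and priors for those two are the ones I recall from GATK's SNP recommendations of that era and should be checked against the documentation for your version, and the dbsnp line is copied unchanged from the question.

```shell
# Resource arguments for VariantRecalibrator, collected in one variable.
# hapmap_sites.vcf and omni_sites.vcf are placeholder file names; the
# hapmap/omni tags and priors are assumptions to verify against the GATK docs.
RESOURCES="-resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_sites.vcf \
-resource:omni,known=false,training=true,truth=false,prior=12.0 omni_sites.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0 00-All.vcf"

# Show what would be passed on the command line.
echo "$RESOURCES"
```

The variable would then replace the single -resource line in the original command (left unquoted so that the shell splits it into separate arguments).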

10.2 years ago

GATK doesn't play nicely with IUPAC ambiguity codes, so you'll need to change the B to an N in the dbSNP VCF.

B denotes that the base is a C, G or T at that locus, i.e. they've decided the allele is rarely, if ever, an A there. IUPAC codes definitely carry more information than tagging anything ambiguous as an N.
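If there are many such records, the replacement could be scripted along these lines. This is a minimal sketch, assuming an uncompressed, tab-delimited VCF; iupac_to_n is a made-up name, and ALT fields containing symbolic or breakend alleles are left untouched because their names can contain the same letters. Write the result to a new file and keep the original.

```shell
# Replace IUPAC ambiguity codes in the REF and ALT columns with N.
# iupac_to_n is a hypothetical helper name used only for this sketch.
iupac_to_n() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }               # pass header lines through unchanged
        {
            gsub(/[RYSWKMBDHVryswkmbdhv]/, "N", $4)
            # Leave ALT alone when it holds symbolic (<...>) or breakend
            # ([ or ]) alleles, e.g. <DEL> contains the letter D.
            if (!index($5, "<") && !index($5, "[") && !index($5, "]"))
                gsub(/[RYSWKMBDHVryswkmbdhv]/, "N", $5)
            print
        }' "$1"
}

# Example call: iupac_to_n 00-All.vcf > 00-All.noiupac.vcf
```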
