Supporting genotype file for GATK CalculateGenotypePosteriors
5.7 years ago
gwotto • 0

Dear all, I want to use GATK CalculateGenotypePosteriors with the -supporting argument, which requires a supporting file with genotypes from a cohort, e.g. 1000 Genomes. The GATK documentation uses the file 1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.gz, which is also in the GATK data bundle. Unfortunately, this file seems to have a bug or to be corrupted, so the program crashes. The other files on the 1000 Genomes site are for assembly b37. Does anybody know of a data file I could use for this purpose on hg38? Thanks in advance!

WGS GATK Genotyping • 2.1k views
5.7 years ago
gwotto • 0

Hi, I finally solved my problem. The file mentioned above, which I got from https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0, has wrong chromosome lengths in its VCF header. When I run CalculateGenotypePosteriors with -supporting 1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.gz, it gives me this error:

A USER ERROR has occurred: Input files reference and features have incompatible contigs: Found contigs with the same name but different lengths: contig reference = chr15 / 101991189 contig features = chr15 / 90338345

Inspection of the file reveals that the chr15 length has been replaced by the chr16 length, and that the chr17 length has been used for both chr16 and chr17:

##contig=<ID=chr15,assembly=GCF_000001405.26,length=90338345>
##contig=<ID=chr16,assembly=GCF_000001405.26,length=83257441>
##contig=<ID=chr17,assembly=GCF_000001405.26,length=83257441>
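The same mismatch can be spotted without running GATK by comparing each ##contig line against the reference lengths. This is a minimal Python sketch, assuming the header is available as text lines and that contig lines contain only simple comma-separated key=value pairs; the helper names (parse_contig_line, find_length_mismatches) are hypothetical, and the three GRCh38 lengths are hard-coded from the values quoted in this thread. In a real check you would load them from your reference instead, e.g. from column 2 of the FASTA's .fai index.

```python
# Minimal sketch: find ##contig header lines whose length= disagrees with the reference.
# Assumption: contig lines hold plain key=value pairs with no quoted commas.

# GRCh38 lengths quoted in this thread; in practice, read them from the reference .fai.
REF_LENGTHS = {
    "chr15": 101991189,
    "chr16": 90338345,
    "chr17": 83257441,
}

# The three lines as they appear in the problematic file's header.
BROKEN_HEADER = [
    "##contig=<ID=chr15,assembly=GCF_000001405.26,length=90338345>",
    "##contig=<ID=chr16,assembly=GCF_000001405.26,length=83257441>",
    "##contig=<ID=chr17,assembly=GCF_000001405.26,length=83257441>",
]


def parse_contig_line(line):
    """Return (contig_id, length) for a ##contig line, or None otherwise."""
    prefix = "##contig=<"
    if not line.startswith(prefix):
        return None
    body = line.strip()[len(prefix):].rstrip(">")
    fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
    if "ID" not in fields or "length" not in fields:
        return None
    return fields["ID"], int(fields["length"])


def find_length_mismatches(header_lines, ref_lengths):
    """List (contig, header_length, reference_length) for every disagreeing contig."""
    mismatches = []
    for line in header_lines:
        parsed = parse_contig_line(line)
        if parsed is None:
            continue
        contig, length = parsed
        expected = ref_lengths.get(contig)
        if expected is not None and expected != length:
            mismatches.append((contig, length, expected))
    return mismatches


for contig, found, expected in find_length_mismatches(BROKEN_HEADER, REF_LENGTHS):
    print(f"{contig}: header says {found}, reference says {expected}")
```

On the three header lines from the file it flags chr15 and chr16 and leaves chr17 alone, which matches the manual inspection.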

I replaced the lengths of chr15 and chr16 with the correct ones (taken from the VCF file of my aligned data), resulting in these lines:

##contig=<ID=chr15,assembly=GCF_000001405.26,length=101991189>
##contig=<ID=chr16,assembly=GCF_000001405.26,length=90338345>
##contig=<ID=chr17,assembly=GCF_000001405.26,length=83257441>

Now I can run CalculateGenotypePosteriors without problems.
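The header edit can also be scripted instead of done by hand. This is a minimal Python sketch, assuming the input is a gzip/bgzip-compressed VCF that Python's gzip module can read; patch_contig_line, patch_vcf_file and the output file name are hypothetical, and the lengths are the GRCh38 values quoted in this thread. It only rewrites length= on ##contig lines for contigs listed in REF_LENGTHS and copies every other line unchanged. The output is uncompressed, so before passing it to -supporting you would probably want to compress it with bgzip and index it (e.g. tabix -p vcf).

```python
import gzip
import re

# GRCh38 lengths quoted in this thread; in practice, read them from the reference .fai.
REF_LENGTHS = {
    "chr15": 101991189,
    "chr16": 90338345,
    "chr17": 83257441,
}

# ID= and length= keys inside a ##contig=<...> line (preceded by '<' or ',').
ID_RE = re.compile(r"(?<=[<,])ID=([^,>]+)")
LENGTH_RE = re.compile(r"(?<=[<,])length=\d+")


def patch_contig_line(line, ref_lengths):
    """Replace length= in a ##contig line with the reference length, if known."""
    if not line.startswith("##contig=<"):
        return line  # data records and other header lines pass through untouched
    match = ID_RE.search(line)
    if match is None or match.group(1) not in ref_lengths:
        return line
    return LENGTH_RE.sub(f"length={ref_lengths[match.group(1)]}", line, count=1)


def patch_vcf_file(in_path, out_path, ref_lengths):
    """Copy a gzip/bgzip VCF to an uncompressed VCF with corrected contig lengths."""
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(patch_contig_line(line, ref_lengths))


# Hypothetical usage:
# patch_vcf_file("1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.gz",
#                "1000G.patched.hg38.vcf", REF_LENGTHS)
```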

5.6 years ago
gwotto • 0

To add to that: what I did not realize was that 1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.gz only contains variants up to chr15; everything after that is missing. So although the program runs with the header fixed as above, the results would probably be affected by the missing data beyond chr15... So it is basically back to square one.
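A quick way to check which chromosomes a VCF actually covers is to count data records per contig and compare them with the contigs declared in the header. This is a minimal Python sketch, assuming a plain or gzip/bgzip-compressed VCF; summarize_contigs and summarize_vcf are hypothetical names, and EXAMPLE is a made-up toy VCF, not data from the 1000 Genomes file. If bcftools is available, bcftools index --stats on an indexed file gives similar per-contig counts.

```python
import gzip
import re
from collections import Counter

# ID= key inside a ##contig=<...> line (preceded by '<' or ',').
ID_RE = re.compile(r"(?<=[<,])ID=([^,>]+)")


def summarize_contigs(lines):
    """Count data records per contig and list header contigs that have none."""
    declared = []
    counts = Counter()
    for line in lines:
        if line.startswith("##contig=<"):
            match = ID_RE.search(line)
            if match:
                declared.append(match.group(1))
        elif line.startswith("#") or not line.strip():
            continue  # other header lines and blank lines
        else:
            counts[line.split("\t", 1)[0]] += 1  # CHROM is the first column
    missing = [contig for contig in declared if counts[contig] == 0]
    return counts, missing


def summarize_vcf(path):
    """Run summarize_contigs on a plain or gzip/bgzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_contigs(handle)


# Toy VCF (made-up records): chr16 is declared in the header but has no records.
EXAMPLE = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr14,length=107043718>",
    "##contig=<ID=chr15,length=101991189>",
    "##contig=<ID=chr16,length=90338345>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr14\t20000000\t.\tA\tG\t.\t.\t.",
    "chr15\t20000000\t.\tC\tT\t.\t.\t.",
    "chr15\t20000500\t.\tG\tA\t.\t.\t.",
]
counts, missing = summarize_contigs(EXAMPLE)
print(dict(counts), missing)
```

On the real file, the gap described above would presumably show up as every declared contig after chr15 appearing in missing.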
