VCFParseError: ploidy > 2 not supported
5.9 years ago
always_learning ★ 1.1k

I have a few SNPs with ploidy > 2 in my human VCF generated by GATK, with genotypes like "0/1/1". Any idea how I can remove them from the VCF file? Are there any tools that do that?

VCF GATK PLOIDY • 2.2k views

Hello,

Why did you run your variant calling with a parameter that produces ploidy > 2? Can you please show us the complete command?

Do you really want to remove those sites from your VCF, or should the genotype be fixed in some way?

fin swimmer


I can't run this again. This is a merged VCF of around 6000 samples. I think it will be fine to remove such sites from the VCF.


Should the site be removed even if only one sample has ploidy > 2, or should that genotype be set to unknown?


I think setting the genotype to unknown will be the better approach. What do you think?


Try this sed command:

$ sed -r 's#\t[0-9]+/[0-9]+/[0-9]+[^:\t]*#\t./.#g' input.vcf > output.vcf

It looks for a tab followed by at least one digit, a /, at least one digit, another /, and at least one more digit, followed by anything other than a : or a tab, and replaces each such match with a tab followed by ./.
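
If a field-aware version is easier to check than a regex over the whole line, the same idea can be sketched in Python. This is only an illustration, not a tested tool: the function name mask_polyploid_calls is made up here, and it assumes GT is the first FORMAT subfield and that sample columns start at the tenth column. Unlike the sed pattern it also treats phased calls such as 0|1|1 as ploidy > 2.

```python
import re

# A GT value with three or more alleles, e.g. 0/1/1, 0|1|1 or ./1/1.
POLYPLOID_GT = re.compile(r"^(?:\d+|\.)(?:[/|](?:\d+|\.)){2,}$")


def mask_polyploid_calls(line: str) -> str:
    """Return a VCF line with every ploidy > 2 genotype replaced by './.'."""
    if line.startswith("#"):
        return line  # header lines are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # Columns 0-8 are CHROM..FORMAT; sample columns start at index 9.
    for i in range(9, len(fields)):
        # Assumption: GT is the first colon-separated subfield.
        gt, sep, rest = fields[i].partition(":")
        if POLYPLOID_GT.match(gt):
            fields[i] = "./." + sep + rest  # keep the other subfields as-is
    return "\t".join(fields) + newline


if __name__ == "__main__":
    example = "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/1/1:30\n"
    print(mask_polyploid_calls(example), end="")
```

To process a file, read input.vcf line by line and write the result of mask_polyploid_calls for each line to output.vcf.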

fin swimmer

5.7 years ago
tpoterba ▴ 50

This is a Hail (0.2) error message from VCF import.

While we don't support triploid calls yet, we did add an argument to import_vcf: filter='\t0/1/1' will remove all lines matching that regex before parsing.

If you want to keep these sites but mark these calls as no call, then you can use the find_replace argument as well: find_replace=('\t\d/\d/\d', '\t./.')
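
A rough sketch of how the find_replace route might look in a script. The helper names import_masked and preview are made up for illustration, and the Hail call is not exercised here; the replacement begins with a tab so that the column separator consumed by the pattern is put back. The preview function uses Python's re module only as a local approximation of what the substitution does to a line, since Hail applies the regex on the JVM side.

```python
import re

# Regex and replacement intended for import_vcf's find_replace argument.
# The raw string keeps "\t" and "\d" as regex escapes; the replacement
# starts with a real tab character so the column boundary is preserved.
FIND = r"\t\d/\d/\d"
REPLACE = "\t./."


def import_masked(path):
    """Import a VCF with Hail, rewriting triploid calls to no-calls first."""
    import hail as hl  # imported lazily; needs a working Hail 0.2 install
    return hl.import_vcf(path, find_replace=(FIND, REPLACE))


def preview(line):
    """Approximate locally what the substitution does to one line."""
    return re.sub(FIND, REPLACE, line)


if __name__ == "__main__":
    print(preview("GT:DP\t0/1:12\t0/1/1:30"))
```

Note that this pattern only covers single-digit allele indices and exactly three alleles; calls with higher ploidy or allele indices of 10 and above would need a broader regex.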
