Removing multi-variant records from vcf file
2.5 years ago
Emili • 0

I am using gatk ASEReadCounter to get the read counts per allele. To do so, I used the following command:

gatk ASEReadCounter \
     -R /path_to_genome/hg38_genome/GRCh38.p13.genome.fa \
     -I sample.sorted.bam \
     -V sample.vcf.gz \
     -O output.table

I am using GATK4, but I realized that at position chr1:1574033 my VCF has more than one variant in a single record, which ASEReadCounter does not accept. If there were only one such row I could remove it manually, but how can I remove these rows when many of them have more than one variant? Does GATK have this ability? The row in my VCF file looks like this:

chr1    1574033 .   AAG *,A 55.01   .   AC=1,1;AF=0.500,0.500;AN=2;DP=12;ExcessHet=3.0103;FS=0.000;MLEAC=1,1;MLEAF=0.500,0.500;MQ=60.00;QD=6.88;SOR=1.179   GT:AD:DP:GQ:PL  1/2:0,6,2:8:29:434,65,29,147,0,111
GATK vcf • 1.6k views

I would suggest normalizing the VCF. That way multiallelic records are split so that each row has only one alternate allele. bcftools can do that.
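
Before normalizing, it can help to see how many rows are affected. This is only a minimal sketch: it assumes a tab-delimited VCF and treats any record whose ALT column (column 5) contains a comma as multiallelic. The function name count_multiallelic is hypothetical, not part of bcftools or GATK.

```shell
# Count VCF data rows whose ALT field (column 5) lists more than one allele.
# Assumes tab-delimited input; header lines starting with '#' are skipped.
count_multiallelic() {
  awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }' "$@"
}
```

For a compressed file, something like `gzip -dc sample.vcf.gz | count_multiallelic` should print the number of such rows.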

2.5 years ago
drabiza1 ▴ 20
bcftools norm -m - sample.vcf.gz > normalized_sample.vcf
2.5 years ago
JustinZhang ▴ 120

If you are working with germline variants, pre.py from Illumina/hap.py (on GitHub) can help you normalize your VCF file almost perfectly.

If you are working with somatic variants, try bcftools or other software (not verified).

18 months ago
geocarvalho ▴ 390

Another option is GATK SelectVariants; check the --select-type-to-include and --restrict-alleles-to parameters:

docker run -v $PWD:$PWD -w $PWD broadinstitute/gatk:4.4.0.0 gatk SelectVariants \
     -R GRCh38.genome.fa \
     -V ${SAMPLE}.vcf.gz \
     --restrict-alleles-to BIALLELIC \
     --select-type-to-include SNP \
     -O ${SAMPLE}.selectvariants.vcf.gz
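
If the goal is simply to drop the multiallelic rows rather than split them, a plain-text filter may also work without GATK. This is a minimal sketch, assuming a tab-delimited VCF and treating any record with a comma in the ALT column as multiallelic. The name drop_multiallelic is a hypothetical helper. Unlike the SelectVariants command above, it keeps biallelic indels, and its output is uncompressed, so it would still need to be compressed and indexed before being passed to GATK.

```shell
# Print header lines unchanged and keep only data rows whose ALT field
# (column 5) holds a single allele, i.e. contains no comma.
# Assumes tab-delimited input, read from files or from stdin.
drop_multiallelic() {
  awk -F'\t' '/^#/ || $5 !~ /,/' "$@"
}
```

A possible use is `gzip -dc sample.vcf.gz | drop_multiallelic > sample.biallelic.vcf`, where the output file name is only an example.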