Pileup viewer for VCF v4.2
0
0
Entering edit mode
10.1 years ago
Lee Katz ★ 3.2k

Hey BioStars, sorry to flood you so much with my bcftools questions. Since I have merged my VCFs, it turns out that they are now in VCF version 4.2 format. I want to view the pooled VCF; however, IGV does not read v4.2. Is there a mechanism for viewing v4.2? In any pileup viewer?

Using vcf-convert to version 4.1 does not work because after I convert/bgzip/index to v4.1, I get an error

Error loading C:\Users\gzu2\Desktop\msa\out.4.1.vcf.gz: The provided VCF file is malformed at approximately line number 31: Duplicate allele added to VariantContext: G

Lines 30-32 are such:

#CHROM  POS ID  REF ALT QUAL  FILTER  INFO  FORMAT  lambda_virus.fasta.wgsim.fastq.gz-reference sample1.fastq.gz-reference  sample2.fastq.gz-reference  sample3.fastq.gz-reference  sample4.fastq.gz-reference
`NC001416  1 . G G 0 PASS  AC=0;ADP=0;AN=0;HET=0;HOM=0;NC=1;WT=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR:AC ./. ./.:.:0:2:.:0:.:.:.:.:.:.:.:.:0 ./.:.:1:1:.:0:.:.:.:.:.:.:.:.:0 ./.:.:1:3:.:0:.:.:.:.:.:.:.:.:0 ./.:.:2:2:.:0:.:.:.:.:.:.:.:.:0`
NC001416  2 . G G 0 PASS  AC=0;ADP=3;AN=0;HET=0;HOM=0;NC=1;WT=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR:AC ./. ./.:.:3:4:.:0:.:.:.:.:.:.:.:.:0 ./.:.:1:1:.:0:.:.:.:.:.:.:.:.:0 ./.:.:2:3:.:0:.:.:.:.:.:.:.:.:0 ./.:.:2:3:.:0:.:.:.:.:.:.:.:.:0
bcftools-merge vcf vcf4.2 IGV • 4.7k views
ADD COMMENT
1
Entering edit mode

Why don't you just filter out the rows where REF=ALT? That may well get rid of the problem.

ADD REPLY
0
Entering edit mode

Good idea! But now I am getting the same error pertaining to this line:

NC001416  186 . G C,G 0 PASS  AC=2,0;ADP=201;AN=10;HET=0;HOM=1;NC=0;WT=0  GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR:AC 1/1:255:201:204:4:197:98.01%:1.7394E-112:17:17:4:0:179:18:197,. 0/0:48:26:44:25:0:0%:1E0:24:0:11:14:0:0:.,0 0/0:38:27:39:24:1:4%:5E-1:26:29:11:13:1:0:.,1 0/0:33:19:41:17:0:0%:1E0:26:0:8:9:0:0:.,0 0/0:40:23:36:22:0:0%:1E0:30:0:8:14:0:0:.,0

Most or all of my lines will have at least two choices in the ALT field because it is a pooled VCF from bcftools merge.

ADD REPLY
0
Entering edit mode

So it complains whenever any of the ALT alleles = REF. That's pretty reasonable in my mind (if VCF4.2 allows that then I'd be curious what the reasoning was...it seems like a terrible idea). I'd be curious if the GT field ever uses the ALT=REF genotype (e.g., 0/2 for the line you showed). I would suspect not. In any case, removing examples of this shouldn't be too terrible to script.

ADD REPLY
0
Entering edit mode

If I remove any line with REF and ALT with something like G C,G, then it will remove every line. This is because it is a merged VCF file. So I guess a good question would be, can I view a merged VCF in any viewer?

ADD REPLY
0
Entering edit mode

I wouldn't remove the line, just remove the reference ALT allele. You'll need to ensure that it's the last ALT allele (otherwise, you'll change the sample genotypes).

ADD REPLY
0
Entering edit mode

I'm still learning the VCF format but I'll give that a try. Is this the kind of format you are saying to make it into? I would change C,G to C,.? Or to .,C? Or remove it altogether so that it is just C?

NC001416  186 . G C,. 0 PASS
ADD REPLY
0
Entering edit mode

I changed all the ALT fields so that they do not have commas and so that they are only a single nucleotide. I also made sure that the ALT nucleotide was different than the REF. And now I can import it to IGV! Woo!

However, all variants now are incorrectly attributed to the first genome. I think I see what you meant by having to change the genotype field which might be too much effort for what I want... is there any script out there to do what I want? Or is it easier than I think?

And the code, for posterity:

gunzip -c out.4.1.vcf.gz | perl -lane 'if(/^#/){print;next;} $REF=$F[3]; @ALT=split(/,/,$F[4]); for(@ALT){$F[4]=$ALT[0] if($F[4] ne $REF);} next if($F[3] eq $F[4]); print join("\t",@F);' > tmp.vcf

bgzip -f tmp.vcf && tabix tmp.vcf.gz
ADD REPLY
0
Entering edit mode

I don't know of a script/program to do it (I don't have time at the moment to quickly write it, or else I'd just do it). If you happen to end up writing one then please share it here. I imagine others would find it useful.

ADD REPLY
0
Entering edit mode

Fair enough! I am not sure when I'll be able to do it, but if anyone else does it before me though please post it here!

ADD REPLY
0
Entering edit mode

I think I script I wrote today should finally answer this question (and more!). I call it fix vcf and it is part of my Lyve-SET package. https://github.com/lskatz/lyve-SET/blob/master/scripts/set_fixVcf.pl

ADD REPLY
0
Entering edit mode

Maybe a better question is, can I load a pooled/merged VCF into any pileup viewer? What about VCF v4.2 files?

ADD REPLY

Login before adding your answer.

Traffic: 2517 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6