Question

Vcf-consensus does not replace the variants of the VCF in the output Fasta file

0

Entering edit mode

8.1 years ago

evovaldo • 0

Hello, I'm trying to get a consensus fasta file using a Reference.fa and a Vcf.gz.

To do that I first do:

To zip the VCF

bgzip -c file.vcf > file.vcf.gz

To tabix the VCF

tabix -p vcf file.vcf.gz

After I Use:

cat file.fa | vcf-consensus vcf.gz > out.fa

However the out.fa does not have the variants from the VCF and is the same as the reference. I'm searching in the forum and do what the people has answered such as:

1) Make sure the *vcf.gz file was zipped using bgzip. You could unzip them first and zip them again with bgzip from tabix

2) Make sure the chromosome IDs are same in VCF and fasta. Sometimes, it's "chr01" in one file, but "chr1" or "1" in another one.

My fasta header is:

">19"

My VCF is:

CHROM POS ID REF ALT

19 45393826 rs568341751 G C
19 45393827 rs530756795 T C
19 45393837 rs375258934 G A
19 45393877 rs570535613 G A

What could be wrong?????

I appreciate any help.

Best regards,

sequence genome software error • 2.7k views

ADD COMMENT • link 8.1 years ago by evovaldo • 0

0

Entering edit mode

Just in case, there are not genotypes in your vcf ?

ADD REPLY • link 8.1 years ago by microfuge ★ 1.9k

score 0 · Answer 1 · 2016-10-18

This is part of the VCf file which I created using ensembl for a specific chr19 region. There is not genotypes on it.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG01191

19 45393826 rs568341751 G C 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=20113;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393827 rs530756795 T C 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=20026;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393837 rs375258934 G A 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0.0015;AMR_AF=0;AN=2;DP=19380;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393877 rs570535613 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=16770;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP GT 0|0

19 45393886 rs373618045 T C 100 PASS AA=.|||;AC=0;AF=0.000599042;AFR_AF=0;AMR_AF=0;AN=2;DP=16554;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.0031;VT=SNP GT 0|0

19 45393958 rs558926735 C T 100 PASS AA=.|||;AC=0;AF=0.00179712;AFR_AF=0;AMR_AF=0;AN=2;DP=13554;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.0092;VT=SNP GT 0|0

19 45393992 rs184809057 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=14726;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394006 rs535065384 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=16773;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394085 rs149731538 T G 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0.0015;AMR_AF=0;AN=2;DP=17596;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394111 rs575818345 C G 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=16243;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394125 rs544801238 G T 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=15936;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394126 rs116973259 G C 100 PASS AA=.|||;AC=0;AF=0.00159744;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=15888;EAS_AF=0.0069;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394159 rs190710529 C A 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0;AMR_AF=0;AN=2;DP=14841;EAS_AF=0.002;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

score 0 · Answer 2 · 2016-10-18

0

Entering edit mode

8.1 years ago

evovaldo • 0

I think was wrong. There is a genotype GT 0|0 for all variants.

ADD COMMENT • link 8.1 years ago by evovaldo • 0

score 0 · Answer 3 · 2016-10-18

0

Entering edit mode

8.1 years ago

evovaldo • 0

Tanks to you.....I'll try to do that.

ADD COMMENT • link 8.1 years ago by evovaldo • 0

score 0 · Answer 4 · 2016-10-19

0

Entering edit mode

8.1 years ago

evovaldo • 0

Hi microfuge, I tried to replace randomly some of the genotypes with 0/1 or 1/1 because I don't have GT:AD:DP:GQ:PL information probably because I obtained the vcf from the 1000 genomes data slice. However, no replacement is observed.

ADD COMMENT • link 8.1 years ago by evovaldo • 0

0

Entering edit mode

So sorry that it did not work. If not too much time is wasted then can you try giving a -s <sample_name> which in your case should be -s HG01191. Also a vcf-validator run on your vcf. vcf-validator input.vcf.gz could indicate potential problems with the VCF. On an unrelated note, you seem to using 'add answer' rather than 'add comment' while replying.

ADD REPLY • link 8.1 years ago by microfuge ★ 1.9k