Vcf-consensus does not replace the variants of the VCF in the output Fasta file
4
0
Entering edit mode
8.1 years ago
evovaldo • 0

Hello, I'm trying to get a consensus fasta file using a Reference.fa and a Vcf.gz.

To do that I first do:

To zip the VCF

bgzip -c file.vcf > file.vcf.gz

To tabix the VCF

tabix -p vcf file.vcf.gz

After I Use:

cat file.fa | vcf-consensus vcf.gz > out.fa

However the out.fa does not have the variants from the VCF and is the same as the reference. I'm searching in the forum and do what the people has answered such as:

1) Make sure the *vcf.gz file was zipped using bgzip. You could unzip them first and zip them again with bgzip from tabix

2) Make sure the chromosome IDs are same in VCF and fasta. Sometimes, it's "chr01" in one file, but "chr1" or "1" in another one.

My fasta header is:

">19"

My VCF is:

CHROM POS ID REF ALT

19 45393826 rs568341751 G C
19 45393827 rs530756795 T C
19 45393837 rs375258934 G A
19 45393877 rs570535613 G A

What could be wrong?????

I appreciate any help.

Best regards,

sequence genome software error • 2.8k views
ADD COMMENT
0
Entering edit mode

Just in case, there are not genotypes in your vcf ?

ADD REPLY
0
Entering edit mode
8.1 years ago
evovaldo • 0

This is part of the VCf file which I created using ensembl for a specific chr19 region. There is not genotypes on it.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG01191

19 45393826 rs568341751 G C 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=20113;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393827 rs530756795 T C 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=20026;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393837 rs375258934 G A 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0.0015;AMR_AF=0;AN=2;DP=19380;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45393877 rs570535613 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=16770;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.001;VT=SNP GT 0|0

19 45393886 rs373618045 T C 100 PASS AA=.|||;AC=0;AF=0.000599042;AFR_AF=0;AMR_AF=0;AN=2;DP=16554;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.0031;VT=SNP GT 0|0

19 45393958 rs558926735 C T 100 PASS AA=.|||;AC=0;AF=0.00179712;AFR_AF=0;AMR_AF=0;AN=2;DP=13554;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0.0092;VT=SNP GT 0|0

19 45393992 rs184809057 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0;AMR_AF=0;AN=2;DP=14726;EAS_AF=0.001;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394006 rs535065384 G A 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=16773;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394085 rs149731538 T G 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0.0015;AMR_AF=0;AN=2;DP=17596;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394111 rs575818345 C G 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=16243;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394125 rs544801238 G T 100 PASS AA=.|||;AC=0;AF=0.000199681;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=15936;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394126 rs116973259 G C 100 PASS AA=.|||;AC=0;AF=0.00159744;AFR_AF=0.0008;AMR_AF=0;AN=2;DP=15888;EAS_AF=0.0069;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

19 45394159 rs190710529 C A 100 PASS AA=.|||;AC=0;AF=0.000399361;AFR_AF=0;AMR_AF=0;AN=2;DP=14841;EAS_AF=0.002;EUR_AF=0;NS=2504;SAS_AF=0;VT=SNP GT 0|0

ADD COMMENT
0
Entering edit mode

Thanks! I vaguely remember that in the script the replacement of nucleotide of the reference needs the genotype. Probably as all the genotypes in your vcf seem to be 0|0 there is no replacement. Depending on what you want could you try replacing the 0/0 with a 1/1 or a 0/1 and see if the replacement happens ?

ADD REPLY
0
Entering edit mode
8.1 years ago
evovaldo • 0

I think was wrong. There is a genotype GT 0|0 for all variants.

ADD COMMENT
0
Entering edit mode
8.1 years ago
evovaldo • 0

Tanks to you.....I'll try to do that.

ADD COMMENT
0
Entering edit mode
8.1 years ago
evovaldo • 0

Hi microfuge, I tried to replace randomly some of the genotypes with 0/1 or 1/1 because I don't have GT:AD:DP:GQ:PL information probably because I obtained the vcf from the 1000 genomes data slice. However, no replacement is observed.

ADD COMMENT
0
Entering edit mode

So sorry that it did not work. If not too much time is wasted then can you try giving a -s <sample_name> which in your case should be -s HG01191. Also a vcf-validator run on your vcf. vcf-validator input.vcf.gz could indicate potential problems with the VCF. On an unrelated note, you seem to using 'add answer' rather than 'add comment' while replying.

ADD REPLY

Login before adding your answer.

Traffic: 2059 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6