The pruned in file by bcftools gives error "Wrong VCF header" while calculating kinship matrix using Rvtests
0
0
Entering edit mode
3.8 years ago
AVA ▴ 40

I want to calculate kinship matrix for autosomes in order to run a LMM. For this purpose, I first pruned each autosome using bcftools (version 1.11) and excluded the snps that had r2 greater than 0.15 by utilizing the command:

/bcftools +prune -m 0.15 -w 1000 input.vcf -Ob -o output.vcf

When I use the file output.vcf for calculating the kinship matrix using Rvtests utilizing the command:

/vcf2kinship --inVcf output.vcf --bn --out kinship_output

I get an error: 'Wrong VCF header'

However, if I use the above command with input.vcf (i.e., the original file without pruning from the first command), the same command runs with the Rvtest and I get a kinship matrix. I assume while pruning, I am ouputting the vcf file in a wrong format. I have tried reading the bcftools tutorial but I have no clue where I am going wrong.

software error • 1.7k views
ADD COMMENT
0
Entering edit mode

Can you post the .vcf header for output.vcf

ADD REPLY
0
Entering edit mode

When I open the file,I get a warning: may be a binary file. See it anyway?

When I use an extension vcf.gz in the output file while pruning. The file header looks like below:

BCF^B^B^A<b6>^A^@##fileformat=VCFv4.2 A lot of meta-data in between followed by the usual vcf header CHROM POS etc followed by rest of the chromosome file.

Again, I get the same error: 'Wrong VCF header'

ADD REPLY
0
Entering edit mode

The header is compressed so you need to use `bcftools view -h output.vcf'. Please edit the output into your original question.

ADD REPLY
0
Entering edit mode

Do you mean in this command?

/bcftools +prune -m 0.15 -w 1000 input.vcf -Ob -o output.vcf

Can you please elaborate?

ADD REPLY
0
Entering edit mode

No. We need to see what the header of your vcf file looks like before we can diagnose the error. So you need to run bcftools view -h output.vcf which prints out just the header of the file which doesn't work in /vcf2kinship. Then edit the header into your original question.

ADD REPLY
0
Entering edit mode

Using bcftools view -h output.vcf, the header has IDs of the participants as: number_string

ADD REPLY
0
Entering edit mode

Can you edit your question to include the entire output of bcftools view -h output.vcf.

ADD REPLY
0
Entering edit mode

Thank you for your assistance. I have edited the original question

ADD REPLY
0
Entering edit mode

Are you sure that's the whole header? It should look something like this:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##source=atlas
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AB,Number=1,Type=Float,Description="Allelic imbalance">
##FORMAT=<ID=AI,Number=1,Type=Float,Description="Binomial probability of allelic imbalance if Hz site">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Phred-scaled normalized genotype likelihoods">
##contig=<ID=1>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##contig=<ID=10>
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##bcftools_viewVersion=1.9-207-g2299ab6+htslib-1.9-271-g6738132
##bcftools_viewCommand=view -h XN224_MaximumLikelihood.bgz.vcf.gz; Date=Tue Feb 23 18:13:25 2021
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  XN224
ADD REPLY
0
Entering edit mode

Sorry, I misunderstood you. I am quite new to working in genomics. Using bcftools view -h output.vcf, the header of a an output file for e.g. for chr10 appears as below. I am working on single chromosome imputed data.

fileformat=VCFv4.2

FILTER=<id=pass,description="all filters="" passed"="">

filedate=20190717

source="beagle.27Jan18.7e1.jar (version 4.1)"

FORMAT=<id=gt,number=1,type=string,description="genotype">

FORMAT=<id=ds,number=a,type=float,description="estimated alt="" dose="" [p(ra)="" +="" p(aa)]"="">

FORMAT=<id=gp,number=g,type=float,description="estimated genotype="" probability"="">

contig=<id=chr10>

Some more lines

INFO=<id=an,number=1,type=integer,description="total number="" of="" alleles="" in="" called="" genotypes"="">

INFO=<id=ac,number=a,type=integer,description="allele count="" in="" genotypes"="">

INFO=<id=ns,number=1,type=integer,description="number of="" samples="" with="" data"="">

INFO=<id=ac_hom,number=a,type=integer,description="allele counts="" in="" homozygous="" genotypes"="">

INFO=<id=ac_het,number=a,type=integer,description="allele counts="" in="" heterozygous="" genotypes"="">

INFO=<id=ac_hemi,number=a,type=integer,description="allele counts="" in="" hemizygous="" genotypes"="">

INFO=<id=af,number=a,type=float,description="allele frequency"="">

INFO=<id=maf,number=a,type=float,description="minor allele="" frequency"="">

INFO=<id=hwe,number=a,type=float,description="hwe test="" (pmid:15789306)"="">

INFO=<id=exchet,number=a,type=float,description="probability of="" excess="" heterozygosity"="">

bcftools_pluginCommand=plugin fill-tags -Ou; Date=Wed Jan 22 17:06:56 2020

INFO=<id=info,number=1,type=float,description="impute2 info="" score"="">

bcftools_pluginCommand=plugin impute-info -Oz -o Botnia_df2_R4_chr10.vcf.gz; Date=Wed Jan 22 17:06:56 2020

bcftools_viewVersion=1.11+htslib-1.11

bcftools_viewCommand=view -S IDs.txt -o filtered-chr10.vcf /fs/projects/botnia_study/FinnGen_Botnia_data/imputed/Botnia_FinnGen_df2_R4/Botnia_imputed_df2_R4_chr10.vcf.gz; Date=Thu Feb 11 17:49:33 2021

bcftools_viewCommand=view -h pruned-chr10.vcf; Date=Wed Feb 24 13:06:06 2021

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

ADD REPLY
0
Entering edit mode

Thank you for your time and suggestions. I used bcftools view (/bcftools view output.vcf > output_vcf.vcf) convert the binary format to vcf format and rvtest accepts this file.

ADD REPLY

Login before adding your answer.

Traffic: 1904 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6