I want to calculate kinship matrix for autosomes in order to run a LMM. For this purpose, I first pruned each autosome using bcftools (version 1.11) and excluded the snps that had r2 greater than 0.15 by utilizing the command:
/bcftools +prune -m 0.15 -w 1000 input.vcf -Ob -o output.vcf
When I use the file output.vcf for calculating the kinship matrix using Rvtests utilizing the command:
/vcf2kinship --inVcf output.vcf --bn --out kinship_output
I get an error: 'Wrong VCF header'
However, if I use the above command with input.vcf (i.e., the original file without pruning from the first command), the same command runs with the Rvtest and I get a kinship matrix. I assume while pruning, I am ouputting the vcf file in a wrong format. I have tried reading the bcftools tutorial but I have no clue where I am going wrong.
Can you post the .vcf header for output.vcf
When I open the file,I get a warning: may be a binary file. See it anyway?
When I use an extension vcf.gz in the output file while pruning. The file header looks like below:
BCF^B^B^A<b6>^A^@##fileformat=VCFv4.2 A lot of meta-data in between followed by the usual vcf header CHROM POS etc followed by rest of the chromosome file.
Again, I get the same error: 'Wrong VCF header'
The header is compressed so you need to use `bcftools view -h output.vcf'. Please edit the output into your original question.
Do you mean in this command?
/bcftools +prune -m 0.15 -w 1000 input.vcf -Ob -o output.vcf
Can you please elaborate?
No. We need to see what the header of your vcf file looks like before we can diagnose the error. So you need to run
bcftools view -h output.vcf
which prints out just the header of the file which doesn't work in /vcf2kinship. Then edit the header into your original question.Using bcftools view -h output.vcf, the header has IDs of the participants as: number_string
Can you edit your question to include the entire output of bcftools view -h output.vcf.
Thank you for your assistance. I have edited the original question
Are you sure that's the whole header? It should look something like this:
Sorry, I misunderstood you. I am quite new to working in genomics. Using bcftools view -h output.vcf, the header of a an output file for e.g. for chr10 appears as below. I am working on single chromosome imputed data.
fileformat=VCFv4.2
FILTER=<id=pass,description="all filters="" passed"="">
filedate=20190717
source="beagle.27Jan18.7e1.jar (version 4.1)"
FORMAT=<id=gt,number=1,type=string,description="genotype">
FORMAT=<id=ds,number=a,type=float,description="estimated alt="" dose="" [p(ra)="" +="" p(aa)]"="">
FORMAT=<id=gp,number=g,type=float,description="estimated genotype="" probability"="">
contig=<id=chr10>
Some more lines
INFO=<id=an,number=1,type=integer,description="total number="" of="" alleles="" in="" called="" genotypes"="">
INFO=<id=ac,number=a,type=integer,description="allele count="" in="" genotypes"="">
INFO=<id=ns,number=1,type=integer,description="number of="" samples="" with="" data"="">
INFO=<id=ac_hom,number=a,type=integer,description="allele counts="" in="" homozygous="" genotypes"="">
INFO=<id=ac_het,number=a,type=integer,description="allele counts="" in="" heterozygous="" genotypes"="">
INFO=<id=ac_hemi,number=a,type=integer,description="allele counts="" in="" hemizygous="" genotypes"="">
INFO=<id=af,number=a,type=float,description="allele frequency"="">
INFO=<id=maf,number=a,type=float,description="minor allele="" frequency"="">
INFO=<id=hwe,number=a,type=float,description="hwe test="" (pmid:15789306)"="">
INFO=<id=exchet,number=a,type=float,description="probability of="" excess="" heterozygosity"="">
bcftools_pluginCommand=plugin fill-tags -Ou; Date=Wed Jan 22 17:06:56 2020
INFO=<id=info,number=1,type=float,description="impute2 info="" score"="">
bcftools_pluginCommand=plugin impute-info -Oz -o Botnia_df2_R4_chr10.vcf.gz; Date=Wed Jan 22 17:06:56 2020
bcftools_viewVersion=1.11+htslib-1.11
bcftools_viewCommand=view -S IDs.txt -o filtered-chr10.vcf /fs/projects/botnia_study/FinnGen_Botnia_data/imputed/Botnia_FinnGen_df2_R4/Botnia_imputed_df2_R4_chr10.vcf.gz; Date=Thu Feb 11 17:49:33 2021
bcftools_viewCommand=view -h pruned-chr10.vcf; Date=Wed Feb 24 13:06:06 2021
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
Thank you for your time and suggestions. I used bcftools view (/bcftools view output.vcf > output_vcf.vcf) convert the binary format to vcf format and rvtest accepts this file.