6.4 years ago
jamespower
Hi,
I have downloaded Geuvadis genotypes from:
And I am trying to recreate vcf files with only the European samples using bcftools, but when I try
bcftools view --samples-file 373_sampleIDs.tab GEUVADIS.chr22.PH1PH2_465.IMPFRQFILT_BIALLELIC_PH.annotv2.genotypes.vcf.gz
I get this error:
[W::vcf_parse_format] FORMAT 'PP' is not defined in the header, assuming Type=String
[W::vcf_parse_format] FORMAT 'BD' is not defined in the header, assuming Type=String
Undefined tags in the header, cannot proceed in the sample subset mode.
Trying to fix it with bcftools reheader does not work:
bcftools view -h file.vcf > header.txt; bcftools reheader -h header.txt file.vcf > fixed.vcf
I can find information on the FORMAT "PP", but not on the FORMAT "BD".
##FORMAT=<ID=PP,Number=G,Type=Integer,Description="Phred-scaled Posterior Genotype Probabilities">
Would anybody be able to help? Thanks!
This is a warning, not an error. If the results are what you expect, then everything is okay.
Even though it is reported as a warning, I cannot perform certain operations such as extracting samples by ID (I have updated the output above to include the error message).
Hello jamespower,
bcftools is very strict about header information. If you want to use it, you must fix the header, or use a less strict program such as SnpSift.
The header line you found for PP declares its type as Integer, but the message you posted shows bcftools assuming String. Have a look at the VCF specification and modify your header so that the missing fields are defined and have the right type. Also make sure that the value for Number is correct.
If you show us the complete header of your current file and the first few variants, we could help you do this.
fin swimmer
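The header fix described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the toy header.txt stands in for the real output of bcftools view -h, the PP line is the one quoted in the question, and the BD line is a placeholder (Number=., Type=String, invented description) because the meaning of BD is not known here. Check a few data lines to confirm that the declared Type and Number match the actual values before relying on it. The file names file.vcf.gz, fixed.vcf.gz and european.vcf.gz are hypothetical.

```shell
# Toy header standing in for: bcftools view -h file.vcf.gz > header.txt
printf '##fileformat=VCFv4.1\n' > header.txt
printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n' >> header.txt
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n' >> header.txt

# Insert the missing FORMAT definitions just before the #CHROM column line.
# PP is taken from the question; BD is a placeholder definition (assumption).
awk '
  /^#CHROM/ {
    print "##FORMAT=<ID=PP,Number=G,Type=Integer,Description=\"Phred-scaled Posterior Genotype Probabilities\">"
    print "##FORMAT=<ID=BD,Number=.,Type=String,Description=\"Placeholder, meaning unknown\">"
  }
  { print }
' header.txt > header.fixed.txt

# Apply the edited header and subset the samples, only if bcftools and the
# (hypothetical) input file are actually present.
if command -v bcftools >/dev/null 2>&1 && [ -f file.vcf.gz ]; then
  bcftools reheader -h header.fixed.txt -o fixed.vcf.gz file.vcf.gz
  bcftools view --samples-file 373_sampleIDs.tab -Oz -o european.vcf.gz fixed.vcf.gz
fi
```

Note that the reheader attempt in the question writes the unchanged header back to the file, so the undefined tags remain; the header text has to be edited between extracting it and applying it.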