No INFO/info value in headers
3.4 years ago
ErickW • 0

I've gotten back VCF files from the Michigan Imputation Server. My intent was to filter the VCFs by their info value; however, the files don't seem to have that value declared in the header:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##filedate=2021.7.17
##contig=<ID=5>
##pipeline=michigan-imputationserver-1.5.7
##imputation=minimac4-1.0.2
##phasing=eagle-2.4
##r2Filter=0.0
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy (R-square)">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
##INFO=<ID=IMPUTED,Number=0,Type=Flag,Description="Marker was imputed but NOT genotyped">
##INFO=<ID=TYPED,Number=0,Type=Flag,Description="Marker was genotyped AND imputed">
##INFO=<ID=TYPED_ONLY,Number=0,Type=Flag,Description="Marker was genotyped but NOT imputed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=HDS,Number=2,Type=Float,Description="Estimated Haploid Alternate Allele Dosage">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1">

Would anyone have any insights/explanation on what's going on? I'm a novice at bioinformatics so any help would be appreciated.

Edit - Further clarification

The 'info' value I am referring to is one of the values found within the INFO column of the dataset. The VCF has the typical columns CHROM, POS, REF, ALT, INFO, etc. (based on this explanation), and within the INFO column are numerous values such as the R2 score (INFO/r2), p-value (INFO/p), MAF (INFO/maf), etc.

Sample of my data

From numerous papers and other users' posts on Biostars, there seems to be an info value (INFO/info). I wanted to use it as a filter; however, it seems to be missing (judging both by my header and by querying my data). So essentially I am asking: is there an explanation for the missing INFO/info value, and/or is there a way to get it?
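
One way to confirm which INFO fields a file actually declares is to read the `##INFO` lines of the header. The snippet below is only a minimal sketch: the header text is copied from the INFO declarations posted above, and the helper name `declared_info_ids` is hypothetical, not part of any VCF library.

```python
import re

# INFO declarations copied from the header in this post (other lines trimmed).
HEADER = """\
##fileformat=VCFv4.1
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy (R-square)">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
##INFO=<ID=IMPUTED,Number=0,Type=Flag,Description="Marker was imputed but NOT genotyped">
##INFO=<ID=TYPED,Number=0,Type=Flag,Description="Marker was genotyped AND imputed">
##INFO=<ID=TYPED_ONLY,Number=0,Type=Flag,Description="Marker was genotyped but NOT imputed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
"""

# Matches the ID of an ##INFO declaration; ##FORMAT and other lines are ignored.
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def declared_info_ids(header_text):
    """Return the IDs of all ##INFO declarations, in file order (hypothetical helper)."""
    ids = []
    for line in header_text.splitlines():
        match = INFO_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


ids = declared_info_ids(HEADER)
print(ids)
# Case-insensitive check for a field literally named "info".
print("info" in [name.lower() for name in ids])
```

For this header the check finds no field named `info`; the imputation-quality field that is declared is `R2` ("Estimated Imputation Accuracy (R-square)").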

VCF Michigan-Imputation-Server • 1.9k views

Can you please elaborate on what you mean by a value in the header?


Sure! From what I've observed/seen in documentation, the INFO column carries numerous values/fields. For example, internationalgenome.org lists some of the possible values:

AA ancestral allele
AC allele count in genotypes, for each ALT allele, in the same order as listed
AF allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes
AN total number of alleles in called genotypes
BQ RMS base quality at this position
CIGAR cigar string describing how to align an alternate allele to the reference allele
DB dbSNP membership
DP combined depth across samples, e.g. DP=154
END end position of the variant described in this record (esp. for CNVs)
H2 membership in hapmap2
MQ RMS mapping quality, e.g. MQ=52
MQ0 Number of MAPQ == 0 reads covering this record
NS Number of samples with data
SB strand bias at this position
SOMATIC indicates that the record is a somatic mutation, for cancer genomics
VALIDATED validated by follow-up experiment

When I look into the INFO column of my vcf data, I receive values such as P-value (annotated as INFO/p), R2-score (INFO/r2), etc: My INFO

However, judging by some papers and other threads on Biostars, people are able to filter on the INFO/info value, which seems to be missing from my VCF. Essentially, my question is: is there an explanation for why I don't have this field, and/or is there a way of getting it?


I don't have experience working with imputed VCFs. But I think that before imputation you should first filter your VCF (which should have the expected INFO fields such as DP, MAF, etc.) using vcftools. After imputation, you can filter further on the Estimated Imputation Accuracy (R-square) using bcftools. Please check similar posts like this and this for more ideas.
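
To illustrate the post-imputation step, here is a rough plain-Python sketch of filtering records on the `R2` field that the header in the question declares. The helper names, the example records, and the 0.3 cutoff are all made up for illustration; this is not the bcftools implementation, and an appropriate threshold depends on the study.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flag entries (no '=') map to True."""
    parsed = {}
    if info_field == ".":  # "." means the INFO column is empty
        return parsed
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def filter_by_r2(lines, min_r2):
    """Yield header lines unchanged, plus records whose INFO/R2 is >= min_r2.

    Records that carry no R2 value at all are dropped in this sketch.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])  # INFO is the 8th tab-separated column
        r2 = info.get("R2")
        if r2 is not None and float(r2) >= min_r2:
            yield line


# Hypothetical records, invented only to exercise the filter.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "5\t100\t.\tA\tG\t.\tPASS\tAF=0.10;MAF=0.10;R2=0.95;IMPUTED",
    "5\t200\t.\tC\tT\t.\tPASS\tAF=0.02;MAF=0.02;R2=0.12;IMPUTED",
]

kept = list(filter_by_r2(example, min_r2=0.3))  # 0.3 is an illustrative cutoff
for line in kept:
    print(line)
```

Dropping records without an `R2` value is a choice made for this sketch; whether such records should be kept is a separate decision.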
