Why are some variants not called in some populations?
5.3 years ago
star ▴ 350

I have three VCF files from three populations, and I merged all their SNPs together. For some variants there are genotypes in every population, but for others one or more populations show 'NA'. What does 'NA' mean here? Does it mean that no variant was called at that position in the samples showing 'NA', or can I treat 'NA' as the reference genotype?

ID          CHROM  chromStart  chromEnd  REF  alleles  pop1  pop2  pop3
rs10084237  chr2   76517559    76517560  T    C,T,     NA    CC    NA
rs10084293  chr2   70917811    70917812  C    C,T,     CT    TT    TT
rs10084353  chr2   61020552    61020553  A    A,G,     AG    NA    GG
VCF variantion • 869 views

How did you merge the variants? The exact method you used is what determines what these NAs mean.

5.3 years ago
bari.ballew ▴ 470

To expand on what RamRS said: if you merged VCFs rather than gVCFs, note that a VCF only reports a genomic location when there is a variant in that individual, so you are susceptible to a missing-data problem. When a variant is reported in A.vcf but not in B.vcf, the merged file will record the genotype as missing ("./.") for sample B. The missing entry alone cannot tell you which of two things happened: was there insufficient coverage to make a call, or was there plenty of coverage and simply no variant reads? If you're looking exclusively at very rare variants, then assuming a homozygous reference genotype for missing calls is sometimes appropriate, but whether it is depends on the downstream analysis.
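If you can recover per-sample read depth at those positions (from the BAMs or from gVCFs, for instance), one way to separate the two cases is sketched below. This is only an illustration, not a standard tool: the function name `resolve_missing`, the `MIN_DEPTH` threshold of 10, and the depth values are all made up, and it assumes single-base SNPs so that the homozygous reference genotype can be written as the REF allele twice.

```python
# Hypothetical threshold: minimum depth needed to trust "no variant reads".
MIN_DEPTH = 10

def resolve_missing(genotype, ref, depth, min_depth=MIN_DEPTH):
    """Return a genotype for one sample at one SNP site.

    genotype: merged call such as "CT", or "NA" / "./." when absent
    ref:      single-base reference allele, e.g. "T"
    depth:    read depth for this sample at the site, or None if unknown
    """
    if genotype not in ("NA", "./.", "."):
        return genotype      # a real call, leave it untouched
    if depth is not None and depth >= min_depth:
        return ref + ref     # well covered, no variant reported: hom-ref
    return "NA"              # low or unknown coverage: keep as missing

# First row from the question; the depths are invented for illustration.
rows = [
    ("rs10084237", "T", {"pop1": "NA", "pop2": "CC", "pop3": "NA"}),
]
depths = {("rs10084237", "pop1"): 35, ("rs10084237", "pop3"): 2}

resolved = {
    (rsid, pop): resolve_missing(gt, ref, depths.get((rsid, pop)))
    for rsid, ref, calls in rows
    for pop, gt in calls.items()
}
print(resolved)
```

With these invented depths, pop1 would be treated as homozygous reference and pop3 would stay missing. Without any depth information the function leaves every 'NA' as 'NA', which is the conservative default.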
