Understanding mother and father alleles in VCF file
19 months ago
c. • 0

Hello, I have a VCF file containing all my SNPs, and I am struggling to understand the meaning of each line. For each SNP, how can I tell which allele comes from my mother's side and which from my father's side?

For example in this line:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NG1RXE8LDZ
chr1    15903   rs557514207 G   GC  274.75  .   AC=2;AF=1.00;AN=2;DB;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=29.48;QD=34.24;SOR=0.941  GT:AD:DP:GQ:PL  1/1:0,7:7:24:312,24,0

If you can, please explain in simple words, since I am new to this.

Thank you

snp vcf • 1.2k views
ADD COMMENT

Is this from some company like 23andMe? If so, my statement may not be valid. If not, you cannot deduce parental alleles from unphased VCFs. Look up phasing in VCFs to understand the concept. In essence, you need VCF entries for all individuals involved and a pedigree file showing the relationships to get phased variants.
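To make the phased/unphased distinction concrete for the line in the question, here is a minimal Python sketch. The function name `describe_genotype` is made up for illustration, and the sketch assumes a single-sample record whose FORMAT includes GT (AD and GQ are read only if present). In the VCF format, a `/` between the allele indices in GT means the genotype is unphased and a `|` means it is phased. The example genotype `1/1` is unphased, but since both copies carry the ALT allele (GC), both parents must have passed on GC at this site, barring a genotyping error or a de novo change. Parental origin only becomes a real question at heterozygous sites such as `0/1`.

```python
def describe_genotype(ref, alt, fmt, sample):
    """Decode one sample column of a VCF record using its FORMAT keys.

    Illustrative helper, not part of any library.
    """
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    gt = fields["GT"]
    phased = "|" in gt                      # "|" = phased, "/" = unphased
    alleles = [ref] + alt.split(",")        # index 0 = REF, 1.. = ALT alleles
    indices = gt.split("|" if phased else "/")
    called = [alleles[int(i)] if i != "." else None for i in indices]
    result = {
        "phased": phased,
        "alleles": called,
        "homozygous": None not in called and len(set(called)) == 1,
    }
    if "AD" in fields:                      # reads supporting REF, ALT, ...
        result["allele_depths"] = [int(x) for x in fields["AD"].split(",")]
    if "GQ" in fields:                      # genotype quality
        result["genotype_quality"] = int(fields["GQ"])
    return result


# Values copied from the record in the question
info = describe_genotype("G", "GC", "GT:AD:DP:GQ:PL", "1/1:0,7:7:24:312,24,0")
print(info)
```

For this record the sketch reports an unphased, homozygous GC/GC call supported by 7 ALT reads and 0 REF reads.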

ADD REPLY

It's from Nebula Genomics. It must be possible to deduce parental alleles, since I converted the VCF file into a MyHeritage-compatible file, uploaded it to MyHeritage, and found actual relatives. That means the file contained an indication of which alleles are from my mother's side and which from my father's side. I don't need any phasing, whatever that means.

ADD REPLY

I don't understand your logic. You don't know what phasing is but are confident it is not required, and your proof is that relatives were detected based on a VCF file? Do you think finding related haplotypes and determining (with certainty) the parental origin of each allele require the same input?

For example, did it detect any female paternal cousins just by comparing your SNVs to theirs? Some maternal relatives can be deduced from conservation in the mitochondrial genome and some paternal male relatives from conservation in the Y chromosome; that is basic biology. The parental origin of autosomal variants cannot be determined for any individual without information on the parents. I have actually performed family-based variant analyses and corrected faulty pedigree/relatedness information, so I know a little bit about what I'm talking about.
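As a rough illustration of why parental genotypes are needed, here is a minimal Python sketch of Mendelian allele assignment at a single autosomal site. The function `assign_parental_alleles` is hypothetical, takes genotypes as pairs of allele indices (for example `(0, 1)` for `0/1`), and ignores de novo mutations and genotyping errors. Real trio phasing tools do considerably more than this.

```python
def assign_parental_alleles(child, mother, father):
    """Return (status, maternal_allele, paternal_allele) for one site.

    Each genotype is a pair of allele indices, e.g. (0, 1) for 0/1.
    Illustrative only: assumes error-free genotypes and no de novo events.
    """
    a, b = child
    # Try both ways of splitting the child's two alleles between the parents
    # and keep only the splits consistent with the parental genotypes.
    consistent = {
        (m, f) for m, f in ((a, b), (b, a)) if m in mother and f in father
    }
    if not consistent:
        return ("mendelian_error", None, None)
    if len(consistent) > 1:
        # e.g. child and both parents are all heterozygous 0/1
        return ("ambiguous", None, None)
    maternal, paternal = consistent.pop()
    return ("phased", maternal, paternal)


print(assign_parental_alleles((0, 1), (0, 0), (0, 1)))  # mother can only give 0
print(assign_parental_alleles((0, 1), (0, 1), (0, 1)))  # cannot be resolved
```

Without the `mother` and `father` arguments there is nothing to constrain the assignment, which is the situation with a single unphased VCF.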

ADD REPLY

Actually, they didn't tell me which relatives were from my mother's side and which from my father's side; you are right. But how can you find related haplotypes then? For example, a random person could appear as my relative just because they have the same values at certain SNPs, even though these values are on different chromosomes. I guess that to be a real relative you must share a run of identical SNPs on the same chromosome, not some on my maternal chromosome and some on my paternal chromosome. Or am I wrong?

ADD REPLY

"a random person could appear as my relative just because they have the same values at certain SNPs"

Yes. I recall there being a threshold, something like an Nth cousin sharing as much DNA with you as a complete stranger. The farther you go from your generation and your direct ancestry line, the less DNA you share with people. See this table for an idea: https://en.wikipedia.org/wiki/Coefficient_of_relationship#Human_relationships
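To put numbers on that, the expected coefficient of relationship for full n-th cousins halves twice per generation, which gives 2^-(2n+1) (12.5% for first cousins, about 3.1% for second cousins, matching the linked table). The sketch below just evaluates that formula; the function name is made up for illustration, and these are expected averages, so the actual amount of DNA shared with a distant cousin varies because of recombination.

```python
def cousin_relatedness(degree):
    """Expected coefficient of relationship for full n-th cousins.

    Uses r = 2 ** -(2n + 1); this is an average, not a guaranteed amount.
    """
    if degree < 1:
        raise ValueError("degree must be 1 (first cousins) or higher")
    return 2.0 ** -(2 * degree + 1)


for n in range(1, 7):
    print(f"cousin degree {n}: expected sharing {cousin_relatedness(n):.4%}")
```

The values shrink quickly, which is why distant cousins become hard to tell apart from unrelated people.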

"even though these values are on different chromosomes"

Again, you lost me there.

"I guess that to be a real relative you must share a run of identical SNPs on the same chromosome, not some on my maternal chromosome and some on my paternal chromosome."

I honestly don't recall how PLINK or KING do their calculations, but you need relatedness values between all pairs of possibly related individuals to hypothesize how they could be related. You will also need chrX-based and chrY-based relatedness between samples, plus sex determination for each sample, to have as much evidence as possible. PLINK also gives you the fractions of DNA where one copy, both copies or no copy are shared, so that's useful as well. I recall using the relatedness score, then looking at the three shared-copy values and the sex data to hypothesize relationships. Other factors such as sample identifier, collection location and collection time also help, but those are metadata.
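As a toy illustration of that last step, the sketch below compares a pair's three shared-copy fractions with the textbook expectations for a few relationships and picks the closest one. In the PLINK `--genome` output these fractions are the `Z0`, `Z1` and `Z2` columns, and `PI_HAT` is `Z2 + 0.5 * Z1`. The nearest-profile rule and the function names are assumptions made for this example, not the procedure described above or what PLINK or KING do internally. Real data are noisy, and several relationships share the same expected profile.

```python
# Expected (Z0, Z1, Z2): fraction of the genome sharing 0, 1 or 2 copies
# identical by descent, for a few textbook relationships.
EXPECTED_SHARING = {
    "duplicate or identical twin": (0.0, 0.0, 1.0),
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "second degree (half sibling, grandparent, aunt/uncle)": (0.5, 0.5, 0.0),
    "third degree (e.g. first cousins)": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def pi_hat(z1, z2):
    """Overall proportion of the genome shared identical by descent."""
    return 0.5 * z1 + z2


def closest_relationship(z0, z1, z2):
    """Pick the expected profile nearest (squared distance) to the observed one."""
    def distance(profile):
        return sum((obs - exp) ** 2 for obs, exp in zip((z0, z1, z2), profile))
    return min(EXPECTED_SHARING, key=lambda name: distance(EXPECTED_SHARING[name]))


print(closest_relationship(0.02, 0.97, 0.01), pi_hat(0.97, 0.01))
print(closest_relationship(0.24, 0.51, 0.25), pi_hat(0.51, 0.25))
```

Note that parent-child and full siblings both have an overall sharing of about 0.5, so the three separate fractions are what distinguish them; sex and chrX/chrY evidence would then narrow down relationships within the same degree.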

ADD REPLY
