genotype encoding
2.1 years ago
Eliza ▴ 40

Hi, I have data from a VCF file covering 21 patients. It contains SNP information, and each row is a SNP. It looks like this:

[screenshot of the data table]

Unfortunately, I don't understand the rules for encoding the genotype as {0,1,2} based on the ALT, REF, and homo/hetero (homozygous or heterozygous) columns.

What are the rules, what does the {0,1,2} encoding mean, and how do I implement it on my table? Thank you.

gynotype homozygous snp hetrozygous • 1.4k views
2.1 years ago
Dave Carlson ★ 2.0k

Typically a genotype representation such as {0,1,2} indicates the number of non-reference alleles at that site.

A site with a homozygous reference genotype (i.e., no SNP present) would have a 0. A heterozygous site would have a 1. And a homozygous non-reference site would be represented as 2.
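As a minimal sketch of this counting rule, assuming the genotypes are available as standard VCF `GT` strings such as `0/1` or `1|1`: the helper name `encode_genotype` is made up for illustration, and returning `None` for missing calls is just one possible choice.

```python
import re

def encode_genotype(gt):
    """Count the non-reference alleles in a VCF GT string.

    '0/0' gives 0, '0/1' or '1|0' gives 1, '1/1' gives 2.
    Missing calls such as './.' give None (an assumption of this sketch).
    """
    # GT alleles are separated by '/' (unphased) or '|' (phased).
    alleles = re.split(r"[/|]", gt)
    if "." in alleles or "" in alleles:
        return None  # missing or empty genotype is left unencoded
    # Allele index 0 is the reference; any other index is non-reference.
    return sum(1 for allele in alleles if allele != "0")

for gt in ["0/0", "0/1", "1|0", "1/1", "./."]:
    print(gt, encode_genotype(gt))
```

Note that under this sketch a multi-allelic call such as `1/2` also counts as 2, because both alleles differ from the reference. Whether that is what you want depends on the downstream analysis.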

See also:

Genotype representation with 0, 1, 2 - what do they mean?

@Dave Carlson, so there will be no zeros in my data, because my data contains only SNPs? I thought that if a SNP occurs only once in the data, then it is 0.

@Dave Carlson, so for example: at position 7618386 they are both hetero and will therefore be encoded as 1? And at position 7618386 all the hetero will be encoded as 1 and the homo as 2?

In the screenshot you've provided, the heterozygous SNP at position 7618386 would indeed be represented by a 1. There are no homozygous SNPs at that position in the screenshot, but if there were, yes the genotype would be represented by a 2 (under this scheme).
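To apply this to a table like the one in the question, one option is to map the homo/hetero label to a count and fill in 0 for every patient who has no row at a site. The sketch below assumes a long-format table with one row per patient per SNP and that a missing row really means homozygous reference (not a no-call). The patient IDs, the function name `build_dosage_matrix`, and the REF/ALT values and chromosome for position 7618386 are invented for illustration.

```python
# Assumed label spelling; adjust to whatever the homo/hetero column contains.
ZYGOSITY_TO_DOSAGE = {"hetero": 1, "homo": 2}

def build_dosage_matrix(calls, patients):
    """Turn per-patient SNP rows into {site: {patient: dosage}}.

    calls: iterable of (patient, chrom, pos, ref, alt, zygosity) rows.
    Patients with no row at a site get 0, which assumes "not listed"
    means homozygous reference.
    """
    matrix = {}
    for patient, chrom, pos, ref, alt, zygosity in calls:
        site = (chrom, pos, ref, alt)
        # Start every patient at 0 the first time a site is seen.
        row = matrix.setdefault(site, {p: 0 for p in patients})
        row[patient] = ZYGOSITY_TO_DOSAGE[zygosity]
    return matrix

# Hypothetical example rows (only 3 of the 21 patients shown).
patients = ["P01", "P02", "P03"]
calls = [
    ("P01", "x", 7618386, "A", "T", "hetero"),
    ("P02", "x", 7618386, "A", "T", "hetero"),
    ("P03", "x", 7811259, "G", "C", "hetero"),
]
matrix = build_dosage_matrix(calls, patients)
for site, row in matrix.items():
    print(site, [row[p] for p in patients])
```

In this toy input the two heterozygous carriers at 7618386 get 1 and the third patient gets 0, so zeros do appear once the data is laid out per patient, even though the input lists only variant calls.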

@Dave Carlson thank you very much. I also have this possibility in my data:

chrom  pos      ref  alt  hetero/homo
x      7811259  G    C    hetero

Because this SNP occurs in the data only once (only one patient has it), would it still be 1, or would it be 0?

Since we are talking about genotypes here, the only relevant information is how many copies of the allele (SNP) are present in the sample or individual. How many other patients carry the SNP does not change that individual's genotype.

That said, because this seems to be an X chromosome, the actual genotype will depend on whether the sample/patient is male or female. For a female, the same representation system {0,1,2} should still apply. However in males, it may be different because only one X chromosome is present.

Different tools may handle X/Y genotype representations differently. I would recommend reading the documentation related to whatever tool you're using.
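As a rough illustration of why the convention matters, the sketch below counts non-reference alleles for both diploid calls (`0/1`) and haploid calls (`1`), assuming the variant caller reports a single allele at hemizygous sites. The function name `encode_x_genotype` and the `double_haploid` switch are hypothetical and do not reflect the behaviour of any particular tool.

```python
import re

def encode_x_genotype(gt, double_haploid=False):
    """Count non-reference alleles in a GT string on the X chromosome.

    Diploid calls ('0/1', '1/1') are counted as usual. For haploid calls
    ('0' or '1'), double_haploid=True puts the result on the 0/2 scale;
    otherwise it stays on the 0/1 scale. Missing calls give None.
    """
    alleles = re.split(r"[/|]", gt)
    if "." in alleles or "" in alleles:
        return None  # missing genotype
    alt_count = sum(1 for allele in alleles if allele != "0")
    if len(alleles) == 1 and double_haploid:
        # Hypothetical convention: treat a hemizygous alternate allele
        # like a homozygous alternate genotype.
        return 2 * alt_count
    return alt_count

print(encode_x_genotype("0/1"))                     # diploid heterozygous
print(encode_x_genotype("1"))                       # haploid, 0/1 scale
print(encode_x_genotype("1", double_haploid=True))  # haploid, 0/2 scale
```

If the sex of the patients is unknown and every call is reported as diploid, this distinction cannot be recovered from the genotype strings alone.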

@Dave Carlson I have no information on whether the patient is female or male. So, based on your previous answers: if, across all the data in the VCF file, the SNP I mentioned in the previous comment occurs only once, in one patient, and it is hetero, would it be 1? Or 0, because it occurs only one time?

@Dave Carlson, does it mean that if I have a file that contains only SNPs, the encoding can't be zero? You wrote: "A site with a homozygous reference genotype (i.e., no SNP present)".
