Hi,
I have data from a VCF file that looks like this:
The data cover 21 patients and contain SNP information; each row is a SNP.
Unfortunately, I don't understand the "rules" for encoding the genotype as {0,1,2} based on the REF and ALT columns and the homo/hetero column (homozygous or heterozygous).
What are the rules, what does the {0,1,2} encoding mean, and how do I apply it to my table? Thank you.
Typically a genotype representation such as {0,1,2} indicates the number of non-reference alleles at that site.
A site with a homozygous reference genotype (i.e., no SNP present) would have a 0. A heterozygous site would have a 1. And a homozygous non-reference site would be represented as 2.
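As a rough illustration of the rule above, here is a minimal Python sketch. The function names are made up for this example, and it assumes that a "homo" label in the table means homozygous for the ALT allele (since only SNPs are listed). The second function counts non-reference alleles in a VCF-style GT string such as `0/1`.

```python
def encode_zygosity(label):
    """Map the table's homo/hetro label for a listed SNP to an ALT-allele count.

    Assumption: every listed row is a SNP, so 'homo' means homozygous ALT.
    """
    codes = {"hetero": 1, "hetro": 1, "homo": 2}
    return codes[label.strip().lower()]


def encode_gt(gt):
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing call, left unencoded
    return sum(1 for allele in alleles if allele != "0")


print(encode_zygosity("hetro"))
print([encode_gt(g) for g in ["0/0", "0/1", "1/1"]])
```

Under this scheme the reference-only genotype `0/0` is the only one that maps to 0.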
@Dave Carlson, so there will be no zeros in my data because my data contains only SNPs? I thought that if a SNP occurs only once in the data, then it is 0.
@Dave Carlson -
So, for example: at position 7618386 they are both hetero and will therefore be encoded as 1?
And at position 7618386 all the hetero will be encoded as 1 and the homo as 2?
In the screenshot you've provided, the heterozygous SNP at position 7618386 would indeed be represented by a 1. There are no homozygous SNPs at that position in the screenshot, but if there were, yes the genotype would be represented by a 2 (under this scheme).
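To make this concrete, here is a small sketch of encoding one position across patients. The patient IDs are hypothetical, the homozygous call for `P3` is invented purely to show the 2 case, and the sketch assumes that a patient with no row at this position is homozygous reference (0) rather than an uncalled genotype. That assumption is worth checking against how the table was produced.

```python
CODES = {"hetro": 1, "homo": 2}  # labels as they appear in the table


def encode_site(calls, patients):
    """Return {patient: code} for a single SNP position."""
    encoded = {}
    for patient in patients:
        if patient in calls:
            encoded[patient] = CODES[calls[patient]]
        else:
            # Assumption: no row for this patient means homozygous reference.
            encoded[patient] = 0
    return encoded


patients = ["P1", "P2", "P3", "P4"]  # hypothetical IDs
# Position 7618386: two heterozygous calls, plus one made-up homozygous call.
calls_7618386 = {"P1": "hetro", "P2": "hetro", "P3": "homo"}

site = encode_site(calls_7618386, patients)
print(site)
```

Each patient's code depends only on that patient's own call at the position, so `P4`, who has no row here, is the only one who gets a 0.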
@Dave Carlson thank you very much. I also have this possibility in my data:

chrom  pos      ref  alt  hetro\homo
x      7811259  G    C    hetro

Because this SNP occurs only once in the data (only one patient has it), would it still be 1, or would it be 0?
Since we are talking about genotypes here, the only relevant information is how many copies of the alternate allele (the SNP) are present in each individual sample. How many other patients carry the same SNP does not change that individual's code, so a heterozygous SNP is a 1 even if only one patient has it.
That said, because this seems to be an X chromosome, the actual genotype will depend on whether the sample/patient is male or female. For a female, the same representation system {0,1,2} should still apply. However, in males it may be different because only one X chromosome is present.
Different tools may handle X/Y genotype representations differently. I would recommend reading the documentation related to whatever tool you're using.
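To show why the convention matters, here is a hedged sketch of two ways a haploid male X call could be encoded. The function and its `double_haploid` parameter are hypothetical and do not reflect any particular tool's behavior. It assumes the male call is written as a single-allele GT string such as `1`.

```python
def encode_x_genotype(gt, is_male, double_haploid=False):
    """Count ALT alleles in a GT string, with an optional haploid convention.

    double_haploid=False: a male haploid call is coded 0 or 1 (allele count).
    double_haploid=True:  a male haploid call is coded 0 or 2, so it lines up
                          with the homozygous codes used for females.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # missing call
    n_alt = sum(1 for allele in alleles if allele != "0")
    if is_male and len(alleles) == 1 and double_haploid:
        return 2 * n_alt
    return n_alt


print(encode_x_genotype("0/1", is_male=False))
print(encode_x_genotype("1", is_male=True))
print(encode_x_genotype("1", is_male=True, double_haploid=True))
```

Without sex information, neither convention can be applied reliably, which is one more reason to check the documentation of the tool that will consume the encoded matrix.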
@Dave Carlson I have no information on whether the patients are female or male. So, based on your previous answers: across all the data in my VCF file, if the SNP I mentioned in the previous comment occurs only once, in one patient, and it is hetero, would it be 1? Or 0 because it occurs only one time?
@David Carlson, does this mean that if I have a file that contains only SNPs, the encoding can't be zero? "A site with a homozygous reference genotype (i.e., no SNP present)"