I have just started learning about SNPs and related information. When I searched for a particular SNP, rs80334247, I found a lot of information, such as:
- Chromosome: 3
- Minor allele count: T=0.0038/19 (1000 Genomes)
- Gene ID: SCN5A (6331)
- Major and minor allele counts in different populations around the world
and a lot of other information.
Now my question is: where can I find information such as homozygous and heterozygous genotypes and dominant and recessive alleles, and how can this information be downloaded for all populations?
For example, if I want to test a particular SNP by calculating a chi-square statistic and p-value, then I need to make a contingency table like the following:
AA Aa aa
Subjects
Control
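To make the contingency-table test concrete, here is a minimal sketch of a chi-square test of independence on a 2 x 3 genotype table. The counts are made up purely for illustration, and the function name `chi_square_2x3` is hypothetical. For a 2 x 3 table the test has 2 degrees of freedom, and in that special case the p-value is exactly exp(-chi2 / 2), so no statistics library is needed.

```python
import math

def chi_square_2x3(subjects, controls):
    """Chi-square test of independence for a 2 x 3 genotype table.

    subjects, controls: counts of genotypes [AA, Aa, aa].
    Assumes every genotype column has at least one observation.
    Returns (chi2, p_value) with 2 degrees of freedom.
    """
    rows = [subjects, controls]
    row_totals = [sum(r) for r in rows]
    col_totals = [subjects[j] + controls[j] for j in range(3)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (rows[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with df = 2.
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# Made-up genotype counts (AA, Aa, aa), not real data for any SNP.
chi2, p_value = chi_square_2x3([30, 50, 20], [50, 40, 10])
print(f"chi2 = {chi2:.4f}, p = {p_value:.5f}")
```

With very rare minor alleles the aa column is often empty or nearly so; in that case an exact test or an allele-based test is usually preferred over this genotype-based chi-square.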
On the other hand, if I want to calculate p-values using logistic regression, where the predictors are SNPs and the response is 1 or 0 for subject and control, what information is needed for each SNP? For example, what type of values would SNP1 take?
My Understanding
Take Y as the response variable, which takes 0 for a control and 1 for a subject, and let's say I have 3 SNPs (SNP1, SNP2 and SNP3) on 10 subjects:
Y SNP1 SNP2 SNP3
1 ? ? ?
1
0
0
1
0
1
0
1
0
My confusion is about what the corresponding values in the SNP columns should be, since a single SNP comes with a lot of information, such as MAF, major allele count, minor allele count, etc. Or can these SNPs be encoded as 0 and 1? For example, if our reference allele is A (by the way, how do I know which allele is the reference?), then each subject either has that allele or not, and we can encode it as 0 or 1.
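One common convention, offered here as an illustration rather than as the answer given in this thread, is additive coding: each subject's value for a SNP is the number of copies of the minor allele in their genotype (0, 1 or 2). The sketch below assumes genotypes are available as two-letter strings; the function name `encode_additive` and the example genotypes are made up.

```python
def encode_additive(genotypes, minor_allele):
    """Convert genotype strings such as 'AA', 'AT', 'TT' into
    minor-allele counts (0, 1 or 2). Missing genotypes (None) stay None."""
    encoded = []
    for gt in genotypes:
        if gt is None:
            encoded.append(None)
        else:
            encoded.append(sum(1 for allele in gt if allele == minor_allele))
    return encoded

# Made-up genotypes for 10 subjects at one SNP with minor allele T.
snp1_genotypes = ["TT", "AT", "AA", "AT", "AA", "AA", "TA", "AA", "AT", "TT"]
snp1 = encode_additive(snp1_genotypes, minor_allele="T")
print(snp1)
```

A 0/1 coding is also possible, for example 1 if the subject carries at least one minor allele (a dominant model) or 1 only for two copies (a recessive model); which one is appropriate depends on the genetic model being tested.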
So I have a number of confusions about SNP datasets and their usage. If somebody could explain this with a small example SNP dataset, it would save me a lot of pain in understanding SNP datasets and how to use them.
@Kevin Blighe Thank you very much indeed for your valuable answers. Could you also point me to a specific link for small-scale SNP data online? I want to see information like AA, Aa and aa for a particular SNP such as "rs80334247".
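As a small worked example of the logistic-regression approach, the sketch below fits a model with one SNP as the predictor and reports a Wald p-value for the SNP coefficient. Everything here is an assumption for illustration: the SNP is coded as the minor-allele count (0, 1 or 2), the genotype values are made up, and `fit_logistic` is a hypothetical helper written with plain Newton-Raphson rather than a call to any statistics package.

```python
import math

def fit_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, se_b1, p_value), where p_value is the two-sided
    Wald test of b1 = 0. Assumes the data are not perfectly separated.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Fisher information matrix [[i00, i01], [i01, i11]].
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Variance of b1 is the (1,1) entry of the inverse information,
    # evaluated at the last iterate before the final (tiny) step.
    se_b1 = math.sqrt(i00 / det)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return b0, b1, se_b1, p_value

# Y as in the question (1 = subject, 0 = control); SNP1 values are made up.
y = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
snp1 = [2, 1, 0, 1, 0, 0, 1, 0, 1, 2]
b0, b1, se_b1, p_value = fit_logistic(snp1, y)
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, p = {p_value:.3f}")
```

With only 10 subjects the Wald p-value is not reliable; the point is only to show the shape of the input data, one row per subject and one numeric column per SNP.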
regards
No problem. The minor allele for each SNP is listed in dbSNP. For example, for the entry that you mentioned, rs80334247, I can see that the minor allele is T, with a MAF of 0.0038 (0.38%). The 'ancestral allele', i.e., the allele inferred to have been present in the ancestral sequence, is A.
@Kevin Blighe Where can I find the heterozygous and homozygous information? That is what I am not finding in these datasets.
If you only have a few SNPs, then you can search for them at the 1000 Genomes Project website (1000 Genomes - A Deep Catalog of Human Genetic Variation). For example, here is the information for your SNP, showing the major and minor allele in each population group:
[source: http://phase3browser.1000genomes.org/Homo_sapiens/Variation/Population?db=core;r=3:38610335-38611335;v=rs80334247;vdb=variation;vf=18768208]
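Per-individual genotypes from the 1000 Genomes Project are distributed as VCF files, in which each sample's GT field records the two alleles (0 for the reference allele, 1 for the first alternate allele). Counting homozygous and heterozygous individuals then amounts to tallying those GT values. The following is a minimal sketch: the VCF line is made up (it is not the real record for rs80334247), and `count_genotypes` is a hypothetical helper, not part of any 1000 Genomes tool.

```python
from collections import Counter

def count_genotypes(vcf_line):
    """Tally genotype classes from one tab-separated VCF data line.

    Columns 1-9 are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT;
    the remaining columns hold one entry per sample.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    counts = Counter()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        # Phased (|) and unphased (/) genotypes are treated the same here.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1
        elif alleles == ["0", "0"]:
            counts["hom_ref"] += 1
        elif alleles == ["1", "1"]:
            counts["hom_alt"] += 1
        elif sorted(alleles) == ["0", "1"]:
            counts["het"] += 1
        else:
            counts["other"] += 1  # haploid calls or further alternate alleles
    return counts

# Made-up example line with six samples (not real data).
example = "\t".join([
    "3", "1000", "rsEXAMPLE", "A", "T", ".", "PASS", ".", "GT",
    "0|0", "0|1", "1|0", "1|1", "0|0", "./.",
])
genotype_counts = count_genotypes(example)
print(dict(genotype_counts))
```

Splitting the sample columns by population label before tallying would give the AA / Aa / aa counts per population.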
---------------------------------------------
If you have a large number of SNPs, you will require a different program that can automate the process. Let me know.