I am new to the genomics world, so my questions might be a bit silly.
I have been reading about it a lot recently, and I still get confused by a few things. I feel that if I can find the answers to these, I will be able to connect things. I took sickle cell anemia as an example.
Here is my understanding:
Sickle cell anemia is caused by the mutation of A to T, so if a person gets this mutation from both the paternal and maternal genes, then he/she will certainly have sickle cell anemia.
So here A is the dominant and T is the recessive allele…
If a person gets the dominant allele A from both parents (homozygous dominant) === then no issues
If a person gets A from one parent and T from the other parent (heterozygous) === then he/she gets sickle cell trait
If a person gets the recessive allele T from both parents (homozygous recessive) === then he/she gets sickle cell anemia.
So, based on my understanding, this is how it should look in a VCF file or other database:
REF A (from the positive strand of the human reference genome)
ALT T (the variations seen)
So for the genotype
0/0 person got A from both parents == no sickle cell trait
0/1 person got A from one parent and T from the other == sickle cell trait
1/1 person got T from both parents == sickle cell anemia.
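To make my expectation concrete, here is a minimal Python sketch of how I thought the genotype field would be read. It assumes REF is A and ALT is T, as in my understanding above, and handles only a single ALT allele; the function name `interpret_genotype` and the labels are made up for illustration and are not from any library.

```python
def interpret_genotype(gt: str) -> str:
    """Map a diploid VCF GT string to the outcome I expected.

    Assumption (mine, for illustration): allele 0 is the normal allele
    and allele 1 is the sickle cell allele. Only alleles 0 and 1 are handled.
    """
    # GT can be unphased ("0/1") or phased ("0|1"); treat both the same way.
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        raise ValueError(f"unsupported genotype: {gt!r}")

    # Count how many copies of the ALT allele the person carries.
    alt_count = alleles.count("1")
    if alt_count == 0:
        return "no sickle cell trait"
    if alt_count == 1:
        return "sickle cell trait"
    return "sickle cell anemia"


for gt in ("0/0", "0/1", "1/1"):
    print(gt, "==", interpret_genotype(gt))
```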
Now the confusion started when I looked at the VCF file we have. For the rsID rs334 (the position of the sickle cell mutation), it shows the following:
REF T
ALT A,C,G
This is not what I expected. I thought that in a VCF file the reference allele and the sequences from the parents (both mom and dad) would be given on the forward strand. To clarify, I looked at both Ensembl and dbSNP, which confused me even more.
Ensembl has the same thing that I saw in the VCF file.
In dbSNP I saw:
RefSNP Alleles A/C/G/T (REV)
Ancestral Allele A
Contig Allele A
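One thing I could check is whether the two representations are simply complements of each other. The sketch below is only that check, assuming (my guess, not confirmed) that the "(REV)" flag means dbSNP reports the alleles on the opposite strand from the one used in the VCF file; the names `COMPLEMENT` and `complement_alleles` are made up for illustration.

```python
# Watson-Crick base pairing: each base and its partner on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_alleles(alleles):
    """Return the single-base alleles as they would read on the opposite strand."""
    return [COMPLEMENT[a] for a in alleles]


# Alleles exactly as they appear in the VCF file for rs334.
vcf_ref = "T"
vcf_alt = ["A", "C", "G"]

# The same alleles expressed on the opposite strand.
flipped_ref = complement_alleles([vcf_ref])[0]
flipped_alt = complement_alleles(vcf_alt)

print("VCF:             REF", vcf_ref, "ALT", ",".join(vcf_alt))
print("Opposite strand: REF", flipped_ref, "ALT", ",".join(flipped_alt))
```

If that assumption holds, T in the VCF file and A on the other strand would describe the same base pair, but I am not sure this is the right way to read it.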
So, in short, I am trying to find answers to the following:
Is my understanding of the sickle cell allele correct?
Why is it shown like this in the VCF file?
Please help me understand what I saw in dbSNP.
Thank you