Hello,
I was wondering if you could answer a conceptual question for me.
I have noticed that there are quite a few SNPs where the reference nucleotide is not the most common nucleotide in the majority of populations.
One such example is the SNP rs13303010. If I look at this SNP in Ensembl, it says that "G" is the reference nucleotide and "A" is the alternative nucleotide. This is the case for both GRCh37 and GRCh38. Looking at the population genetics, it is clear that for all populations except the African ones, the "A" nucleotide is much more common. The "G" reference allele only has an allele frequency of 37% across all populations.
So my question is: why was the reference assembly not changed at that position when moving from GRCh37 to GRCh38?
I thought the reference assembly was supposed to be something like a consensus sequence of all humans, and that "A" should therefore be the reference allele because it has the highest allele frequency.
I would be very thankful if somebody could explain to me why the reference is not changed in such a case.
Cheers.
While this is true, there was an attempt to make more of the reference alleles the major allele in GRCh38 than in previous genome versions.
Oh yes, some have been corrected, but you absolutely shouldn't rely on that.
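Rather than assuming the reference allele is the major allele, you can check it explicitly from the allele frequencies. Below is a minimal sketch in Python. The function names are hypothetical, the 0.37 frequency for "G" is the figure quoted in the question, and the 0.63 for "A" is only an assumption that the site is biallelic, so treat the numbers as illustrative and not as values pulled from a database.

```python
def major_allele(freqs):
    """Return the allele with the highest frequency in a {allele: frequency} dict."""
    return max(freqs, key=freqs.get)


def reference_is_major(ref, freqs):
    """Return True if the reference allele is also the most frequent allele."""
    return major_allele(freqs) == ref


# rs13303010: "G" = 0.37 is the global frequency quoted in the question;
# "A" = 0.63 assumes the site is biallelic (an assumption, not looked up).
rs13303010 = {"G": 0.37, "A": 0.63}

print(major_allele(rs13303010))             # allele with the highest frequency
print(reference_is_major("G", rs13303010))  # is the reference "G" the major allele?
```

In practice you would run a check like this per population, since the major allele can differ between populations, as it does for this SNP.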
Thanks! That was a big misconception on my side.