I have some genotype data which gives me some problems, and I suspect that the genome build is not what it is supposed to be. I have a bim file with the following line:
1 rs2465136 0 980280 G A
According to dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=2465136), this is a SNP on chr1, but the coordinate matches no genome build (1055037 or 990417, instead of 980280). However, a little further down that page, that number occurs in a 1KG handle (1000GENOMES|CEU.trio.12.15.2008_268_chr1_980280). I'm sure I'm missing something obvious here, but could someone explain to me what is going on?
Thanks. Didn't know dbSNP did not provide hg18 coordinates.
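For anyone hitting the same issue, the check discussed here can be sketched in a few lines of Python: parse the .bim line and compare its position with the known coordinates of the same rsID in each build. This is only a minimal sketch. The function names and the lookup table are hypothetical, the assignment of the two dbSNP values to hg19 and hg38 is my reading of the dbSNP page, and the hg18 value is inferred from this thread rather than taken from dbSNP.

```python
# Positions of rs2465136 on chr1 per build (ASSUMPTIONS, see the note above):
# 990417 and 1055037 are the two dbSNP coordinates quoted in the question,
# 980280 is the .bim position that this thread attributes to hg18.
KNOWN_POSITIONS = {
    "rs2465136": {"hg18": 980280, "hg19": 990417, "hg38": 1055037},
}


def parse_bim_line(line):
    """Split one PLINK .bim line into its six whitespace-separated fields."""
    chrom, snp_id, cm, pos, allele1, allele2 = line.split()
    return {
        "chrom": chrom,
        "id": snp_id,
        "cm": float(cm),
        "pos": int(pos),
        "a1": allele1,
        "a2": allele2,
    }


def guess_build(bim_line, known=KNOWN_POSITIONS):
    """Return the builds whose known position equals the .bim position."""
    record = parse_bim_line(bim_line)
    per_build = known.get(record["id"], {})
    return [build for build, pos in per_build.items() if pos == record["pos"]]


print(guess_build("1 rs2465136 0 980280 G A"))  # ['hg18']
```

A single SNP is weak evidence, so in practice one would run the same comparison over many markers and take the build that matches most of them.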
You're welcome.
hg18 is quite old: it was released in 2006, hg19 followed in 2009, and hg38 has been available since 2013. There is no need to support reference genomes that old.
BTW: Does anybody know why gnomAD was built on hg19 and not hg38?
Well, that's a matter of perspective ;-) If one works with older data, as I do, there is a need for it, as this question demonstrates ;-) It would be nice to get ALL available information for a given SNP.
muraved, unrelated but maybe this will be of interest: A: Alternate nucleotide is more frequent than reference nucleotide. OMG I'm dizzy.
I very rarely come across datasets/programs that require or are based on hg18, but they obviously do exist. I think that I even saw hg16 one time. As long as the build is clearly stated, you have covered yourself. Generally, though, we should be moving to hg38 where possible.