Hi,
I have some genetic data in a bim file.
The chromosomes range from 0 to 23, plus 26, which I have not come across before. Should the SNPs on chromosomes 0 and 26 be removed from the genetic file or left in? Also, some SNPs have a GSA prefix (GSA being the genotyping array) before the rsID, while others appear normal without it; see below:
19 GSA-rs117797881 88.57582 51991438 G A
vs
19 rs7259137 88.47165 51972314 G A
The SNPs on chr 26 look like this
26 MTReverseDLOOP_61 100 16399 G A
I am tempted to remake the file, including only chromosomes 1-23 and removing the GSA prefix, as the position and chromosome align with the correct SNP identifier (Hg38). Would leaving chromosome 26 in have any impact on imputation?
Would really appreciate advice/reassurance.
Thanks
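Yes, you can leave out chr26 and chr0. In Plink's numeric chromosome coding, 0 means the variant is unplaced and 26 is the mitochondrial genome, which matches the MT-prefixed names you see on chr26.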
You can swap the rsID and GSA-rsID with chrom:position using Plink. It can later be re-annotated to rsIDs using Plink. If you use a public resource like the Michigan or TOPMed Imputation Servers, they won't accept your chr26 input or impute it.
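As a rough illustration of the renaming and filtering described above, here is a minimal Python sketch that reads .bim lines and prepares two lists: old-ID/new-ID pairs in chrom:position form, and the IDs of variants outside chromosomes 1-23. The function name `plan_bim_cleanup` is hypothetical, and the sketch assumes a standard six-column .bim. It only plans the changes; it does not touch the .bim directly, since dropping lines from a .bim by hand would put it out of step with the .bed file.

```python
def plan_bim_cleanup(bim_lines, keep_chroms=None):
    """Return (rename_pairs, dropped_ids) for a list of .bim lines.

    rename_pairs: (old_id, "chrom:position") for variants on kept chromosomes.
    dropped_ids:  IDs of variants on any other chromosome (e.g. 0 or 26).
    """
    if keep_chroms is None:
        # Assumption: keep autosomes 1-22 plus 23 (X in Plink's numeric coding).
        keep_chroms = {str(c) for c in range(1, 24)}

    rename_pairs = []
    dropped_ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        chrom, snp_id, _cm, pos, _a1, _a2 = fields
        if chrom not in keep_chroms:
            dropped_ids.append(snp_id)
            continue
        new_id = f"{chrom}:{pos}"
        if new_id != snp_id:
            rename_pairs.append((snp_id, new_id))
    return rename_pairs, dropped_ids


# The three example lines from the question.
example = [
    "19 GSA-rs117797881 88.57582 51991438 G A",
    "19 rs7259137 88.47165 51972314 G A",
    "26 MTReverseDLOOP_61 100 16399 G A",
]
renames, dropped = plan_bim_cleanup(example)
for old_id, new_id in renames:
    print(old_id, new_id)
print("dropped:", dropped)
```

If the pairs are written to a two-column text file and the dropped IDs to a one-column file, they could be passed to Plink 1.9 via `--update-name` and `--exclude` respectively (or the chromosome filter could be done with `--chr 1-23` instead). Note that chrom:position IDs can collide when two variants share a position, so it is worth checking for duplicates before renaming.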
Thanks for the advice. What about the third column (GS in plink)? I am used to it being 0. And one last question: what does it mean when there is a "." in place of an allele? For example, one allele is named but the other allele is a dot.
The third column in a Plink .bim file is the position in morgans or centimorgans (it is safe to use a dummy value of '0'). I would remove that particular SNP from the data completely using the --exclude function in Plink.
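For anyone wanting to script this, the following is a small Python sketch, assuming a standard six-column .bim. It collects the IDs of variants where either allele is "." (suitable for a file given to `--exclude`) and shows how the third column could be reset to 0. The function names and the two example variant IDs are made up for illustration. Editing only the third column in place is safe because the number and order of lines stay the same.

```python
def find_dot_allele_ids(bim_lines):
    """Return IDs of variants whose allele 1 or allele 2 is '.'."""
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        if "." in (fields[4], fields[5]):
            ids.append(fields[1])
    return ids


def zero_genetic_distance(line):
    """Return the .bim line with column 3 (genetic distance) set to 0."""
    fields = line.split()
    fields[2] = "0"
    return "\t".join(fields)


# Hypothetical example lines; only the second has a '.' allele.
example = [
    "19 example_snp_1 88.47165 51972314 G A",
    "19 example_snp_2 88.57582 51991438 G .",
]
to_exclude = find_dot_allele_ids(example)
print(to_exclude)
print(zero_genetic_distance(example[0]))
```

Writing `to_exclude` to a text file, one ID per line, gives a list that can be passed to Plink with `--exclude`.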