8.2 years ago
fatima
Hi everyone, I have a question. I have a set of data including cases in PLINK format. Some SNPs have an "I" or "D" phenotype instead of A, T, C, G. What is the reason for that?
Could they be the reason for the triallelic error when merging two data sets?
In addition, some SNPs do not start with "rs"; for example, they have names such as imm_2_162711471 or 2-162794609-AT-DELETION.
Should I remove them or rename them before merging, to avoid these errors?
Thank you.
Phenotype is the physical manifestation of the organism, not its genetic code. A/T/C/G are genotypes. "I" and "D" probably stand for "insertion" and "deletion", meaning that rather than one letter of DNA being swapped for another (A/T/C/G), a new letter has been added or an existing letter has been deleted.
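To see how many variants are coded this way, a minimal Python sketch along these lines could scan the .bim file. It assumes the usual six whitespace-separated .bim columns (chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2); the function name and the `rs123` example row are made up for illustration.

```python
def find_indel_coded(bim_lines):
    """Return the IDs of variants whose alleles are coded as I or D."""
    flagged = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        variant_id, allele1, allele2 = fields[1], fields[4], fields[5]
        if {allele1, allele2} & {"I", "D"}:
            flagged.append(variant_id)
    return flagged


# Hypothetical .bim rows; the first one is an ordinary SNP.
example = [
    "2 rs123 0 162711471 A G",
    "2 2-162794609-AT-DELETION 0 162794609 D I",
]
print(find_indel_coded(example))
```

With a real file you would pass the open file handle instead of the example list.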
rs-SNPs are just SNPs that have been seen before and documented in dbSNP. So if you're not seeing an rs ID, it's probably because that variant has not been seen before. For example, "2-162794609-AT-DELETION" would suggest to me that on chromosome 2, at base pair 162794609, two letters (A and T) have been deleted. But that's just a guess. I don't know your formatting.
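If that guess about the naming is right, an ID of this shape can be split into its parts with a small Python sketch. The pattern below is an assumption based only on the one example ID (chromosome, position, bases, then a type word); the `INSERTION` alternative is hypothetical, and IDs in other styles, such as imm_2_162711471, simply will not match.

```python
import re

# Assumed layout: <chrom>-<position>-<bases>-<DELETION|INSERTION>
ID_PATTERN = re.compile(r"^(\w+)-(\d+)-([ACGT]+)-(DELETION|INSERTION)$")


def parse_indel_id(variant_id):
    """Split an ID like 2-162794609-AT-DELETION into its parts, or return None."""
    match = ID_PATTERN.match(variant_id)
    if match is None:
        return None
    chrom, pos, bases, kind = match.groups()
    return {"chrom": chrom, "pos": int(pos), "bases": bases, "kind": kind.lower()}


print(parse_indel_id("2-162794609-AT-DELETION"))
print(parse_indel_id("imm_2_162711471"))
```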
"Triallelic error" is something else entirely. I'd need to see the error. I'm not even sure why such a thing would be an "error". It's perfectly legitimate in a single individual with a CNV, let alone in a merged dataset. Good questions though - I don't mean to sound negative.
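Without seeing the actual message, one way to check which variants could be involved is to list the IDs that end up with more than two distinct alleles once both .bim files are combined. This is only a sketch of that idea, not what the merge itself does: it assumes the standard six-column .bim layout, treats "0" as a missing allele, and uses made-up function names and rows. A variant coded I/D in one file and with real bases in the other would show up here, as would a strand flip.

```python
def alleles_by_id(bim_lines):
    """Map each variant ID to its set of alleles, ignoring the missing code '0'."""
    table = {}
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 6:
            table[fields[1]] = {fields[4], fields[5]} - {"0"}
    return table


def more_than_two_alleles(bim_a, bim_b):
    """Return shared variant IDs whose combined allele set has more than two members."""
    a, b = alleles_by_id(bim_a), alleles_by_id(bim_b)
    return sorted(v for v in a.keys() & b.keys() if len(a[v] | b[v]) > 2)


# Hypothetical rows from two data sets.
set_a = ["1 snp1 0 100 A G", "1 snp2 0 200 C T", "2 snp3 0 300 D I"]
set_b = ["1 snp1 0 100 A C", "1 snp2 0 200 T C", "2 snp3 0 300 AT A"]
print(more_than_two_alleles(set_a, set_b))
```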