4.5 years ago
optimistsso4co3
I'm looking at GWAS data, and some genome-wide variants follow a pattern: they sit at the very end of a run of ~10 repeated As (and the effect allele is also A). What is the probability of such a coincidence, and is it likely to be an artifact of the 1000G low-coverage sequencing methodology?
It is most likely an error. It depends on the technology used, but many early sequencers have problems with homopolymer (single-base repeat) runs.
I guess that for many applications, variants in homopolymer runs like this are best removed. They could be real, but with the evidence you have there is no way to tell, and they are very often false positives.
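If you want to flag such variants before deciding what to drop, something along these lines might work. It is only a rough sketch: it assumes you already have the reference sequence around each variant as a plain string and a 0-based variant position, and both the function names and the `min_run=6` cutoff are made up for illustration, not taken from any established pipeline.

```python
def flanking_homopolymer_length(ref_seq: str, pos: int) -> int:
    """Length of the longest single-base run directly left or right of pos (0-based)."""
    seq = ref_seq.upper()
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the reference sequence")

    def run_length(start: int, step: int) -> int:
        # Walk away from the variant for as long as the base stays the same.
        if not 0 <= start < len(seq):
            return 0
        base = seq[start]
        if base not in "ACGT":  # ignore N and other ambiguity codes
            return 0
        count = 0
        i = start
        while 0 <= i < len(seq) and seq[i] == base:
            count += 1
            i += step
        return count

    return max(run_length(pos - 1, -1), run_length(pos + 1, +1))


def in_homopolymer_context(ref_seq: str, pos: int, min_run: int = 6) -> bool:
    # min_run is an arbitrary illustrative cutoff; tune it to your data.
    return flanking_homopolymer_length(ref_seq, pos) >= min_run
```

You would call `in_homopolymer_context(window, offset)` for each variant, where `window` is the reference sequence fetched around it, and then filter or at least annotate the hits rather than trusting them blindly.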