When filtering variants in exome sequencing, some highly polymorphic genes are usually filtered out. I have listed some of them below. Is there a documented (reliable) list of such genes? If you know of additional genes from your own experience, please add them to this list.
Several MUC genes
Several OR genes (olfactory receptor genes)
Several USP genes
CTBP2
MLL3
CDC27
HYDIN
C21orf29
FLG
HRNR
RP1L1
MAGEC
RHD
Opsins (OPN…)
Complement C4 (C4A, C4B)
HLA genes
CYP2D6
KIR (killer receptor) gene family
The KIR (Killer receptor) gene family is known to be extensively copy-number polymorphic. E.g., http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1002436
Very interesting idea. I have personally come across MUC4, KIR3DL1, and HYDIN as candidates in a family exome study recently. They weren't ruled out as potential candidates when I did an effect prediction with SIFT and PolyPhen.
Such a list could be very helpful for further filtering candidate genes in exome studies.
Thanks, I will consider HYDIN. Does anyone else have experience with HYDIN? Actually, the problem with these polymorphic genes goes further: even when SIFT or PolyPhen predicts a variant to be damaging, it is often not a real causative variant, or it cannot be confirmed by Sanger sequencing.
Yes, my post below mentions that HYDIN is a false positive. Hao Hu in our lab wrote a nice Bayesian method to test for false positives in VAAST.
A definitely underappreciated issue in exome sequencing. I find POTEH pops up regularly in addition to these (all of which I recognise!)
This post would be more helpful to those of us with less experience in exome sequencing if you gave a brief background (or a link) explaining why these genes are particularly troublesome.
Thanks for your advice. In exome sequencing you obtain almost all exon sequences (around 180,000 exons in human), compare them with a reference, and end up with around 40,000 or more variants (depending on your kit and method). But which one is causative? You have to filter step by step to arrive at some real candidates. At this step we have problems with polymorphic genes like HLA, so we try to filter them out carefully.
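For readers who want to see what this filtering step could look like in practice, here is a minimal Python sketch. It is not a validated pipeline. The exact gene symbols come from the list in this thread; the regular expression that matches whole gene families (MUC, OR, USP, MAGEC, HLA, KIR) is my own assumption, since the list only says "several" members of each family. The opsins are left out because the list does not spell out their symbols. The variant records (dicts with a "gene" key) and the function names are hypothetical.

```python
import re

# Exact gene symbols taken from the list in this thread.
EXACT_GENES = {
    "CTBP2", "MLL3", "CDC27", "HYDIN", "C21orf29", "FLG", "HRNR",
    "RP1L1", "RHD", "C4A", "C4B", "CYP2D6",
}

# Assumption: match every member of each family, although the thread
# only says "several" MUC/OR/USP genes. Adjust to your own data.
FAMILY_PATTERN = re.compile(
    r"^(MUC\d+\w*"        # mucins, e.g. MUC4, MUC5AC
    r"|OR\d+[A-Z]+\d+\w*" # olfactory receptors, e.g. OR4F5 (not ORC1)
    r"|USP\d+\w*"         # e.g. USP17L2
    r"|MAGEC\d+"          # e.g. MAGEC1
    r"|HLA-\w+"           # e.g. HLA-DRB1
    r"|KIR\d\w+)$"        # e.g. KIR3DL1
)


def is_polymorphic(gene):
    """Return True if the gene symbol is on the exact list or in a listed family."""
    return gene in EXACT_GENES or FAMILY_PATTERN.match(gene) is not None


def split_variants(variants):
    """Split variant records into (kept, set_aside) by gene symbol."""
    kept, set_aside = [], []
    for variant in variants:
        target = set_aside if is_polymorphic(variant["gene"]) else kept
        target.append(variant)
    return kept, set_aside
```

The variants in polymorphic genes are set aside rather than discarded, so they can still be reviewed later if no other candidate survives.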
Thanks Omid, I understand the principle (I work in cancer genetics myself as it happens) but I would be careful about a priori blacklisting some genes because they are more frequently polymorphic. For some kinds of analysis, that is where the signal is (e.g. HLA variations are associated with various diseases).
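One way to act on this caution is to flag variants in polymorphic genes instead of deleting them, and to allow per-study exceptions. The following is only a sketch under that assumption; the function name, the "polymorphic_gene" field, and the record layout (dicts with a "gene" key) are hypothetical and not taken from any tool mentioned in this thread.

```python
def annotate_variants(variants, polymorphic_genes, keep_anyway=()):
    """Return copies of the variant records with a 'polymorphic_gene' flag.

    Nothing is dropped. Genes listed in keep_anyway are never flagged,
    e.g. when the study is specifically about HLA.
    """
    exceptions = set(keep_anyway)
    annotated = []
    for variant in variants:
        gene = variant["gene"]
        flagged = gene in polymorphic_genes and gene not in exceptions
        # Copy the record so the caller's data is left untouched.
        annotated.append({**variant, "polymorphic_gene": flagged})
    return annotated


# Usage example with made-up records.
records = [{"gene": "CDC27"}, {"gene": "FLG"}, {"gene": "TP53"}]
result = annotate_variants(records, {"CDC27", "FLG"}, keep_anyway={"CDC27"})
```

Sorting or filtering on the flag afterwards keeps the decision reversible, which matters when the signal may sit in one of these genes.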
Yes, thanks for your comment. I agree with you. For example, CDC27 is a commonly polymorphic gene, but I recently found an article reporting that CDC27 interacts with my causative gene, so I deleted CDC27 from my list.