Polymorphic Genes And Filtration In Exome Sequencing
12.3 years ago
Omid ▴ 590

When filtering variants in exome sequencing, some highly polymorphic genes are usually filtered out. I have listed some of them below. Is there a documented (reliable) list of these? If, based on your experience, you know of additional genes, please add them to this list.

Several MUC genes

Several OR genes (olfactory receptor genes)

Several USP genes

CTBP2

MLL3

CDC27

HYDIN

C21orf29

FLG

HRNR

RP1L1

MAGEC

RHD

Opsins (OPN…)

Complement C4 (C4A, C4B)

HLA genes

CYP2D6

KIR (killer receptor) gene family

exome filter • 7.2k views

The KIR (Killer receptor) gene family is known to be extensively copy-number polymorphic. E.g., http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1002436


Very interesting idea. I have personally come across MUC4, KIR3DL1, and HYDIN as candidates in a family exome study recently. They weren't ruled out as potential candidates when I did an effect prediction with SIFT and PolyPhen.

Such a list could be very useful for further filtering candidate genes in exome studies.


Thanks, I will consider HYDIN. Does anyone else have experience with HYDIN? Actually, the problem with these polymorphic genes goes further: even when SIFT or PolyPhen predicts a variant as damaging, it is often not the real causative variant, or it cannot be confirmed by Sanger sequencing.


Yes, my post below mentions that HYDIN is a false positive. Hao Hu in our lab wrote a nice Bayesian method to test for false positives in VAAST.


A definitely underappreciated issue in exome sequencing - I find POTEH pops up regularly in addition to these (all of which I recognise!)


This post would be more helpful to those of us with less experience in exome sequencing if you gave a brief background (or a link) explaining why these genes are particularly troublesome.


Thanks for your advice. In exome sequencing you obtain almost all exon sequences (around 180,000 exons in human), compare them with a reference, and end up with around 40,000 or more variants (depending on your kit and method). But which one is causative? You have to filter step by step to arrive at a few real candidates. At this step we run into problems with polymorphic genes like HLA, so we try to filter them out carefully.
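As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes variants have already been annotated with a gene symbol; the record layout, the variant IDs, and the function names are hypothetical, and the gene set is only a subset of the list in this thread (opsins are left out). The family patterns match every member of a family, which is broader than the "several" genes named above.

```python
import re

# Subset of the gene list from this thread, stored in upper case.
EXACT_GENES = {
    "CTBP2", "MLL3", "CDC27", "HYDIN", "C21ORF29", "FLG", "HRNR",
    "RP1L1", "RHD", "C4A", "C4B", "CYP2D6",
}

# Assumed naming patterns for the gene families (whole family, not "several").
FAMILY_PATTERNS = [
    re.compile(r"^MUC\d"),            # mucins, e.g. MUC4
    re.compile(r"^OR\d+[A-Z]+\d+"),   # olfactory receptors, e.g. OR4F5 (not ORC1)
    re.compile(r"^USP\d"),            # USP family
    re.compile(r"^MAGEC\d"),          # MAGEC family
    re.compile(r"^HLA-"),             # HLA genes
    re.compile(r"^KIR\d"),            # KIR family, e.g. KIR3DL1
]


def is_polymorphic(gene: str) -> bool:
    """Return True if the gene symbol is in the exact set or matches a family."""
    symbol = gene.upper()
    return symbol in EXACT_GENES or any(p.match(symbol) for p in FAMILY_PATTERNS)


def filter_variants(variants):
    """Keep only variants whose gene is not on the polymorphic list."""
    return [v for v in variants if not is_polymorphic(v["gene"])]


# Made-up example records; only the gene symbols are real.
variants = [
    {"id": "variant_1", "gene": "MUC4"},
    {"id": "variant_2", "gene": "BRCA1"},
    {"id": "variant_3", "gene": "KIR3DL1"},
    {"id": "variant_4", "gene": "ORC1"},
]

for v in filter_variants(variants):
    print(v["id"], v["gene"])
```

Using patterns rather than a bare prefix such as "OR" avoids accidentally discarding unrelated genes like ORC1.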


Thanks Omid, I understand the principle (I work in cancer genetics myself as it happens) but I would be careful about a priori blacklisting some genes because they are more frequently polymorphic. For some kinds of analysis, that is where the signal is (e.g. HLA variations are associated with various diseases).
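One way to act on that caution, sketched in Python under stated assumptions: instead of deleting variants in frequently polymorphic genes, flag them and push them to the bottom of the candidate list so they can still be reviewed. The record layout, variant IDs, and the `rank_candidates` name are hypothetical, and the gene set is just an example taken from this thread.

```python
# Example gene set taken from this thread; extend or trim per analysis.
POLYMORPHIC_GENES = {"HYDIN", "CDC27", "MUC4", "KIR3DL1"}


def rank_candidates(variants, polymorphic_genes=POLYMORPHIC_GENES):
    """Annotate each variant with a 'polymorphic' flag and sort flagged ones last.

    Nothing is dropped: sorted() is stable, so the original order is kept
    within the unflagged group and within the flagged group.
    """
    annotated = [
        dict(v, polymorphic=v["gene"].upper() in polymorphic_genes)
        for v in variants
    ]
    return sorted(annotated, key=lambda v: v["polymorphic"])


# Made-up example records; only the gene symbols are real.
candidates = [
    {"id": "variant_1", "gene": "HYDIN"},
    {"id": "variant_2", "gene": "TP53"},
    {"id": "variant_3", "gene": "CDC27"},
    {"id": "variant_4", "gene": "BRCA2"},
]

for v in rank_candidates(candidates):
    label = "review with caution" if v["polymorphic"] else "candidate"
    print(v["id"], v["gene"], label)
```

Flagging keeps genes such as HLA available for analyses where the polymorphism itself is the signal.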


Yes, thanks for your comment; I agree with you. For example, CDC27 is a commonly polymorphic gene, but I recently found an article reporting that CDC27 interacts with my causative gene, so I removed CDC27 from my list.

12.3 years ago

Anyone interested in false positives in NGS should read:

Taxonomizing, sizing, and overcoming the incidentalome.

http://www.ncbi.nlm.nih.gov/pubmed/22323072

P.S. The gene HYDIN is my most common false positive across Illumina exome data.
