Question

Negative Disease Genes Used For Bayesian Prediction

0

11.2 years ago

ewre ▴ 260

Hi all, I am using a Bayesian model to construct a disease-related network (gene-gene interactions in disease-related processes). Do you have any ideas on how to define a set of "negative disease genes" that can be used as the negative training data?

disease network training • 2.9k views

ADD COMMENT • link updated 11.2 years ago by mikhail.shugay 3.5k • written 11.2 years ago by ewre ▴ 260

0

The selection of the negative set depends on the kind of information you are using to build your classifier. Are you using mutations, gene expression, or something else?

ADD REPLY • link 11.2 years ago by mikhail.shugay 3.5k

0

Gene expression data will be used.

ADD REPLY • link 11.2 years ago by ewre ▴ 260

score 1 · Answer 1 · 2014-02-24

1

11.2 years ago

mikhail.shugay 3.5k

If gene expression is to be used, then what first comes to mind for "negative disease genes" are the ones that are not differentially expressed in the diseased condition vs. control (e.g. housekeeping genes). Your question, however, is a little confusing. I believe that in an expression-based classifier, you select some set of genes that reflect your condition and then use expression data from diseased samples and controls to train the network. So the negative training set is the expression in control samples, not a set of genes. What exactly are the features and instances used in your classifier?

ADD COMMENT • link 11.2 years ago by mikhail.shugay 3.5k

0

Thanks for the reply, Shugay. Actually, I am not building a classifier. What I am doing is trying to calculate a likelihood ratio under different conditions and use this ratio to construct a network. In this process, the basic unit we are dealing with is a gene-gene pair. The condition can be, for example, that gene a and gene b are co-expressed (with a co-expression value r) in expression datasets. The ratio we want to calculate is Pr(the pair has co-expression value r, given that the pair co-occurs in a disease database record) / Pr(the pair has co-expression value r, given that the pair does not co-occur in such a record). To estimate this ratio, we have to use a set of training data (disease database records, expression data, annotation data, etc.) and a model (possibly a linear model). Any suggestions would be much appreciated.
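For readers following along, the ratio described above can be estimated without a parametric model by binning the co-expression values. The sketch below is only one possible approach, not the method used in this thread: the function name `likelihood_ratios`, the bin edges, the pseudocount, and the toy values are all assumptions made for illustration.

```python
import numpy as np

def likelihood_ratios(pos_r, neg_r, bin_edges, pseudocount=1.0):
    """Estimate LR(bin) = Pr(r in bin | positive pair) / Pr(r in bin | negative pair).

    pos_r: co-expression values of gene pairs found together in a disease record.
    neg_r: co-expression values of gene pairs from the negative set.
    """
    pos_counts, _ = np.histogram(pos_r, bins=bin_edges)
    neg_counts, _ = np.histogram(neg_r, bins=bin_edges)
    n_bins = len(bin_edges) - 1
    # The pseudocount keeps empty bins from giving zero or infinite ratios.
    pos_p = (pos_counts + pseudocount) / (pos_counts.sum() + pseudocount * n_bins)
    neg_p = (neg_counts + pseudocount) / (neg_counts.sum() + pseudocount * n_bins)
    return pos_p / neg_p

# Toy values only (hypothetical correlations, not real data).
pos_r = [0.8, 0.9, 0.7, 0.1]
neg_r = [0.0, 0.1, -0.2, 0.8]
bin_edges = [-1.0, 0.0, 0.5, 1.0]
print(likelihood_ratios(pos_r, neg_r, bin_edges))
```

A ratio above 1 in a bin means that co-expression values in that range are more common among positive pairs than among negative pairs. The estimate is only as good as the negative set, which is why the choice of negatives matters here.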

ADD REPLY • link 11.2 years ago by ewre ▴ 260

0

OK, now I see. I suggest you use some gene-disease annotation based on Disease Ontology (DO, http://disease-ontology.org/), e.g. from here: http://doa.nubic.northwestern.edu/pages/search.php. Then you can form all gene pairs and take the pairs that share a common DO term as the positive set and the pairs that don't share any DO terms as the negative set.
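A minimal sketch of that pairing step, assuming the annotation has already been loaded into a dictionary that maps each gene to its set of DO term IDs. The helper name `label_pairs`, the gene names, and the term IDs are placeholders, not real annotations.

```python
from itertools import combinations

def label_pairs(gene_to_terms):
    """Split all unordered gene pairs into positives (share at least one
    DO term) and negatives (share no DO term)."""
    positives, negatives = [], []
    for a, b in combinations(sorted(gene_to_terms), 2):
        if gene_to_terms[a] & gene_to_terms[b]:
            positives.append((a, b))
        else:
            negatives.append((a, b))
    return positives, negatives

# Placeholder annotation: gene names and term IDs are made up.
annotations = {
    "GENE_A": {"term_1", "term_2"},
    "GENE_B": {"term_2"},
    "GENE_C": {"term_3"},
}
positives, negatives = label_pairs(annotations)
print(positives)
print(negatives)
```

Because DO is a hierarchy, very general terms can make almost every pair look "shared", so it may be worth restricting the comparison to reasonably specific terms. Only genes that have an annotation appear in the dictionary, so unannotated genes never end up in either set.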

ADD REPLY • link 11.2 years ago by mikhail.shugay 3.5k

0

I have tried DO, it's cool. The only problem is whether the information it provides is reliable. The positive training data I have used so far comes from two sources: one is DisGeNET (http://ibi.imim.es/web/DisGeNET/v01;jsessionid=1w86jcj81w6sb1im5qu54f8ct), the other is MalaCards (http://www.malacards.org/). The negative set is randomly sampled from the remaining gene pairs across the whole genome. I think it would be better to filter the randomly selected negative gene pairs with DO, provided that DO is reliable. Besides, I have one more question: are there any papers that estimate the ratio of "number of guilty gene pairs" / "number of non-guilty gene pairs" in the human genome?
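The sampling-plus-filter idea above could look roughly like this. It is a sketch under assumptions: `sample_negative_pairs`, its parameters, and the toy gene names are hypothetical, and the DO filter is optional because its reliability is exactly the open question.

```python
import random

def sample_negative_pairs(genes, positive_pairs, n, gene_to_terms=None,
                          seed=0, max_tries=100000):
    """Randomly draw up to n gene pairs that are not known positives and,
    if a DO annotation is supplied, do not share any DO term."""
    rng = random.Random(seed)
    known = {frozenset(p) for p in positive_pairs}
    gene_to_terms = gene_to_terms or {}
    negatives = set()
    tries = 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        a, b = rng.sample(genes, 2)  # two distinct genes
        pair = frozenset((a, b))
        if pair in known or pair in negatives:
            continue
        # Optional DO filter: skip pairs whose genes share a disease term.
        if gene_to_terms.get(a, set()) & gene_to_terms.get(b, set()):
            continue
        negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)

# Toy input with made-up gene names and terms.
genes = ["G1", "G2", "G3", "G4"]
positives = [("G1", "G2")]
terms = {"G3": {"t"}, "G4": {"t"}}
print(sample_negative_pairs(genes, positives, n=4, gene_to_terms=terms))
```

Rejection sampling avoids enumerating every pair in the genome, and `max_tries` keeps the loop from running forever if fewer than `n` valid pairs exist.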

ADD REPLY • link 11.2 years ago by ewre ▴ 260