Question

Handling a large number of missing values when measuring similarity between genes in R

8.4 years ago

animesh_10kuet ▴ 10

I am using a yeast microarray dataset from SGD. A sample of the dataset is at this link. The dataset contains a large number of missing values, and I need to compute the similarity between genes. If I remove every gene row that has an NA in any sample column, the number of genes drops to about half of the total.

How can I handle this large number of missing values when measuring the similarity between genes in R? What is the standard approach?

R gene yeast Similarity-Matrix Missing-Values • 2.5k views

ADD COMMENT • link updated 5.4 years ago by Biostar 20 • written 8.4 years ago by animesh_10kuet ▴ 10

score 0 · Answer 1 · 2016-07-06

8.4 years ago

Jean-Karim Heriche 27k

What kind of data are you dealing with?
There are two main approaches to missing values. One is imputation, i.e. replacing each missing value with an estimate of what it should be. The other is data integration, i.e. combining your data set with other data; for example, you could compensate for missing links in a protein interaction graph by combining it with a genetic interaction graph.
Which way you go depends on the type of data you have and on the question you're trying to address.
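To make the imputation option concrete, here is a minimal base R sketch. It is only an illustration under stated assumptions: the toy matrix, the 50% missingness cutoff, and the choice of per-gene mean imputation are all made up for the example, not taken from the SGD data. For microarray data, k-nearest-neighbour imputation (e.g. `impute.knn` from the Bioconductor `impute` package) is a commonly used alternative, which is not shown here.

```r
# Toy expression matrix: rows are genes, columns are samples (made-up values).
expr <- matrix(
  c(1.0, 2.0,  NA, 4.0,
    2.0,  NA, 6.0, 8.0,
    0.5, 1.5, 2.5, 3.5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Drop genes with too many missing values before imputing.
# The 0.5 cutoff is an arbitrary assumption for this sketch.
max_na_frac <- 0.5
keep <- rowMeans(is.na(expr)) <= max_na_frac
expr_kept <- expr[keep, , drop = FALSE]

# Replace each NA with the mean of the observed values for that gene.
row_means <- rowMeans(expr_kept, na.rm = TRUE)
na_idx <- which(is.na(expr_kept), arr.ind = TRUE)
expr_imputed <- expr_kept
expr_imputed[na_idx] <- row_means[na_idx[, "row"]]

# Gene-gene similarity on the completed matrix (Pearson correlation between rows).
sim <- cor(t(expr_imputed))
```

Mean imputation shrinks the variance of each gene, so correlations computed afterwards should be read with some caution, especially for genes with many filled-in values.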

ADD COMMENT • link 8.4 years ago by Jean-Karim Heriche 27k

I used a yeast microarray dataset compiled from a variety of expression experiments. It provides expression profiles for yeast carrying out a variety of cellular programs and responding to a variety of applied stimuli. A sample of the dataset is at this link.

ADD REPLY • link 8.4 years ago by animesh_10kuet ▴ 10

If you don't want to use complementary data, then you need to either impute the missing values or ignore them. You may find this review useful. Another approach is to use a downstream analysis method that can handle missing values itself.
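As one possible way of ignoring the missing values rather than imputing them, base R's `cor()` can compute each gene-gene correlation from only the samples in which both genes were measured. The sketch below uses a made-up toy matrix, and the `min_shared` threshold is an assumption for illustration, not a recommended value.

```r
# Toy expression matrix: rows are genes, columns are samples (made-up values).
expr <- matrix(
  c(1.0, 2.0,  NA, 4.0, 5.0,
    2.0,  NA, 6.0, 8.0, 10.0,
    3.5, 2.5, 1.5, 0.5,  NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:5))
)

# Pearson correlation between genes, using for each pair only the samples
# where both genes have a value. cor() works on columns, hence t().
sim <- cor(t(expr), use = "pairwise.complete.obs")

# Count how many samples each pair of genes actually shares.
obs <- (!is.na(expr)) * 1      # 1 where a value was observed, 0 where NA
n_shared <- tcrossprod(obs)    # genes x genes matrix of shared sample counts

# Treat correlations based on too few shared samples as unknown.
# The threshold of 3 is an arbitrary assumption for this sketch.
min_shared <- 3
sim[n_shared < min_shared] <- NA
```

Because each pair of genes may be compared over a different subset of samples, the resulting matrix is not guaranteed to be positive semi-definite, which can matter for some downstream methods.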

ADD REPLY • link 8.4 years ago by Jean-Karim Heriche 27k