Hi, I am working with a data set of gene-expression measurements from cancer patients, and I have been told that the data can be noisy. The expression values range from 0 to 20, and there are close to 2,000 patients. There are close to 50K expression values, each identified by an Illumina ID.
What would be the best way to filter out noise caused by errors in the Illumina sequencing technique? Is there a general technique for getting rid of this kind of noise?
Thanks.
If the data contains values from 0 to 20 and an "Illumina ID", it is not sequencing data. It is most likely microarray data.
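As for a general technique: one common first step for microarray-style data is to drop probes that are either barely expressed or nearly constant across patients, since those mostly carry noise. The sketch below is only an illustration under assumptions that are not stated in the question: that the values are already log-scale intensities, that a mean cutoff of 6.0 is sensible, and that keeping the more variable half of the probes is acceptable. The function name `filter_probes`, the probe names, and the thresholds are all hypothetical.

```python
import statistics


def filter_probes(expr, min_mean=6.0):
    """Keep probes that are expressed and variable across patients.

    expr: dict mapping a probe id to a list of values, one per patient.
    min_mean: assumed cutoff on the mean log-scale intensity (hypothetical).
    Returns the list of probe ids that pass both filters.
    """
    means = {probe: statistics.fmean(vals) for probe, vals in expr.items()}
    variances = {probe: statistics.variance(vals) for probe, vals in expr.items()}

    # Assumed rule: keep only probes at or above the median variance.
    var_cutoff = statistics.median(variances.values())

    return [
        probe
        for probe in expr
        if means[probe] >= min_mean and variances[probe] >= var_cutoff
    ]


# Tiny made-up example (4 hypothetical probes, 4 patients).
demo = {
    "probe_a": [2.0, 2.1, 1.9, 2.0],    # low intensity, nearly flat
    "probe_b": [9.0, 9.0, 9.1, 9.0],    # expressed but nearly flat
    "probe_c": [7.0, 11.0, 8.0, 12.0],  # expressed and variable
    "probe_d": [8.0, 10.0, 9.0, 11.0],  # expressed and variable
}
kept = filter_probes(demo)
```

Here `kept` contains only `probe_c` and `probe_d`. The right cutoffs depend on the platform and on how the data was normalized, so treat the numbers above as placeholders rather than recommendations.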