Getting rid of noise in gene expression
8.3 years ago
ghunt

Hi, I am working with a data set containing gene expression profiles of cancer patients, and I have been told that the data can be noisy. The expression values range from 0 to 20, there are close to 2,000 patients, and each patient has close to 50K expression values, one per Illumina ID.

What would be the best way to filter out noise caused by errors in the Illumina technology? Is there a general technique for removing noise?

Thanks.


If the data contains values from 0 to 20 and an "Illumina ID", it is not sequencing data. It is most likely microarray data.

8.3 years ago

There are many ways to reduce noise in RNA-seq gene expression data. I personally have found the following approach useful when dealing with heterogeneous tissue and >100 samples.

1.) Remove genes with low expression.

2.) Remove samples that lack adequate sequencing depth (my lab usually sequences at least 8 million mapped reads per sample).

3.) Remove outlier samples that lie more than a chosen number of standard deviations from the mean on a PCA/MDS plot.

4.) Use R packages such as PEER and sva/ComBat to remove batch effects from the data.

5.) Profile your data with tools such as WGCNA, and check whether any individual samples are driving nonsensical modules that don't relate to biology.
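The first three steps can be sketched in Python with NumPy. This is a minimal illustration, not the answerer's actual pipeline: it assumes a genes × samples matrix of raw mapped read counts (RNA-seq, which is what this answer addresses, rather than the log-scale microarray values in the question). The function name `filter_expression` and every threshold (`min_expr` in counts per million, `min_frac`, `min_depth`, `max_sd`) are hypothetical choices, and the outlier rule (z-scores on PC1 and PC2) is one assumed way to implement step 3.

```python
import numpy as np

def filter_expression(counts, min_expr=1.0, min_frac=0.5,
                      min_depth=8_000_000, max_sd=3.0):
    """Hypothetical sketch of steps 1-3 on a genes x samples count matrix.

    All thresholds are illustrative assumptions, not recommended values.
    Returns the filtered log2-CPM matrix plus boolean masks of the genes
    and samples (indexed against the input) that were kept.
    """
    counts = np.asarray(counts, dtype=float)

    # Step 2: drop samples whose total mapped reads fall below min_depth.
    depth = counts.sum(axis=0)
    keep_samples = depth >= min_depth

    # Step 1: counts per million on the retained samples; keep genes that
    # reach min_expr CPM in at least min_frac of those samples.
    cpm = counts[:, keep_samples] / depth[keep_samples] * 1e6
    keep_genes = (cpm >= min_expr).mean(axis=1) >= min_frac
    logcpm = np.log2(cpm[keep_genes] + 1)

    # Step 3: PCA via SVD of the gene-centred log-CPM matrix, then flag
    # samples more than max_sd standard deviations from the mean on PC1 or PC2.
    centred = logcpm - logcpm.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = vt[:2].T * s[:2]  # samples x 2 principal-component scores
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    inlier = (np.abs(z) <= max_sd).all(axis=1)

    # Map the inlier mask back onto the original sample indices.
    kept_idx = np.flatnonzero(keep_samples)
    final_samples = np.zeros_like(keep_samples)
    final_samples[kept_idx[inlier]] = True
    return logcpm[:, inlier], keep_genes, final_samples
```

Depth filtering runs before the expression filter so that a shallow sample cannot drag genes below the CPM cutoff, and PCA runs last so that unexpressed genes do not add noise to the principal components.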
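Step 4 is best done with the packages named in the answer. Purely to show what a batch adjustment does, here is a hypothetical location-only correction in Python: it centres each gene within each batch and then restores the overall gene mean. It is a deliberately simplified stand-in, not ComBat or PEER, and it assumes log-scale expression values and a known batch label for every sample (`center_batches` is an illustrative name).

```python
import numpy as np

def center_batches(expr, batches):
    """Hypothetical location-only batch adjustment (NOT ComBat).

    expr: genes x samples matrix of log-scale expression values.
    batches: one batch label per sample (column).
    Subtracts each batch's per-gene mean and adds back the overall gene
    mean; there is no empirical-Bayes shrinkage and no variance adjustment.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    grand_mean = expr.mean(axis=1, keepdims=True)
    adjusted = expr.copy()
    for b in np.unique(batches):
        cols = batches == b
        # Remove this batch's per-gene offset.
        adjusted[:, cols] -= expr[:, cols].mean(axis=1, keepdims=True)
    return adjusted + grand_mean
```

Because this removes every difference between batch means, it also removes real biology whenever batch is confounded with the condition of interest. ComBat accepts a model matrix of biological covariates (its `mod` argument) partly for that reason.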
