RNA-Seq replicates handling for prediction
8.5 years ago
Tobias ▴ 150

I am analyzing RNA-Seq (or other gene expression) data from approximately 1000 different samples, which gives a matrix of dimension 20000 (no. of genes) x 1000 (no. of samples), where entry (i, j) is the expression of gene i in sample j.

My aim is to predict the gene expression in each of these samples. For that purpose I would do a 10-fold CV on the samples, i.e., I split the samples into 10 chunks of 10% each and predict the gene expression values of a given sample with a model fitted on the 9 chunks that do not contain that sample.
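To make the splitting scheme concrete, here is a minimal sketch of a plain sample-wise 10-fold split over 1000 sample indices. The function name `kfold_indices` and the seed are illustrative assumptions, and the model fitting itself is left out.

```python
import random

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists for a plain sample-wise k-fold split."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # shuffle so folds do not follow column order
    folds = [order[i::n_folds] for i in range(n_folds)]  # n_folds near-equal chunks
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test

# 1000 samples (columns of the expression matrix), 10 folds of 100 samples each.
splits = list(kfold_indices(1000, n_folds=10))
for train, test in splits:
    # A model would be fitted on the `train` columns and evaluated on `test`.
    pass
```

Note that this split only looks at sample indices, so it knows nothing about which samples belong to the same cell line.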

Several of the samples are replicates of the same cell line. Hence it might be that I predict the gene expression values of a sample (cell line) with a model fitted on other samples, some of which are replicates of that same cell line.

Is this conceptually correct or not? It may also be worth adding that the correlations between replicate samples (over all genes) are in a similar range (0.4-0.9) as the correlations between most other pairs of samples.
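For comparison, one possible alternative is to assign folds per cell line rather than per sample, so that all replicates of a cell line land in the same fold. The sketch below is only an illustration under assumed data: the layout of 250 cell lines with 4 replicates each, the labels `line_0`, `line_1`, ..., and the helper names `group_kfold` and `leaked_test_samples` are all hypothetical.

```python
import random
from collections import defaultdict

def group_kfold(groups, n_folds=10, seed=0):
    """Return a fold index per sample; samples with the same group label share a fold."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    return [fold_of_group[g] for g in groups]

def leaked_test_samples(groups, folds):
    """Count samples that have at least one same-group sample in a different fold,
    i.e. samples whose replicates would appear in their own training set."""
    folds_per_group = defaultdict(set)
    for g, f in zip(groups, folds):
        folds_per_group[g].add(f)
    return sum(1 for g in groups if len(folds_per_group[g]) > 1)

# Hypothetical layout: 250 cell lines x 4 replicates = 1000 samples.
cell_line = [f"line_{i // 4}" for i in range(1000)]

# Group-aware assignment: replicates stay together.
grouped = group_kfold(cell_line, n_folds=10)

# Sample-wise assignment for contrast: each sample gets a random fold label.
rng = random.Random(0)
plain = [rng.randrange(10) for _ in cell_line]

print("leaked samples, grouped folds:", leaked_test_samples(cell_line, grouped))
print("leaked samples, sample-wise folds:", leaked_test_samples(cell_line, plain))
```

With the grouped assignment no test sample has a replicate in its training folds, whereas the sample-wise assignment mixes replicates across folds. Whether that mixing matters for the prediction task is exactly the question asked here.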

Many thanks for your help in advance!

RNA-Seq R ChIP-Seq sequencing • 1.7k views