Obtaining List of highly variable genes and plotting
1
1
Entering edit mode
2.3 years ago
joe_genome ▴ 50

Hello,

I'm trying to obtain a list of the most highly variable genes of a set of data. I have genes in my columns and the samples per row. The RNA-Seq count data have already been normalized with cpm and applied with a log2 algorithm.

  1. I compute the mean and variance for each gene from an RNA-Seq count matrix.
  2. I calculate the standardized count by calculating the coefficient of variance.
  3. Rank the genes based on this coefficient of variance when plotting.

I'm not sure though if these are adequate steps to take. Please see below my code snippet.

#Apply log2 transformation
logCounts <- log2((NormalizedCounts + 1))

#Compute the mean and the variance for each gene
geneVars <- apply(logCounts, 2, var)
geneMeans <- colMeans(logCounts)

#Calculate the coefficient of variance based on previous values
geneCV2 <- geneVars / geneMeans^2

#Plot Data
smoothScatter(log2(means),log2(cv2))

Alternatively, I thought I could just use directly the index and subset on my original dataset.

sortedVariance <- sort(geneVars, decreasing=TRUE, index.return=TRUE)$ix[1:2500]
HGVs <- logCounts[, sorted]

Any suggestions if I'm doing it alright, as, in the end, I want to take the top 2000 most variable genes. I am also trying to avoid the use of packages like limma and Deseq2.

RNA-Seq • 2.3k views
ADD COMMENT
0
Entering edit mode
2.3 years ago
ATpoint 85k

Not saying that I find this a meaningful analysis, but if you select by variance then data must be on the log scale. If not then variance is simply a function of the magnitude of counts. The best way would be to run the vst function from DESeq2, and then use rowVars() on that. Sort by decreasing rowwise variance, and take the top-n you need.

ADD COMMENT
0
Entering edit mode

I should correct myself, this is on the log2 scale. I have upated my post

ADD REPLY
0
Entering edit mode

Your code is doing column-wise operations but genes are in rows.

ADD REPLY
0
Entering edit mode

The genes are actually in columns :)

ADD REPLY

Login before adding your answer.

Traffic: 1575 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6