Question

Obtaining a list of highly variable genes and plotting

2.7 years ago

joe_genome ▴ 50

Hello,

I'm trying to obtain a list of the most highly variable genes in a dataset. Genes are in the columns and samples are in the rows. The RNA-Seq count data have already been normalized with cpm and log2-transformed.

I compute the mean and variance for each gene from the RNA-Seq count matrix.
I compute the squared coefficient of variation for each gene (variance divided by the squared mean).
I rank the genes by this coefficient of variation when plotting.

I'm not sure whether these steps are adequate. Please see my code snippet below.

#Apply log2 transformation (pseudocount of 1 avoids log2(0))
logCounts <- log2(NormalizedCounts + 1)

#Compute the mean and the variance for each gene
geneVars <- apply(logCounts, 2, var)
geneMeans <- colMeans(logCounts)

#Calculate the squared coefficient of variation from the values above
geneCV2 <- geneVars / geneMeans^2

#Plot squared coefficient of variation against the mean
smoothScatter(log2(geneMeans), log2(geneCV2))

Alternatively, I thought I could sort the genes by variance directly and use the resulting index to subset my original dataset.

#Column indices of the 2500 genes with the highest variance
sortedVariance <- sort(geneVars, decreasing=TRUE, index.return=TRUE)$ix[1:2500]
HGVs <- logCounts[, sortedVariance]

Any suggestions on whether I'm doing this correctly? In the end, I want to take the top 2000 most variable genes. I am also trying to avoid packages like limma and DESeq2.

RNA-Seq • 2.9k views


score 0 · Answer 1 · 2022-08-24

2.7 years ago

ATpoint 87k

I'm not saying that I find this a meaningful analysis, but if you select by variance, the data must be on the log scale. Otherwise, variance is simply a function of the magnitude of the counts. The best way would be to run the vst function from DESeq2 and then use rowVars() on the transformed values. Sort by decreasing row-wise variance and take the top n you need.
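
Since the question asks to avoid extra packages, here is a minimal base R sketch of the "sort by decreasing variance and take the top n" step. It is only an illustration under assumptions: the counts are simulated Poisson data, log2(count + 1) stands in for a proper normalization (it is not vst), and all names (rawCounts, geneMu, nTop, topIdx, topGenes) are made up for the example. Genes are in columns, as in the question.

```r
# Simulated data only: 20 samples (rows) x 500 genes (columns).
# All object names here are hypothetical and not taken from the question.
set.seed(1)
nSamples <- 20
nGenes <- 500
geneMu <- 2^runif(nGenes, min = 0, max = 10)  # per-gene expected count

# matrix() fills column by column, so each gene's mean is repeated nSamples times
rawCounts <- matrix(
  rpois(nSamples * nGenes, lambda = rep(geneMu, each = nSamples)),
  nrow = nSamples, ncol = nGenes,
  dimnames = list(paste0("sample", seq_len(nSamples)),
                  paste0("gene", seq_len(nGenes)))
)

# On the raw scale, per-gene variance largely tracks the per-gene mean
rawVarMeanCor <- cor(colMeans(rawCounts), apply(rawCounts, 2, var),
                     method = "spearman")

# Stand-in for the log-scale normalization (an assumption, not vst)
logCounts <- log2(rawCounts + 1)

# Per-gene variance on the log scale (MARGIN = 2 because genes are in columns)
geneVars <- apply(logCounts, 2, var)

# Indices of the nTop most variable genes, highest variance first
nTop <- 50
topIdx <- order(geneVars, decreasing = TRUE)[seq_len(min(nTop, length(geneVars)))]

# Subset the matrix; drop = FALSE keeps it a matrix even if nTop is 1
topGenes <- logCounts[, topIdx, drop = FALSE]
```

colnames(topGenes) then gives the selected gene names in order of decreasing variance. With real data you would replace the simulated matrix with your own normalized values and set nTop to 2000.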


I should correct myself: the data are on the log2 scale. I have updated my post.

ADD REPLY • link 2.7 years ago by joe_genome ▴ 50

Your code is doing column-wise operations but genes are in rows.

ADD REPLY • link 2.7 years ago by ATpoint 87k

The genes are actually in columns :)

ADD REPLY • link 2.7 years ago by joe_genome ▴ 50
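
On the rows-versus-columns point raised in the replies, a small sketch may help show that only the margin has to match the layout. The toy matrix and its names (samplesByGenes, genesBySamples) are hypothetical. Note that rowVars() works per row, so it expects genes in rows, whereas apply(x, 2, var) works per column.

```r
# Hypothetical toy matrix: 6 samples (rows) x 4 genes (columns)
set.seed(2)
samplesByGenes <- matrix(
  rnorm(24), nrow = 6,
  dimnames = list(paste0("s", 1:6), paste0("g", 1:4))
)

# The same data transposed: 4 genes (rows) x 6 samples (columns)
genesBySamples <- t(samplesByGenes)

# Genes in columns: summarise over MARGIN = 2
varFromColumns <- apply(samplesByGenes, 2, var)

# Genes in rows: summarise over MARGIN = 1
varFromRows <- apply(genesBySamples, 1, var)

# Both give one variance per gene, and the values agree
sameResult <- isTRUE(all.equal(varFromColumns, varFromRows))
```

A quick dim() call on the real matrix, compared against the known number of samples, is usually enough to confirm which layout you have before choosing the margin.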