Question

Finding Variable Genes in Seurat, scRNA-seq

7.4 years ago

asyndeton17 ▴ 40

Hi,

I have a data matrix for scRNA-seq data (Drop-seq). How do I choose the parameters appropriately for the FindVariableGenes function in Seurat? Is there a plot I should be looking at beforehand to determine the correct parameters?

I can provide plots if needed.

Thanks

RNA-Seq scRNA-seq Seurat • 14k views

ADD COMMENT • link updated 6.0 years ago by CuriusScientist ▴ 50 • written 7.4 years ago by asyndeton17 ▴ 40

While you are waiting for someone to answer, have you checked the manuals/vignette for Seurat?

ADD REPLY • link 7.4 years ago by GenoMax 148k

Yes, the help page for the function says to examine "the plot" first, but it doesn't refer to which plot.

ADD REPLY • link 7.4 years ago by asyndeton17 ▴ 40

Did any of you come up with a good answer to this problem?

ADD REPLY • link 6.7 years ago by PaulG • 0

score 0 · Answer 1 · 2017-09-21

7.3 years ago

halo22 ▴ 300

You can start by running Seurat with the parameters from the vignette. The dispersion vs. average-expression plot can then help you decide on values for x.low.cutoff, x.high.cutoff and y.cutoff. Once you have settled on the parameters, run FindVariableGenes again with the new values.

ADD COMMENT • link 7.3 years ago by halo22 ▴ 300

I have some questions about how the dispersion is calculated and cut off. The help page describes dispersion.function as: "Function to compute y-axis value (dispersion). Default is to take the standard deviation of all values". Why would it have some negative values? It also seems that the genes are actually filtered by object@hvg.info$gene.dispersion.scaled. Sometimes the plot shows some white lines; can you explain what that means?

ADD REPLY • link 7.2 years ago by ovela77 ▴ 10
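
On the negative values: as far as the Seurat v2 documentation describes it, FindVariableGenes places genes into bins by average expression and then computes a z-score of the dispersion within each bin, which is what ends up in gene.dispersion.scaled. A z-score is negative for any gene whose dispersion is below its bin's mean. The sketch below illustrates that idea in plain Python; it is not Seurat's code, and the function name, the equal-width binning and the toy numbers are all assumptions made for illustration.

```python
import statistics


def scaled_dispersion(means, dispersions, num_bins=20):
    """Z-score each gene's dispersion against genes in the same mean-expression bin.

    Hypothetical helper for illustration only; not part of Seurat.
    """
    lo, hi = min(means), max(means)
    width = (hi - lo) / num_bins or 1.0  # avoid a zero bin width
    # Assign each gene to an equal-width bin along the mean-expression axis.
    bins = [min(int((m - lo) / width), num_bins - 1) for m in means]
    scaled = [0.0] * len(means)
    for b in set(bins):
        idx = [i for i, gene_bin in enumerate(bins) if gene_bin == b]
        vals = [dispersions[i] for i in idx]
        mu = statistics.mean(vals)
        sd = statistics.stdev(vals) if len(vals) > 1 else 0.0
        for i in idx:
            # Genes below their bin's mean dispersion get a negative score.
            scaled[i] = (dispersions[i] - mu) / sd if sd > 0 else 0.0
    return scaled


# Toy data (made up): three low-expression genes and three high-expression genes.
toy_means = [0.1, 0.2, 0.3, 2.0, 2.1, 2.2]
toy_dispersions = [1.0, 2.0, 3.0, 5.0, 5.0, 8.0]
toy_scaled = scaled_dispersion(toy_means, toy_dispersions, num_bins=2)
print(toy_scaled)
```

Because the scores are centred within each bin, roughly half of the genes in any bin land below zero, so negative values are expected rather than a sign of an error.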

What do you look for in the plot exactly? I found that changing x.low.cutoff between 0.0125 (in the PBMC 3k tutorial) and 0.1, for example, will have a huge effect on the number of variable genes, but you can barely tell the difference in the plot.

ADD REPLY • link 7.1 years ago by igor 13k

Hi igor. I am now facing the same problem. I don't know how to select the correct values for x.low.cutoff, x.high.cutoff and y.cutoff. I also found that a small change in one of these parameters leads to a huge change in the number of variable genes.

I can get the plot as the tutorial shows, but I don't know how to use that plot to help me select these parameters.

Do you have any suggestions now?

ADD REPLY • link 7.0 years ago by lishen0709 • 0

Hi @igor , @lishen0709 – did you find a solution to your question about selecting the correct parameters for x and y cutoffs in Seurat's FindVariableGenes?

Thanks!

ADD REPLY • link 6.3 years ago by gaelgarcia05 ▴ 280

yes, a bad picture ...

ADD REPLY • link 6.2 years ago by linouhao ▴ 10

Can I ask how to get the cutoffs from the FindVariableGenes function?

ADD REPLY • link 6.2 years ago by linouhao ▴ 10

score 0 · Answer 2 · 2019-01-08

"I found that changing x.low.cutoff between 0.0125 (in the PBMC 3k tutorial) and 0.1, for example, will have a huge effect on the number of variable genes"

@igor, this is bound to happen, as the number of genes is higher below 0 (as can be seen from the dense plotting).

I was facing the same problem and, after some googling, I found this:

Cutoff to find out number of variable genes? https://github.com/satijalab/seurat/issues/634

"All methods for HVG selection have some cutoff parameters, and unfortunately, criteria for 'optimality' are difficult to identify.

If you have UMI data, we suggest identifying HVG on the basis of variance-to-mean ratio, as we demonstrate here: https://satijalab.org/seurat/mca.html

Again, with UMI datasets - we typically do not notice large differences in the analysis depending on the exact number of genes selected- ranging from 2k genes to even the full transcriptome."

If we follow the link mentioned in that discussion, we find this:

Data Preprocessing

We perform standard log-normalization.

mca <- NormalizeData(object = mca, normalization.method = "LogNormalize", scale.factor = 10000)

FindVariableGenes calculates the variance and mean for each gene in the dataset (storing this in object@hvg.info), and sorts genes by their variance/mean ratio (VMR). We have observed that for large-cell datasets with unique molecular identifiers, selecting highly variable genes (HVG) simply based on VMR is an efficient and robust strategy. Here, we select the top 1,000 HVG for downstream analysis.

mca <- FindVariableGenes(object = mca, mean.function = ExpMean, dispersion.function = LogVMR, do.plot = FALSE)
hv.genes <- head(rownames(mca@hvg.info), 1000)
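
To make the "rank by VMR, keep the top N" idea concrete, here is a minimal sketch in plain Python. It is not Seurat's implementation: the function names and the toy count matrix are hypothetical, and it works directly on raw counts with the sample variance, whereas Seurat applies its own ExpMean and LogVMR functions to normalized data.

```python
import math
import statistics


def log_vmr(values):
    """Log of the variance-to-mean ratio of one gene across cells (illustrative)."""
    mean = statistics.mean(values)
    var = statistics.variance(values)  # sample variance
    if mean <= 0 or var <= 0:
        return float("-inf")  # no expression or no variation: rank last
    return math.log(var / mean)


def top_variable_genes(counts, n):
    """Return the n gene names with the highest log VMR, highest first."""
    scores = {gene: log_vmr(vals) for gene, vals in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Toy UMI counts (made up): gene -> counts in six cells.
toy_counts = {
    "geneA": [0, 0, 10, 0, 10, 0],  # bursty, high VMR
    "geneB": [3, 3, 4, 3, 3, 4],    # steady, low VMR
    "geneC": [0, 1, 0, 1, 0, 1],
    "geneD": [5, 0, 0, 5, 0, 0],
}
print(top_variable_genes(toy_counts, 2))
```

With this approach the only choice left is N, which fits the remark quoted above that results on UMI data tend not to change much with the exact number of genes selected.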

Also, as I found the plot to be pretty useless, I skipped plotting it by adding do.plot = FALSE:

pbmc <- FindVariableGenes(object = pbmc, mean.function = ExpMean, dispersion.function = LogVMR, x.low.cutoff = 0.0125, x.high.cutoff = 3, y.cutoff = 0.5, do.plot = FALSE)

This selects the variable genes defined by the cutoffs but skips the plot.
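
Since the plot alone makes it hard to judge a cutoff, one option is to count how many genes pass for a few candidate values. The sketch below does this in plain Python on a made-up table of per-gene average expression and scaled dispersion. It assumes a gene is kept when its mean lies strictly between x.low.cutoff and x.high.cutoff and its scaled dispersion exceeds y.cutoff, which is my reading of the parameters rather than a copy of Seurat's code. The function name and all numbers are hypothetical.

```python
def select_genes(hvg_info, x_low, x_high, y_cutoff):
    """Keep genes whose mean is inside (x_low, x_high) and whose scaled dispersion exceeds y_cutoff."""
    return [
        gene
        for gene, (mean, disp_scaled) in hvg_info.items()
        if x_low < mean < x_high and disp_scaled > y_cutoff
    ]


# Toy table (made up): gene -> (average expression, scaled dispersion).
toy_hvg_info = {
    "g1": (0.02, 1.2),
    "g2": (0.05, 0.9),
    "g3": (0.08, 0.7),
    "g4": (0.5, 0.6),
    "g5": (1.5, 2.0),
    "g6": (3.5, 1.0),  # above x_high = 3
    "g7": (0.3, 0.1),  # below y_cutoff = 0.5
}

# Sweep the lower mean cutoff and report how many genes survive each time.
for x_low in (0.0125, 0.05, 0.1):
    kept = select_genes(toy_hvg_info, x_low=x_low, x_high=3, y_cutoff=0.5)
    print(x_low, len(kept), kept)
```

In this toy table most genes sit at very low average expression, so moving x_low from 0.0125 to 0.1 drops several of them at once, which mirrors the sensitivity reported earlier in the thread.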