For CreateSeuratObject, where do the values for min.cells and min.features come from?

In Seurat's pbmc3k tutorial, they call CreateSeuratObject with several parameters, including min.cells = 3 and min.features = 200.

Can anybody explain exactly what these two parameters mean and where the numbers 3 and 200 come from?

Seurat R sc-rna-seq • 12k views

The Seurat manual does a good job explaining the parameters for any function.

min.cells: Include features detected in at least this many cells. Will subset the counts matrix as well. To reintroduce excluded features, create a new object with a lower cutoff.

min.features: Include cells where at least this many features are detected.

The values they picked here are somewhat arbitrary. min.cells limits the number of genes used by removing those expressed in very few cells, since such genes are unlikely to play any part in differentiating groups of cells. In general, most genes removed will be those with zero counts across all cells. min.features removes dead cells and empty droplets, where few genes are detected.
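
To make the two definitions concrete, here is a minimal base R sketch of the filtering they describe, run on a made-up toy matrix. The helper `filter_counts` is hypothetical and is not part of Seurat. Applying the cell filter before the gene filter is an assumption about the order Seurat uses internally, so check the source if the order matters for your data.

```r
# Toy counts matrix: rows are genes (features), columns are cells.
# All values are invented purely for illustration.
counts <- matrix(
  c(5, 0, 2, 1,
    0, 0, 0, 0,
    3, 0, 0, 4,
    1, 0, 6, 2,
    0, 0, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Hypothetical helper mimicking the documented meaning of the two
# parameters; it is NOT a Seurat function.
filter_counts <- function(counts, min.cells = 0, min.features = 0) {
  # min.features: keep cells (columns) with at least this many detected genes.
  n_features <- colSums(counts > 0)
  counts <- counts[, n_features >= min.features, drop = FALSE]
  # min.cells: keep genes (rows) detected in at least this many remaining cells.
  # Assumption: the gene filter runs after the cell filter.
  n_cells <- rowSums(counts > 0)
  counts[n_cells >= min.cells, , drop = FALSE]
}

filtered <- filter_counts(counts, min.cells = 2, min.features = 2)
print(dim(filtered))
print(dimnames(filtered))
```

In this toy example cell2 has no detected genes and is dropped by min.features, while gene2 (all zeros) and gene5 (detected in a single cell) are dropped by min.cells.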

Thanks for your answer! I wonder if I could change the features used in my analysis later in the process. I know I can filter samples with subset(object, subset = nFeature_RNA > x). Is there an equivalent for feature selection?

Jared, thanks for your answer and the link to the Seurat Manual. I was only utilizing their website vignettes, so that will help tremendously.

Vignettes are great, as they explain how the package is meant to be used, but the manual is the go-to for any questions about function internals and how things are actually being done (and sometimes why). You can always get the manual information for any function or class from within R itself by sticking a ? in front of it (e.g. ?CreateSeuratObject will show everything from the manual for that function, including parameters, details, usage examples, etc.).

Hi Jared, I was wondering if you had any additional information on metrics for selecting min.cells and min.features? I was hoping to find something a little more specific so I could feel more confident that the parameters I'm selecting are appropriate, but I'm having a hard time finding specific methods people use to set these cut-offs.

I understand you mentioned it can be somewhat arbitrary, but I was wondering if I should select these values specifically due to my dataset or use the parameters that tend to be reported online in the tutorials/codes available (3 and 200, as the OP stated above).

Thank you very much for any additional help or advice you may have on this topic!

You really have to look at the data and see what makes sense. Just a scatter plot of nCount_RNA and nFeature_RNA colored by percent.mt will usually go a long way in determining which values might be appropriate to toss out most junk.

However, you should also be aware that some cell types tend to have fewer genes represented, and you risk throwing out some of those cells using arbitrary cutoffs. Be sure to look at the metrics in combination and plot them to ensure you aren't throwing out potentially interesting populations wholesale due to a real biological difference.
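
As a rough illustration of looking at the metrics together, the base R sketch below computes the per-cell quantities on a simulated matrix. The column names mirror Seurat's nCount_RNA, nFeature_RNA and percent.mt, but the data, the gene names, and the `^MT-` pattern for mitochondrial genes are assumptions made for the example; with a real Seurat object these values would come from the object's metadata instead.

```r
# Simulated counts: 10 genes x 6 cells, two of them "mitochondrial".
# Gene names and counts are made up for illustration only.
set.seed(1)
genes <- c("MT-ND1", "MT-CO1", paste0("GENE", 1:8))
counts <- matrix(
  rpois(length(genes) * 6, lambda = 2),
  nrow = length(genes),
  dimnames = list(genes, paste0("cell", 1:6))
)

# Per-cell QC metrics, named after the Seurat metadata columns.
qc <- data.frame(
  nCount_RNA   = colSums(counts),      # total counts per cell
  nFeature_RNA = colSums(counts > 0)   # number of detected genes per cell
)

# Assumption: mitochondrial genes are identified by an "MT-" prefix.
mt_genes <- grepl("^MT-", rownames(counts))
qc$percent.mt <- 100 * colSums(counts[mt_genes, , drop = FALSE]) / qc$nCount_RNA

# Inspect the distributions before committing to any cutoff.
print(summary(qc))

# Scatter plot of counts vs. detected genes, colored by percent.mt.
if (interactive()) {
  palette_100 <- grDevices::hcl.colors(100)
  plot(qc$nCount_RNA, qc$nFeature_RNA,
       col = palette_100[pmax(1, ceiling(qc$percent.mt))], pch = 19,
       xlab = "nCount_RNA", ylab = "nFeature_RNA")
}
```

Cutoffs would then be chosen from where the bulk of cells sit in these distributions, rather than copied from a tutorial.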

Hi Jared, thank you for your reply! Your responses are easy to understand and I appreciate the time you took to reply to my question.
