In Seurat's pbmc3k tutorial, CreateSeuratObject is called with several parameters, including min.cells = 3 and min.features = 200.
Can anybody explain exactly what these two parameters mean and where the numbers 3 and 200 come from?
The Seurat manual does a good job of explaining the parameters for any function:
min.cells: Include features detected in at least this many cells. Will subset the counts matrix as well. To reintroduce excluded features, create a new object with a lower cutoff.
min.features: Include cells where at least this many features are detected.
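To make the two definitions concrete, here is a minimal base-R sketch that applies both cutoffs to a small, made-up genes x cells count matrix. filter_counts() is a hypothetical helper, not a Seurat function. Applying the min.features (cell) filter before the min.cells (gene) filter is an assumption about ordering, so check the source of your Seurat version if the exact order matters.

```r
# Hypothetical helper mimicking the two CreateSeuratObject cutoffs on a
# plain genes x cells matrix (Seurat itself works on sparse matrices).
# Assumption: cells are filtered first, then genes.
filter_counts <- function(counts, min.cells = 0, min.features = 0) {
  # min.features: keep cells (columns) with at least this many detected genes
  genes_per_cell <- colSums(counts > 0)
  counts <- counts[, genes_per_cell >= min.features, drop = FALSE]
  # min.cells: keep genes (rows) detected in at least this many cells
  cells_per_gene <- rowSums(counts > 0)
  counts[cells_per_gene >= min.cells, , drop = FALSE]
}

# Toy matrix: 4 genes x 4 cells (gene2 is never detected)
counts <- matrix(
  c(5, 0, 2, 1,
    0, 0, 0, 0,
    3, 1, 0, 2,
    1, 0, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

filtered <- filter_counts(counts, min.cells = 2, min.features = 2)
dim(filtered)  # 3 3: cell2 (1 gene detected) and gene2 (0 cells) are dropped
```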
The values they picked here are somewhat arbitrary. min.cells limits the number of genes used by removing those detected in very few cells, which are unlikely to help distinguish groups of cells. In general, most genes removed will be those with zero counts across all cells. min.features removes dead cells and empty droplets in which few genes are detected.
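One way to see what a given min.cells value would throw away is to tabulate, for every gene, the number of cells it is detected in. The sketch below does this on simulated counts (not pbmc3k data); the beta/Poisson simulation and the variable names are illustrative assumptions, and on real data the split between never-detected and rarely detected genes will differ.

```r
# Sketch: how many genes would a given min.cells cutoff remove, and how many
# of those are never detected at all? Counts are simulated, not real data.
set.seed(1)
n_genes <- 1000
n_cells <- 200

# Give each gene its own mean expression; many genes get very low means
gene_mean <- rbeta(n_genes, shape1 = 0.3, shape2 = 3)
# matrix() fills column by column, so rep(..., times = n_cells) gives
# row i the mean gene_mean[i] in every cell
counts <- matrix(rpois(n_genes * n_cells, lambda = rep(gene_mean, times = n_cells)),
                 nrow = n_genes)

cells_per_gene <- rowSums(counts > 0)  # number of cells each gene is detected in
min_cells <- 3
removed <- cells_per_gene < min_cells

removal_summary <- c(
  removed_total   = sum(removed),
  never_detected  = sum(cells_per_gene == 0),
  detected_rarely = sum(removed & cells_per_gene > 0)
)
print(removal_summary)

# Low end of the detection distribution; the "10" bin means 10 or more cells
print(table(pmin(cells_per_gene, 10)))
```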
Thanks for your answer! I wonder whether I can change the features used in my analysis later on. I know I can subset cells with subset(object, subset = nFeature_RNA > x). Is there an equivalent function for feature selection?
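As far as I know, the subset() method for Seurat objects also takes a features argument (see ?subset.Seurat; behaviour may vary by version), so one option is to compute which genes pass a stricter detection cutoff and pass that vector in. Note that this can only narrow the gene set: per the manual, features dropped by min.cells at creation can only be brought back by building a new object with a lower cutoff. The base-R sketch below shows just the selection step on a plain matrix; genes_detected_in() is a hypothetical helper.

```r
# Hypothetical helper: names of genes detected (count > 0) in at least
# `min_cells` cells of a plain genes x cells matrix.
genes_detected_in <- function(counts, min_cells) {
  rownames(counts)[rowSums(counts > 0) >= min_cells]
}

counts <- matrix(
  c(1, 0, 0,
    2, 3, 0,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("c1", "c2", "c3"))
)

keep <- genes_detected_in(counts, min_cells = 2)
keep  # "geneB" "geneC"  (geneA is detected in only one cell)
counts_sub <- counts[keep, , drop = FALSE]

# With a Seurat object (not run here), a vector like `keep` could be used as
#   obj <- subset(obj, features = keep)
```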
Jared, thanks for your answer and the link to the Seurat Manual. I was only utilizing their website vignettes, so that will help tremendously.
Vignettes are great, as they explain how the package is meant to be used, but the manual is the go-to for any questions about function internals and how things are actually being done (and sometimes why). You can always get the manual information for any function or class from within R itself by putting a ? in front of its name (e.g. ?CreateSeuratObject will show everything from the manual for that function, including parameters, details, usage examples, etc.).
Hi Jared, do you have any additional guidance on metrics for choosing min.cells and min.features? I was hoping to find something more specific so I can be confident that the parameters I select are appropriate, but I'm having a hard time finding specific methods people use to set these cut-offs.
I understand you said the values can be somewhat arbitrary, but should I choose them specifically for my dataset, or use the values that tend to be reported in the tutorials and code available online (3 and 200, as the OP stated above)?
Thank you very much for any additional help or advice you may have on this topic!
You really have to look at the data and see what makes sense. A simple scatter plot of nCount_RNA against nFeature_RNA, colored by percent.mt, will usually go a long way toward determining which cutoffs remove most of the junk.
However, you should also be aware that some cell types tend to have fewer detected genes, and arbitrary cutoffs risk throwing out some of those cells. Be sure to look at the metrics in combination and plot them, so that you don't discard potentially interesting populations wholesale because of a real biological difference.
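For reference, the pbmc3k tutorial computes percent.mt with PercentageFeatureSet(pbmc, pattern = "^MT-") and draws QC scatter plots with FeatureScatter(). The base-R sketch below computes the same three per-cell metrics from a plain count matrix and draws the colored scatter plot to a temporary PDF. The simulated counts, the "^MT-" prefix (human gene symbols) and the grey-to-red color ramp are assumptions for illustration only.

```r
# Base-R sketch of the QC plot: nCount_RNA vs nFeature_RNA, colored by
# percent.mt. Counts are simulated; "^MT-" assumes human gene symbols.
set.seed(42)
genes <- c(paste0("MT-", 1:13), paste0("GENE", 1:487))
n_cells <- 300
counts <- matrix(rpois(length(genes) * n_cells, lambda = 0.5),
                 nrow = length(genes),
                 dimnames = list(genes, paste0("cell", seq_len(n_cells))))

# Per-cell metrics, named after the Seurat metadata columns they mimic
nCount_RNA   <- colSums(counts)      # total counts per cell
nFeature_RNA <- colSums(counts > 0)  # genes detected per cell
is_mt        <- grepl("^MT-", rownames(counts))
# pmax() guards against division by zero for cells with no counts
percent.mt   <- 100 * colSums(counts[is_mt, , drop = FALSE]) / pmax(nCount_RNA, 1)

# Map percent.mt onto a grey-to-red palette (redder = higher percent.mt)
pal  <- colorRampPalette(c("grey80", "red"))(100)
bins <- cut(percent.mt, breaks = 100, labels = FALSE)

pdf(tempfile(fileext = ".pdf"))
plot(nCount_RNA, nFeature_RNA, col = pal[bins], pch = 16,
     xlab = "nCount_RNA", ylab = "nFeature_RNA",
     main = "Cells colored by percent.mt")
invisible(dev.off())
```

On real data, cells with low nFeature_RNA and high percent.mt are the usual junk candidates, while a separate cluster with low nFeature_RNA but normal percent.mt may be a genuine low-complexity cell type worth keeping.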
Hi Jared, thank you for your reply! Your responses are easy to understand and I appreciate the time you took to reply to my question.