Hi all,
I have a scRNA-seq dataset with 6 patients, and each patient has 2 sample types (normal, tumor). So I have 12 folders, and each folder contains its scRNA data: barcodes.tsv.zip, features.tsv.zip, matrix.mtx.tsv.zip. I would like to use this data to practice and learn the Seurat (version 4.0) R package workflow.
I would like to take all 12 samples through clustering and downstream analysis. For the preprocessing, I have no idea at which point I should merge the datasets.
Should I read and QC each folder (e.g. patient1-normal -> patient1-tumor -> patient2-normal ... and so on), and then merge all the data? I am still learning scRNA-seq. I believe each dataset will have a different number of genes and cells, and that I need to normalize after merging all the data.
Thank you for your help and advice.
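The per-sample route asked about here (read and QC each folder, then merge) could be sketched roughly as below. This is only an illustration under stated assumptions: the helper name `load_and_qc`, the folder layout, and the QC cutoffs are all hypothetical, the `^MT-` pattern assumes human gene symbols, and `Read10X()` expects the standard 10x file names (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz), so zipped or renamed files would probably need converting first.

```r
library(Seurat)

# Hypothetical helper: read one 10x folder, compute QC metrics, filter cells.
# The thresholds are placeholders and should be chosen per dataset.
load_and_qc <- function(sample_dir, sample_id,
                        min_features = 200, max_features = 6000, max_mt = 15) {
  counts <- Read10X(data.dir = sample_dir)
  obj <- CreateSeuratObject(counts = counts, project = sample_id, min.cells = 3)

  # Percentage of counts from mitochondrial genes (human symbols assumed).
  obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")

  # Select cells from the metadata table, then subset by cell name.
  md <- obj@meta.data
  keep <- rownames(md)[md$nFeature_RNA > min_features &
                         md$nFeature_RNA < max_features &
                         md$percent.mt < max_mt]
  subset(obj, cells = keep)
}

# Usage, assuming one sub-folder per sample under a hypothetical "data" folder:
# sample_dirs <- list.dirs("data", recursive = FALSE)
# samples <- lapply(sample_dirs, function(d) load_and_qc(d, basename(d)))
# names(samples) <- basename(sample_dirs)
```

Keeping the samples in a named list makes it easy to inspect QC metrics per sample before anything is combined.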
Thanks for your detailed answer, ATpoint! I will check out https://osca.bioconductor.org/ today. Really appreciate it!
I was confused about when to merge/integrate the datasets because I found the Merge function (https://satijalab.org/seurat/v3.1/merge_vignette.html) in Seurat. I think what Merge does is merge multiple 10X datasets into one object so that we can do the QC together. But I am wondering which way is the best practice:
Definitely the 2nd option. Integration is already a sort of analysis step, and you want corrupted or damaged cells, or those with very high or low depth, out of the mix before doing any kind of analysis. Check for example the scater package at Bioc; they have some good examples of how to do QC. It is similar to Seurat, though, and OSCA also covers the QC.
So my workflow will be
(NormalizeData)
(Merge, not integrate) all the data (so now only one combined dataset)
(FindVariableFeatures, ScaleData)
(RunPCA, FindNeighbors, FindClusters)
Is it correct? (I may need to replace Merge with its integrating function if I see a batch effect across conditions/patients?)
Thank you
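The workflow listed above (normalize per sample, merge without integrating, then variable features, scaling, PCA and clustering) might look something like the following sketch. To keep it self-contained it uses random toy counts instead of real 10x data; `make_toy`, the sample names, and every parameter value (`nfeatures`, `npcs`, `dims`, `resolution`) are assumptions for illustration, not recommendations.

```r
library(Seurat)
library(Matrix)
set.seed(1)

# Toy stand-in for QC-filtered samples: random Poisson counts, NOT real data.
make_toy <- function(sample_id, n_genes = 300, n_cells = 120) {
  lambda <- runif(n_genes, 0.2, 5)  # one mean expression level per gene
  m <- matrix(rpois(n_genes * n_cells, lambda = rep(lambda, n_cells)),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("cell", seq_len(n_cells))))
  CreateSeuratObject(counts = Matrix(m, sparse = TRUE), project = sample_id)
}

ids <- c("patient1_normal", "patient1_tumor", "patient2_normal")
samples <- lapply(ids, make_toy)
names(samples) <- ids

# Step 1: normalize each sample (LogNormalize works cell by cell).
samples <- lapply(samples, NormalizeData, verbose = FALSE)

# Step 2: merge (not integrate) into one object; add.cell.ids keeps
# cell names unique, merge.data = TRUE carries over the normalized data.
combined <- merge(samples[[1]], y = unname(samples[-1]),
                  add.cell.ids = ids, project = "combined",
                  merge.data = TRUE)

# Step 3: variable features and scaling on the combined object.
combined <- FindVariableFeatures(combined, nfeatures = 100, verbose = FALSE)
combined <- ScaleData(combined, verbose = FALSE)

# Step 4: PCA, neighbor graph, clustering.
combined <- RunPCA(combined, npcs = 10, verbose = FALSE)
combined <- FindNeighbors(combined, dims = 1:10, verbose = FALSE)
combined <- FindClusters(combined, resolution = 0.5, verbose = FALSE)

# Cross-tabulate sample of origin against cluster to look for batch effects.
print(table(combined$orig.ident, Idents(combined)))
```

If clusters in the final table split mainly by sample or patient rather than by biology, that is the point where the merge step could be swapped for Seurat's integration functions (`FindIntegrationAnchors()`, `IntegrateData()`), as the post suggests.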
Did you ever get a reply to this? I am just starting my Seurat journey and am wondering about the same question! Thank you!
Same. Did you get any solution?
Hi, please help me. I would like best practices for single-cell analysis across multiple samples of the same condition. I have 4 samples from a colon cancer condition. What is the correct procedure in this case? I can only find guidance for individual samples. How do you handle multiple samples of the same condition? Please help me.
ATpoint, side question: from 10x, do you recommend loading the raw or the filtered set?