I used the sceasy R package to convert Burclaff et al.'s (2022) single-cell data (GSE185224) from a scanpy H5AD file to a Seurat R object. My object's UMAP looks similar to the authors', and I subsetted out the "colon" samples. I then normalized and scaled my subsetted data (default parameters), but the normalized data looks very different from the counts. I would like advice on how to proceed. For context, the authors stated that:
"After filtering, read counts were logtransformed and normalized to the median read depth of donor 2, which had the fewest read counts"
I have attached screenshots of the FeaturePlot for one gene, comparing normalized data with counts.
Thank you, Aydin
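For reference, here is a minimal sketch of the kind of normalization quoted above, and of what happens when it is applied a second time to values that are already normalized. It is not the authors' code. It assumes each cell's counts are scaled to a target depth and then passed through log1p, and the counts, the target depth of 10000, and the helper name `normalize_to_depth` are all hypothetical.

```python
import math

def normalize_to_depth(counts, target_depth):
    """Scale one cell's counts to sum to target_depth, then apply log1p."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c * target_depth / total) for c in counts]

# Hypothetical raw counts for one cell across four genes.
raw_cell = [0, 3, 10, 87]
# Hypothetical target depth, standing in for the reference donor's median read depth.
target_depth = 10000

# First pass: what should be done once, on raw counts.
once = normalize_to_depth(raw_cell, target_depth)
# Second pass: what happens if already-normalized values are treated as counts.
twice = normalize_to_depth(once, target_depth)

for gene_index, (a, b) in enumerate(zip(once, twice)):
    print(f"gene {gene_index}: once={a:.3f} twice={b:.3f}")
```

In this toy case the second pass changes every nonzero value and compresses the differences between genes, which is one way a plot of "normalized" data can end up looking very different from a plot of the input.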
I would suggest proceeding with GSE185224_Donor1_filtered_feature_bc_matrix.h5, GSE185224_Donor2_filtered_feature_bc_matrix.h5, and GSE185224_Donor3_filtered_feature_bc_matrix.h5 rather than GSE185224_clustered_annotated_adata_k10_lr0.92_v1.7.h5ad.gz. It is likely that you have double-normalized the data, so you are better off starting from the raw data.
Thanks for the suggestion! I actually did this, but another issue I faced was demultiplexing the samples. Does this look correct to you?