Question

Comparing datasets from two different library methods

5.2 years ago

bazok ▴ 40

Hi All, I need your input on how best to approach an RNA-seq analysis I am currently working on, as I couldn't find any closely related post. I have 5 datasets (4 with UMI counts and 1 with FPKM) to compare. I am taking the z-score of each dataset separately before passing them on to Seurat.

My questions are:
- Is this the right direction, or is there a better way?
- If it is the right approach, is any normalization or log transformation needed before merging, and which normalization approach would be best? More generally, how should the datasets be preprocessed to get valuable insight from the analysis?
- Is it possible to convert UMI counts to FPKM and then follow the Seurat multiple dataset integration guide for the comparison?

Thanks

rna-seq R next-gen • 1.6k views

ADD COMMENT • link updated 5.1 years ago by Biostar 20 • written 5.2 years ago by bazok ▴ 40

I am taking the z-score of all the dataset separately before passing on to Seurat

Why not use the recommended workflow? Seurat is designed to work with UMI and FPKM data, not z-scores.

ADD REPLY • link 5.2 years ago by igor 13k

Thanks Igor. Since the datasets are not all in the same units, I thought taking the z-score first would provide a common basis for comparison (integration).

ADD REPLY • link 5.2 years ago by bazok ▴ 40

In the default workflow, Seurat will perform its own scaling.
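
To illustrate what per-gene scaling means here, the following is a minimal Python sketch of z-scoring one gene across cells. It is not Seurat's code: the function name `scale_gene`, the toy values, the use of the sample standard deviation, and the upper cap of 10 are assumptions made for illustration.

```python
import statistics

def scale_gene(values, clip=10.0):
    """Center one gene to mean 0 and scale to unit SD across cells, capping high values at `clip`."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption of this sketch
    if sd == 0:
        # A gene with no variation carries no information after centering.
        return [0.0 for _ in values]
    return [min((v - mu) / sd, clip) for v in values]

# Hypothetical log-normalized expression of one gene in four cells.
gene_across_cells = [0.0, 1.2, 2.4, 3.6]
scaled = scale_gene(gene_across_cells)
```

Because scaling like this is applied after normalization within the workflow, z-scoring each dataset beforehand would mean the data are standardized twice.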

ADD REPLY • link 5.2 years ago by igor 13k

Thanks a lot, Igor. I looked into how Seurat does this and I think it is what I need. For the analysis (4 datasets in UMI counts and 1 in FPKM), I proceeded as follows:

Read in the data and created the Seurat objects.
Normalized the 4 datasets with UMI counts, using scale.factor = 1000000. (My understanding is that Seurat first normalizes for sequencing depth and then takes the natural log, ln, of the data.)
Took the natural log of the dataset with FPKM, i.e. log(FPKM + 1).
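
The arithmetic behind the two transforms in the steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not Seurat's implementation: the function names `log_normalize` and `log1p_fpkm` and the toy values are hypothetical, and the UMI transform is assumed to be counts divided by the cell total, multiplied by the scale factor, then passed through ln(1 + x).

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Depth-normalize one cell's UMI counts, then take ln(1 + x)."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]

def log1p_fpkm(fpkm):
    """FPKM is already depth- and length-normalized, so only ln(1 + x) is applied."""
    return [math.log1p(v) for v in fpkm]

# Hypothetical toy values for one cell/sample across three genes.
umi_cell = [0, 3, 7]
fpkm_sample = [0.0, 12.5, 300.0]

# scale_factor=1_000_000 mirrors the value used in the steps above.
umi_ln = log_normalize(umi_cell, scale_factor=1_000_000)
fpkm_ln = log1p_fpkm(fpkm_sample)
```

With a scale factor of one million, the UMI values become counts per million before the log, which puts them on a scale closer to log(FPKM + 1), although UMI counts are not corrected for gene length the way FPKM is.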

With the above, I started the normal data integration steps: FindVariableFeatures, FindIntegrationAnchors (I used "LogNormalize" rather than "SCT" as normalization.method), IntegrateData, ScaleData, RunPCA, etc.

Does this seem like the right approach for comparing my datasets, given that they are in different units?

Thanks a lot

ADD REPLY • link 5.2 years ago by bazok ▴ 40

SCTransform is for UMI data.

The rest seems fine.

ADD REPLY • link 5.2 years ago by igor 13k