Question

Integrate single-cell datasets (TPM and raw) to find gene markers between clusters

4.4 years ago

A. Domingues ★ 2.7k

Hi all,

I am trying to find cell markers that distinguish populations A and B using publicly available single-cell RNA-seq data. The snag is that these populations were identified in different studies, and the data are available as raw counts (10x) for one study and as TPMs (Smart-seq*) for the other.

Any suggestions on how to integrate these datasets so I can perform DE downstream?

I was considering using Seurat and SCTransform. Any objections?

*I think. It is not clear from the paper's methods, but they sequenced the libraries with 75 bp paired-end (75PE) reads.

Tags: single-cell, SC, Seurat

> Any objections?

Yes. sctransform fits its model on raw UMI counts, not on TPMs, so you probably cannot do what you plan to do. If the study is published, can't you download the raw data and process it yourself? Yes, that is cumbersome, but trying to force TPMs and raw counts into one analysis is, imho, not only inappropriate but also a waste of time, since the results will not be reliable even if you technically get something out of it. Alternatively, email the authors and ask for a raw count matrix. If you have that, you could integrate the datasets, but integration requires that at least some populations are shared between the studies so that anchors (or whatever your method uses) can be found. Integrating unrelated data (e.g. two completely different populations from different studies) is probably not going to be reliable.
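To make the counts-vs-TPM point concrete, here is a small sketch (assuming the standard TPM definition: reads per kilobase, rescaled so each cell sums to one million). Two hypothetical cells with the same composition but a 10x difference in depth end up with identical TPM profiles, so the per-cell sequencing depth that a count model such as sctransform's regularized negative binomial regression uses as a covariate is simply no longer present in TPM data. The gene lengths and counts below are made up for illustration.

```python
import numpy as np

def counts_to_tpm(counts, gene_lengths_kb):
    """Convert a genes x cells count matrix to TPM.

    counts: array of shape (n_genes, n_cells)
    gene_lengths_kb: array of shape (n_genes,), gene lengths in kilobases
    """
    rpk = counts / gene_lengths_kb[:, None]             # reads per kilobase
    return rpk / rpk.sum(axis=0, keepdims=True) * 1e6   # rescale each cell to 1e6

# Hypothetical example: same composition, 10x different sequencing depth.
lengths_kb = np.array([1.0, 2.0, 0.5])
shallow = np.array([[10], [40], [5]])
deep = shallow * 10
counts = np.hstack([shallow, deep])

tpm = counts_to_tpm(counts, lengths_kb)
print(counts.sum(axis=0))                 # library sizes differ
print(tpm.sum(axis=0))                    # every cell is forced to the same total
print(np.allclose(tpm[:, 0], tpm[:, 1]))  # True: depth information is gone
```

TPMs are also non-integer, so they do not fit the count-based assumptions of these models in the first place.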

Reply • 4.4 years ago by ATpoint 87k

Cheers @ATpoint. This is what I feared. I will have to go back to the drawing board.

Reply • 4.4 years ago by A. Domingues ★ 2.7k

Just to add a note after doing some more research: Seurat does not recommend using SCTransform values for differential expression, so SCTransform is not even necessary for this.

Reply • 4.4 years ago by A. Domingues ★ 2.7k

Yes, that is true. The reason you run SCTransform in the integration context is to select features; it was not clear to me whether you wanted to integrate or not. DE would typically be done on the raw counts, analysed with an appropriate framework such as edgeR, but the problem with the batch effect remains.
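Why the batch effect remains a problem can be sketched with the design matrix a GLM framework such as edgeR would fit. This is a minimal illustration, assuming one row per pseudobulk sample (raw counts summed per sample and population, which is an assumption, not something stated in the thread) and a hypothetical additive design of intercept + population + study. When population A comes only from study 1 and population B only from study 2, the population and study columns are identical, so the population effect cannot be separated from the batch effect. The sample labels are made up.

```python
import numpy as np

def design_matrix(population, study):
    """Intercept + population-B indicator + study-2 indicator, one row per sample."""
    population = np.asarray(population)
    study = np.asarray(study)
    return np.column_stack([
        np.ones(len(population)),
        (population == "B").astype(float),
        (study == 2).astype(float),
    ])

def is_estimable(X):
    """True if every coefficient (including the population effect) is identifiable."""
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Situation in the question (hypothetical samples): A only in study 1, B only in study 2.
confounded = design_matrix(["A", "A", "A", "B", "B", "B"], [1, 1, 1, 2, 2, 2])
# Hypothetical alternative: both populations observed in both studies.
balanced = design_matrix(["A", "B", "A", "B", "A", "B"], [1, 1, 1, 2, 2, 2])

print(is_estimable(confounded))  # False: population and study columns coincide
print(is_estimable(balanced))    # True
```

Dropping the study term makes the confounded design estimable again, but then any A vs B difference silently includes the study/platform difference.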

Reply • 4.4 years ago by ATpoint 87k