Question

Is it possible to calculate TPM using 10X Genomics public data?

0

2.5 years ago

Athena • 0

I'm wondering whether I can calculate TPM from the data linked below: https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.0.1/pbmc4k Or is this data not sufficient for that?

genomics Python R genome • 1.9k views

ADD COMMENT • link updated 2.5 years ago by benformatics 4.0k • written 2.5 years ago by Athena • 0

1

Why do you want TPM? TPM divides counts by transcript length, but with UMI-tagged data it isn't necessarily true that a longer transcript yields more counts. I'd recommend not dividing by transcript length for 10X data.

ADD REPLY • link 2.5 years ago by dsull ★ 6.9k

0

I'm trying to run a correlation test against my bulk data (using RPKM, FPKM, or TPM) and then do some further downstream analysis.

What would be a better method, then, if you do not recommend dividing by transcript length?

ADD REPLY • link 2.5 years ago by Athena • 0

0

Just don't divide by transcript lengths. Take a gene's UMI count and divide it by the total number of UMIs in the cell (this is essentially what TPM is, except we're not dividing by transcript length).
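A minimal sketch of that per-cell scaling, in plain Python. The function name `umi_fractions`, the toy gene counts, and the scaling to one million (borrowed from the "per million" convention of TPM/CPM) are assumptions for illustration, not something taken from the 10X dataset itself.

```python
def umi_fractions(cell_counts, scale=1_000_000):
    """Scale one cell's UMI counts to 'counts per `scale`'.

    No transcript-length correction is applied, as suggested above.
    `cell_counts` maps gene name -> UMI count for a single cell.
    """
    total = sum(cell_counts.values())
    if total == 0:
        # An empty cell has no meaningful fractions; return zeros.
        return {gene: 0.0 for gene in cell_counts}
    return {gene: count / total * scale for gene, count in cell_counts.items()}


# Hypothetical toy cell with made-up UMI counts.
cell = {"CD3E": 6, "MS4A1": 0, "LYZ": 14}
cpm = umi_fractions(cell)
```

With `scale=1` the function returns plain fractions of the cell's total UMIs; the choice of scale only changes the units.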

ADD REPLY • link 2.5 years ago by dsull ★ 6.9k

0

Just use the raw counts. All of the above methods introduce a single linear scaling factor, so the correlation does not change regardless of the method, unless you introduce a per-gene factor such as length, which, as pointed out, makes no sense for 10X data.
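A small sketch of why that holds, assuming a Pearson correlation between one bulk profile and one aggregated (pseudobulk) 10X profile over the same genes. All numbers, the `pearson` helper, and the gene lengths are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical values for four genes (not real data).
bulk = [120.0, 30.0, 560.0, 75.0]   # bulk expression, any unit
pseudobulk = [10, 4, 52, 3]         # summed UMI counts
lengths_kb = [1.2, 0.8, 3.5, 2.0]   # assumed transcript lengths in kb

raw_r = pearson(bulk, pseudobulk)

# One scaling factor for the whole profile: correlation is unchanged.
total = sum(pseudobulk)
scaled_r = pearson(bulk, [c / total * 1e6 for c in pseudobulk])

# A per-gene factor (length) is not a single scaling: correlation changes.
length_r = pearson(bulk, [c / l for c, l in zip(pseudobulk, lengths_kb)])
```

Here `raw_r` and `scaled_r` agree up to floating-point error, while `length_r` differs because each gene is divided by a different number.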

ADD REPLY • link 2.5 years ago by ATpoint 85k

score 1 · Accepted Answer · 2022-05-30

1

2.5 years ago

benformatics 4.0k

I would use a standard single-cell method, such as LogNormalize from the Seurat package, on the counts. Length normalisation only matters for full-length transcriptome sequencing.
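For anyone not working in R, here is a rough plain-Python sketch of what that normalisation does per cell, as I understand it: divide each count by the cell's total, multiply by a scale factor (10,000 is Seurat's documented default), and take the natural log of one plus the result. This is an illustrative reimplementation, not Seurat's code, and the function name `log_normalize` and the toy counts are my own.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Log-normalise one cell's UMI counts: log1p(count / total * scale_factor).

    `cell_counts` is a list of UMI counts for a single cell, one per gene.
    """
    total = sum(cell_counts)
    if total == 0:
        # Nothing to scale in an empty cell.
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


# Hypothetical toy cell with four genes.
cell = [0, 3, 7, 10]
normalized = log_normalize(cell)
```

Zero counts stay at zero, which is the reason for using `log1p` rather than a plain logarithm.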

ADD COMMENT • link 2.5 years ago by benformatics 4.0k