Question

Normalizing TCR data

5.8 years ago

ruchitkpanchal • 0

Hi,

What's the best method to normalize TCR repertoire data for comparison between samples when raw counts and count frequencies are already available? I have already tried counts per million (CPM), but it may exaggerate a clone's count, so the real picture won't be evident. Count proportion (in %) is essentially the same as CPM (per 100 instead of per million). I was wondering whether there is something similar to, if not the same as, TPM (transcripts per million), but for T cell clonotypes. Any help is appreciated.

Thank You

-RP

Analysis • 2.3k views

ADD COMMENT • link updated 22 months ago by Ming Tommy Tang ★ 4.5k • written 5.8 years ago by ruchitkpanchal • 0

What type of data did you obtain? Is this from single cells? Bulk? What do your counts represent - reads? Cells?

ADD REPLY • link 5.8 years ago by Friederike 9.0k

We have bulk sequencing data. The library was prepared with the SMARTer a/b TCR kit and sequenced on a MiSeq. Counts represent the reads of a particular clonotype in the CDR3 region.

ADD REPLY • link 5.8 years ago by ruchitkpanchal • 0

Great! And the goal of your analysis is to see whether a certain clonotype is more abundant in one sample than in the other?

Can you also elaborate on why you think CPMs won't do the job? It may help me understand the issue a bit better. :)

ADD REPLY • link 5.8 years ago by Friederike 9.0k

Yes, we want to know the relative abundance between samples.

CPM rescales the counts linearly. If a clone saturates beyond a certain number of reads, that won't be evident from CPM, so CPM normalization may exaggerate the count. Basically, we don't know how individual clonotype counts grow with sequencing depth. If the relationship is highly non-linear, CPM won't work.
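To make the linear-rescaling point concrete, here is a minimal sketch with made-up clonotype counts (the CDR3 strings and read numbers are invented for illustration). It shows that CPM and percent differ only by a constant factor of 10,000, so neither changes the ratio between two clones within a sample.

```python
def cpm(counts):
    """Counts per million: each count divided by the sample total, times 1e6."""
    total = sum(counts.values())
    return {clone: n * 1_000_000 / total for clone, n in counts.items()}


def percent(counts):
    """Count proportion in percent: each count divided by the total, times 100."""
    total = sum(counts.values())
    return {clone: n * 100 / total for clone, n in counts.items()}


# Hypothetical read counts for three clonotypes in one sample.
sample = {"CASSLGQGYEQYF": 600, "CASSPDRGNTEAFF": 300, "CASRTGESYEQYF": 100}

sample_cpm = cpm(sample)
sample_pct = percent(sample)

for clone in sample:
    # The ratio is the same constant for every clone.
    print(clone, sample_cpm[clone], sample_pct[clone],
          sample_cpm[clone] / sample_pct[clone])
```

Because both are the same linear transform of the raw reads, any depth-dependent effect (for example one expanded clone taking reads from rarer ones) passes through unchanged.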

ADD REPLY • link 5.8 years ago by ruchitkpanchal • 0

What do you mean by "saturation after certain reads"? Do you mean that one drastically expanding clonotype will scavenge reads away from the other, rarer clonotypes? That's definitely a possibility. You could use the number of clonotypes per sample as a denominator.

ADD REPLY • link 5.8 years ago by Friederike 9.0k

You can downsample and then calculate diversity metrics such as entropy or the Gini index.
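A rough sketch of that idea in plain Python, assuming each sample is a dict mapping clonotype to read count. The function names are hypothetical, and "Gini index" is interpreted here as the Gini coefficient of inequality (the Gini-Simpson index, 1 minus the sum of squared frequencies, is a different metric that sometimes goes by a similar name).

```python
import math
import random


def downsample(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a clonotype count table."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    # Expand to one entry per read; fine for MiSeq-scale data, memory-hungry beyond that.
    reads = [clone for clone, n in counts.items() for _ in range(n)]
    out = {}
    for clone in rng.sample(reads, depth):
        out[clone] = out.get(clone, 0) + 1
    return out


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def gini_coefficient(counts):
    """Gini coefficient of the counts: 0 = perfectly even, near 1 = one clone dominates."""
    values = sorted(counts.values())
    n = len(values)
    total = sum(values)
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical samples sequenced to different depths.
sample_a = {"clone1": 500, "clone2": 300, "clone3": 150, "clone4": 50}
sample_b = {"clone1": 90, "clone2": 5, "clone3": 3, "clone5": 2}

# Bring both samples to the depth of the shallower one before comparing.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
for name, sample in (("A", sample_a), ("B", sample_b)):
    sub = downsample(sample, depth, seed=42)
    print(name, shannon_entropy(sub), gini_coefficient(sub))
```

Downsampling is random, so repeating it with several seeds and averaging the metrics gives a more stable comparison than a single draw.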

ADD REPLY • link 22 months ago by Ming Tommy Tang ★ 4.5k

score 0 · Answer 1 · 2023-01-26

One of the most commonly accepted methods is to normalize the data using UMIs. If you use the Takara SMARTer a/b TCR kit for human data with UMIs, you can do that. Otherwise, you can downsample to the same number of randomly selected reads, or to the most abundant clonotypes by weight (number of reads).

Also, it's pretty easy to use MiXCR for Takara kits. There are specific commands available for every Takara kit, e.g.:
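For illustration only, here is a small Python sketch of two of these options: counting distinct UMIs per clonotype instead of reads, and keeping only the most abundant clonotypes by weight. The function names and the input layout (a list of `(clonotype, umi)` pairs, one per read) are assumptions made for this example and are not how MiXCR does it internally. Real pipelines also correct sequencing errors in UMIs, which is not shown.

```python
def umi_counts(read_records):
    """Count distinct UMIs per clonotype from (clonotype, umi) pairs, one pair per read."""
    umis = {}
    for clonotype, umi in read_records:
        umis.setdefault(clonotype, set()).add(umi)
    return {clonotype: len(seen) for clonotype, seen in umis.items()}


def top_clonotypes(counts, n):
    """Keep the n clonotypes with the highest counts (ties broken by name)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])


def to_fractions(counts):
    """Rescale counts so they sum to 1 within the retained set."""
    total = sum(counts.values())
    return {clonotype: n / total for clonotype, n in counts.items()}


# Hypothetical reads: the first two share a UMI, so they count as one molecule.
reads = [
    ("CASSLGQGYEQYF", "AAAC"),
    ("CASSLGQGYEQYF", "AAAC"),
    ("CASSLGQGYEQYF", "GGTA"),
    ("CASSPDRGNTEAFF", "TTTG"),
]
print(umi_counts(reads))

# Hypothetical read counts: keep the two heaviest clonotypes and renormalize.
counts = {"A": 50, "B": 30, "C": 15, "D": 5}
print(to_fractions(top_clonotypes(counts, 2)))
```

UMI counts approximate the number of starting molecules, so they are less sensitive to PCR amplification bias than read counts. Top-N selection is deterministic, unlike random downsampling, but it discards rare clonotypes entirely.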

> mixcr analyze takara-human-tcr-V2-cdr3 \
>       input_R1.fastq.gz \
>       input_R2.fastq.gz \
>       result

Depending on the kit, it handles UMIs and primer trimming.

You can also read about normalization here: https://docs.milaboratories.com/mixcr/reference/mixcr-downsample/