Question

Best way to create a custom reference for CellRanger when using a transgenic mouse strain expressing a human protein.

0

11 months ago

Alx • 0

Hello all,

In the coming months I will perform a single-cell immune profiling (10X) experiment using a transgenic mouse strain (C57BL/6 background). The producer of this strain states that exons 2-7 of the mouse gene, which encode the extracellular domain, were replaced by the human exons 2-7. At the protein level, we have confirmed by flow cytometry that only the human version of the protein is detected and not the murine one.

I have read the CellRanger documentation on making custom references and have a general idea of how to proceed.

Should I:

1) Add the full human nucleotide sequence encoding the whole human protein as an extra entry in the FASTA and GTF files (similar to how they do with GFP in their example)

2) Add only the specified exons (?)

3) Find the murine sequence in the Ensembl reference genome that is used by default, and manually change the exons to the human ones, i.e., not adding any "extra" genes.

Option 1 is the most straightforward, and if I already had the data I would probably try all of the above and see which looks best.

However, I would appreciate your input on which is most appropriate, to save time when I have to do it.

Thank you in advance!

cellranger • 1.4k views

ADD COMMENT • link updated 11 months ago by ATpoint 88k • written 11 months ago by Alx • 0

0

Why do you want to see the transcript expression of a transgene? Wouldn't it make more sense (if at all) to assay this at the protein level using CITE-seq?

ADD REPLY • link 11 months ago by ATpoint 88k

0

This transgene is very important for the treatment we are giving to the mice, as it encodes the entry receptor for our fully human antibody. At the protein level we know that it is upregulated after treatment, etc.

While the goal of this scRNA-seq experiment is not dependent on this transcript's expression, I just want to find a way to not get 0 transcripts identified due to poor mapping if I use the wrong reference.

But now that I think of it, since it still has some murine sequence, would that be enough to get confidently mapped reads to the original reference genome, without making a custom one?

ADD REPLY • link 11 months ago by Alx • 0

score 4 · Accepted Answer · 2024-06-12

11 months ago

dsull ★ 7.5k

I see. I'd still make a custom one. Human and mouse have decent homology, but if you map human reads to a mouse genome, you will generally get poorer alignment.

FYI, option 3 will likely not be feasible if you edit the reference genome FASTA directly and the mouse exons differ in length from the human exons, since all downstream coordinates on that chromosome would then shift and no longer match the GTF.
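To illustrate the coordinate problem, here is a minimal Python sketch. All numbers and feature names are made up for illustration and are not taken from any real gene or annotation.

```python
def shift_downstream(features, edit_end, delta):
    """Shift 1-based (name, start, end) features that lie after an in-place edit.

    features: list of (name, start, end) on one chromosome
    edit_end: last original coordinate of the replaced segment
    delta:    new_length - old_length of the replaced segment
    """
    shifted = []
    for name, start, end in features:
        if start > edit_end:
            # Everything downstream of the edit moves by delta bases.
            shifted.append((name, start + delta, end + delta))
        else:
            shifted.append((name, start, end))
    return shifted


# Hypothetical case: a 150 bp mouse exon at 1000-1149 is swapped for a 180 bp human exon.
old_len, new_len = 150, 180
features = [("upstream_gene", 100, 500), ("downstream_gene", 2000, 2600)]
print(shift_downstream(features, edit_end=1149, delta=new_len - old_len))
# [('upstream_gene', 100, 500), ('downstream_gene', 2030, 2630)]
```

Every annotated feature downstream of the edit would need this kind of correction in the GTF, which is why editing the chromosome in place gets messy quickly.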

Instead, you can remove the entire mouse gene from the original GTF (since you know it doesn't exist in your mice) and then add [mouse_exon_1][intron][human_exon_2][intron][human_exon_3]...etc. as a new sequence record in your FASTA, annotating the exon starts/ends in your GTF accordingly.

This will probably give you the most precise mapping since it's actually what's in your mouse.
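The approach above can be sketched roughly as follows in Python. This is only an illustration under several assumptions: the sequences are toy strings, the names `HYBRID_GENE_X` and `MOUSE_GENE_X` are hypothetical placeholders, segments are given in transcript orientation (so the strand is "+"), and the GTF attributes follow the single-exon-line style of the GFP example mentioned in the question. Check how `gene_id` is written in your own GTF before filtering on it.

```python
def build_contig(segments):
    """Concatenate (label, sequence, is_exon) segments into one contig.

    Returns the contig sequence and 1-based inclusive (label, start, end)
    coordinates for each exon, which is the convention GTF uses.
    """
    parts, exons, pos = [], [], 0
    for label, seq, is_exon in segments:
        if is_exon:
            exons.append((label, pos + 1, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), exons


def exon_gtf_lines(contig, gene_id, gene_name, exons, strand="+"):
    """Build one tab-separated GTF exon line per exon on the new contig."""
    attrs = (f'gene_id "{gene_id}"; transcript_id "{gene_id}_tx"; '
             f'gene_name "{gene_name}"; gene_biotype "protein_coding";')
    return ["\t".join([contig, "custom", "exon", str(start), str(end),
                       ".", strand, ".", attrs])
            for _, start, end in exons]


def drop_gene(gtf_text, gene_id):
    """Remove every record of one gene from GTF text, keeping comment lines.

    Assumes the attribute is written exactly as gene_id "<gene_id>".
    """
    needle = f'gene_id "{gene_id}"'
    return "\n".join(line for line in gtf_text.splitlines()
                     if line.startswith("#") or needle not in line)


# Toy sequences only; real exon/intron sequences would come from the strain's design.
segments = [
    ("mouse_exon_1", "ATGGCC", True),
    ("intron_1", "GTAAGTAG", False),
    ("human_exon_2", "GGATCCGA", True),
    ("intron_2", "GTCCAG", False),
    ("human_exon_3", "TTGACCTAA", True),
]
contig_seq, exons = build_contig(segments)
fasta_record = f">HYBRID_GENE_X\n{contig_seq}\n"
new_gtf = exon_gtf_lines("HYBRID_GENE_X", "HYBRID_GENE_X", "HYBRID_GENE_X", exons)

# Hypothetical two-gene GTF standing in for the original mouse annotation.
mouse_gtf = "\n".join([
    '#!genome-build toy',
    'chr1\tensembl\texon\t10\t20\t.\t+\t.\tgene_id "MOUSE_GENE_X"; transcript_id "T1";',
    'chr1\tensembl\texon\t50\t90\t.\t+\t.\tgene_id "OTHER_GENE"; transcript_id "T2";',
])
filtered_gtf = drop_gene(mouse_gtf, "MOUSE_GENE_X")

print(fasta_record, end="")
print("\n".join(new_gtf))
print(filtered_gtf)
```

The FASTA record would then be appended to the genome FASTA and the new exon lines to the filtered GTF before building the reference with `cellranger mkref`.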

ADD COMMENT • link 11 months ago by dsull ★ 7.5k

2

Agree with this suggestion to remove the mouse gene and replace it with the actual transgene. However, since you cannot know upfront whether (or to what extent) this will work, I would definitely include a CITE-seq antibody against the protein, to be sure that you get abundance information. Depending on your setup you could even add additional antibodies to phenotype your cells based on established FACS marker combinations. We routinely do this for certain immune cell populations that are hard to distinguish at the transcript level but are well separated at the surface protein level.

ADD REPLY • link 11 months ago by ATpoint 88k

0

Thank you for your insight! This is probably what I will try. I don't know why I didn't think of that and instead thought to include the full human sequence...

Anyway, I don't know how to accept this reply as an answer (I can't see an option on mobile), but this should be it.