Hello all,
In the coming months I will perform a single-cell immune profiling (10x) experiment using a transgenic mouse strain (C57BL/6 background). The producer of this strain states that exons 2-7 of the mouse gene, which encode the extracellular domain, were replaced by the corresponding human exons 2-7. At the protein level, we have confirmed by flow cytometry that only the human version of the protein is detected and not the murine one.
I have read all the Cell Ranger documentation about making custom references and have a general idea of how to proceed.
Should I:
1) Add the full human nucleotide sequence encoding the whole human protein as an extra entry in the FASTA and GTF files (similar to what they do with GFP in their example)
2) Add only the specified exons (?)
3) Find the murine sequence in the Ensembl reference genome that is used by default, and manually change the exons to the human ones, i.e. not adding any "extra" genes.
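For reference, here is a rough sketch of what option 1 might look like, following the pattern of the GFP example: one extra FASTA record plus one GTF "exon" line spanning the whole added sequence. The name `hTARGET`, the placeholder sequence, and the helper functions are all hypothetical; the single full-length exon on the "+" strand is an assumption copied from the GFP-style approach, not something specific to this strain.

```python
# Minimal sketch, assuming the transgene is added as its own contig,
# the way a marker gene such as GFP would be.
# "hTARGET" and PLACEHOLDER_SEQ are hypothetical stand-ins, not real data.

def fasta_record(name, sequence, width=60):
    """Return a FASTA record with the sequence wrapped to `width` columns."""
    seq = "".join(sequence.split()).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def gtf_exon_line(name, length, biotype="protein_coding"):
    """Return one tab-separated GTF 'exon' line covering the whole contig."""
    attributes = (
        f'gene_id "{name}"; transcript_id "{name}"; '
        f'gene_name "{name}"; gene_biotype "{biotype}";'
    )
    # GTF columns: seqname, source, feature, start, end,
    # score, strand, frame, attributes
    fields = [name, "unknown", "exon", "1", str(length),
              ".", "+", ".", attributes]
    return "\t".join(fields) + "\n"


PLACEHOLDER_SEQ = "ATG" + "ACGT" * 20  # stand-in for the real human sequence

record = fasta_record("hTARGET", PLACEHOLDER_SEQ)
gtf_line = gtf_exon_line("hTARGET", len(PLACEHOLDER_SEQ))

# In practice these two strings would be appended to copies of the
# reference genome FASTA and the gene annotation GTF, respectively.
print(record, end="")
print(gtf_line, end="")
```

The modified FASTA and GTF copies would then be passed to `cellranger mkref` through its `--fasta` and `--genes` arguments. The contig name must not clash with any sequence name already present in the reference.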
Option 1 is the most straightforward, and if I already had the data I would probably try all of the above and see which looks best.
However, I would appreciate your input on which is most appropriate, so that I can save time when I actually have to do it.
Thank you in advance!
Why do you want to see the transcript expression of a transgene? Wouldn't it make more sense (if at all) to assay this at the protein level using CITE-seq?
This transgene is very important for the treatment we are giving to the mice, as it encodes the entry receptor for our fully human antibody. At the protein level we know that it is upregulated after treatment etc.
While the goal of this scRNA-seq experiment does not depend on this transcript's expression, I just want to avoid ending up with 0 transcripts identified due to poor mapping if I use the wrong reference.
But now that I think of it, since the transcript still contains some murine sequence, would that be enough for reads to map confidently to the original reference genome, without making a custom one?