Separating gene expression data from human tumor and model organism in RNA-seq analysis
2.9 years ago
Aaron ▴ 30

I'm analyzing RNA-seq data from human GBM tumor samples implanted into mice (a PDX model). In Seurat, I've managed to get differential gene expression results, but I'm having trouble finding the expression levels of some canonical GBM genes (e.g. CD155). It occurred to me that the data I have might only include mouse genes. Does Seurat know how to handle genes/expression levels from two different organisms' tissues? Would it have filtered the human genes out? What could be the reason I can't seem to find gene expression data from the GBM tumor?

EDIT:

I just remembered that in the initial FASTQ processing, I had to supply a reference genome for my data, and I supplied a mouse genome but no human genome. Does it seem likely that this is why I can't find gene expression data from the human GBM tumor? If so, I didn't realize you could supply two genomes when building the reference. How would I supply the human genome as well as the mouse genome?

PDX-models Seurat • 1.0k views
2.9 years ago

If this is 10x data, 10x Genomics supplies a "barnyard" genome index on their website that contains the mouse and human genomes concatenated into a single reference. This can be used as input to cellranger. If you are using a custom reference, or are not using cellranger, you can concatenate the two genomes (and their annotations) yourself and build a custom genome index.
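As a rough sketch of the "concatenate the genomes yourself" step: the usual trick is to prefix every sequence name and gene identifier with a species tag before joining the files, so that chromosome names and gene names from the two organisms cannot collide. The function names and the `GRCh38_` / `mm10_` prefixes below are my own choices for illustration, not something any tool requires, and the code assumes a well-formed 9-column GTF.

```python
import re

# Attribute keys whose values get the species prefix (an illustrative choice).
_ATTR_PATTERN = re.compile(r'\b(gene_id|transcript_id|gene_name) "')


def prefix_fasta(lines, prefix):
    """Yield FASTA lines with `prefix` prepended to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def prefix_gtf(lines, prefix):
    """Yield GTF lines with `prefix` added to the seqname and to the
    gene_id, transcript_id and gene_name attribute values."""
    for line in lines:
        # Pass comment/header lines and blank lines through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fields[0] = prefix + fields[0]
        fields[8] = _ATTR_PATTERN.sub(lambda m: m.group(0) + prefix, fields[8])
        yield "\t".join(fields) + "\n"


def concatenate(parts):
    """Chain several already-prefixed line iterables into one list of lines."""
    combined = []
    for part in parts:
        combined.extend(part)
    return combined
```

In practice you would pass open file handles for the human and mouse FASTA/GTF files, write the combined lines to a new FASTA and a new GTF, and then build the index from those two files with whatever aligner you are using.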

When processing the data downstream in Seurat/Bioconductor/Scanpy I'll classify cells to a particular organism based on the percentage of reads mapped to each organism. I'll then split the objects by species and remove the gene counts for the opposite species.
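The classify-then-split step described above could look roughly like this. It is a plain-Python sketch on a toy dictionary of counts rather than a real Seurat or Scanpy object; the species prefixes, the 0.9 cutoff and the function names are assumptions made for the example, and the same logic would normally be applied to the count matrix of whichever toolkit you use.

```python
def classify_cells(counts, human_prefix="GRCh38_", mouse_prefix="mm10_",
                   min_fraction=0.9):
    """Label each cell by the fraction of its counts assigned to each species.

    counts: dict mapping cell barcode -> dict of gene_id -> count.
    min_fraction: assumed cutoff; cells below it for both species are 'mixed'.
    """
    labels = {}
    for cell, gene_counts in counts.items():
        human = sum(c for g, c in gene_counts.items() if g.startswith(human_prefix))
        mouse = sum(c for g, c in gene_counts.items() if g.startswith(mouse_prefix))
        total = human + mouse
        if total == 0:
            labels[cell] = "unassigned"
        elif human / total >= min_fraction:
            labels[cell] = "human"
        elif mouse / total >= min_fraction:
            labels[cell] = "mouse"
        else:
            labels[cell] = "mixed"  # possible doublet or ambient contamination
    return labels


def split_by_species(counts, labels, species, prefix):
    """Keep only cells labelled `species`, and only that species' genes."""
    return {
        cell: {g: c for g, c in gene_counts.items() if g.startswith(prefix)}
        for cell, gene_counts in counts.items()
        if labels[cell] == species
    }
```

For a PDX sample, the "human" subset would then hold the tumor cells and the "mouse" subset the host cells; how strict the cutoff should be is a judgment call that depends on the data.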

If you are using Seurat, make sure to use the gene_id column for the feature names to avoid gene name conflicts between the two species. You can convert them to gene_name once you've split the objects.
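For the conversion back to gene_name after splitting, one thing to watch is that several gene IDs can share the same name within a species. A minimal sketch of one way to handle that is below; the function name and the choice to append the ID to duplicated names are assumptions for illustration, and the ID-to-name mapping is assumed to come from the annotation you built the reference with.

```python
from collections import Counter


def ids_to_unique_names(gene_ids, id_to_name):
    """Map gene IDs to gene names, keeping the result unique.

    IDs missing from `id_to_name` are kept as-is. Names shared by more than
    one ID get the ID appended so that no two features end up with the same label.
    """
    names = [id_to_name.get(gene_id, gene_id) for gene_id in gene_ids]
    duplicated = {name for name, n in Counter(names).items() if n > 1}
    return [
        f"{name}_{gene_id}" if name in duplicated else name
        for gene_id, name in zip(gene_ids, names)
    ]
```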


Thank you rpolicastro, this is incredibly helpful! I will look at the barnyard genome index on the 10x website and also use the gene_id and gene_name columns as you've suggested.
