Question

Comparing gene expression profiles between species

10.3 years ago

Mikael Huss 4.8k

I am looking for input on ways to compare gene expression profiles between species. The scenario I am thinking of is:

I have a set of phenotypes in species A, and associated gene expression profiles.
Now I obtain a gene expression profile for species B.
Which phenotype in A looks most like the gene expression profile measured for species B?

For example, let's say we knock out a mouse gene and get a gene expression profile which is perturbed relative to the wild type. Which human disease looks most like this knock-out phenotype, based on gene expression?

I suppose this could be done on the gene level (by mapping genes to orthologs), on the pathway level (by assessing which pathways have been perturbed in each case), or on other levels of abstraction. Any success stories out there?
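A minimal sketch of the gene-level route via orthologs might look like the code below. It assumes that a one-to-one ortholog table is available and that per-gene log fold changes (knockout vs. wild type, disease vs. healthy) have already been computed. The function names, gene names, phenotype names and numbers are all made up for illustration. Each phenotype in species A is scored by its Spearman correlation with the translated species-B profile over the genes the two have in common.

```python
import numpy as np

def rank(values):
    """0-based ranks of a 1-D array; ties are broken by position (no tie averaging)."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (no tie correction)."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def best_matching_phenotype(profile_b, phenotypes_a, orthologs):
    """
    profile_b:    dict gene_in_B -> log fold change (e.g. mouse knockout vs wild type)
    phenotypes_a: dict phenotype name -> dict gene_in_A -> log fold change
    orthologs:    dict gene_in_B -> gene_in_A (hypothetical one-to-one ortholog table)
    Returns (phenotype, correlation) pairs, best match first.
    """
    # Translate species-B gene IDs into species-A IDs; genes without an ortholog are dropped.
    translated = {orthologs[g]: v for g, v in profile_b.items() if g in orthologs}
    scores = []
    for name, profile_a in phenotypes_a.items():
        shared = sorted(set(translated) & set(profile_a))
        if len(shared) < 3:
            continue  # too few shared genes for a meaningful correlation
        x = np.array([translated[g] for g in shared])
        y = np.array([profile_a[g] for g in shared])
        scores.append((name, spearman(x, y)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Made-up example: one mouse knockout profile compared against two human disease profiles.
mouse_ko = {"Gene1": 2.0, "Gene2": -1.0, "Gene3": 0.5, "Gene4": 1.5}
orthologs = {"Gene1": "GENE1", "Gene2": "GENE2", "Gene3": "GENE3", "Gene4": "GENE4"}
diseases = {
    "disease_X": {"GENE1": 1.8, "GENE2": -0.9, "GENE3": 0.4, "GENE4": 1.1},
    "disease_Y": {"GENE1": -1.0, "GENE2": 1.2, "GENE3": 0.1, "GENE4": -0.5},
}
print(best_matching_phenotype(mouse_ko, diseases, orthologs))  # disease_X ranks first
```

A pathway-level comparison would follow the same pattern, but would compare per-pathway scores instead of per-gene values, which is less sensitive to imperfect ortholog mapping.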

phenotype species gene-expression • 5.7k views

Updated 3.1 years ago by Ram 45k • written 10.3 years ago by Mikael Huss 4.8k

Maybe I can suggest something. Tell me what you have: microarray or RNA-seq?

Updated 3.1 years ago by Ram 45k • written 10.3 years ago by Manvendra Singh ★ 2.2k

RNA-seq in this case, but I don't exclude suggestions or previous work based on microarrays!

Written 10.3 years ago by Mikael Huss 4.8k

I have written my answer, but I think your question is a hot topic and should be discussed thoroughly on this forum. I am looking forward to more suggestions/improvements/comments/criticisms.

Updated 3.1 years ago by Ram 45k • written 10.3 years ago by Manvendra Singh ★ 2.2k

Ram · Answer 1 · 2015-01-22

10.3 years ago

Manvendra Singh ★ 2.2k

This is what I would do for RNA-seq:

1) Map the reads to both genomes (human and mouse).
2) Keep for further analysis only those reads that map to both genomes. This removes the bias from insertions and deletions of sequence between the genomes and, moreover, restricts your reads to orthologous regions.
3) Count the reads over gene features and remove genes that have low counts in all samples (you will lose a lot of them).
4) Assign the mean of the counts over the different transcripts to their respective gene, and transform it to log scale.
5) You now have genes as row names and samples as column names; merge the data from both species into one data frame.
6) Normalize them by quantiles or surrogate variances.
7) Calculate the relative expression of each gene across the samples (express the value of each gene in each sample relative to that gene's row mean).
8) Calculate Spearman's correlation between the samples and see which of them form clusters.
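The steps from counting onward could be sketched in Python/NumPy roughly as follows. This is a sketch under assumptions, not the answerer's code: reads are assumed to be already mapped and counted, counts already summarized per gene, and both species' genes already sharing IDs (e.g. via an ortholog table). The `min_count=10` threshold is an arbitrary example, only the quantile option of the normalization step is shown, and "relative expression" is taken to mean subtracting each gene's row mean on the log scale. Clustering is left to inspecting the resulting sample-by-sample Spearman matrix.

```python
import numpy as np

def filter_low_counts(genes, counts, min_count=10):
    """Drop genes whose count is below min_count in every sample."""
    keep = (counts >= min_count).any(axis=1)
    return [g for g, k in zip(genes, keep) if k], counts[keep]

def merge_species(genes_a, log_a, genes_b, log_b):
    """Keep genes present in both species and place their samples side by side."""
    shared = sorted(set(genes_a) & set(genes_b))
    idx_a = {g: i for i, g in enumerate(genes_a)}
    idx_b = {g: i for i, g in enumerate(genes_b)}
    rows_a = log_a[[idx_a[g] for g in shared]]
    rows_b = log_b[[idx_b[g] for g in shared]]
    return shared, np.hstack([rows_a, rows_b])

def quantile_normalize(x):
    """Give every column (sample) the same distribution of values."""
    order = np.argsort(x, axis=0)
    mean_of_sorted = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = mean_of_sorted
    return out

def spearman_matrix(x):
    """Sample-by-sample Spearman correlation: Pearson on column ranks (no tie correction)."""
    ranks = np.argsort(np.argsort(x, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def compare_species(genes_a, counts_a, genes_b, counts_b, min_count=10):
    # Filter lowly expressed genes separately in each species.
    genes_a, counts_a = filter_low_counts(genes_a, counts_a, min_count)
    genes_b, counts_b = filter_low_counts(genes_b, counts_b, min_count)
    # Log scale (counts are assumed to be per gene already).
    log_a, log_b = np.log2(counts_a + 1), np.log2(counts_b + 1)
    shared, merged = merge_species(genes_a, log_a, genes_b, log_b)
    normalized = quantile_normalize(merged)
    # Relative expression: each gene's deviation from its own row mean.
    relative = normalized - normalized.mean(axis=1, keepdims=True)
    return shared, spearman_matrix(relative)

# Made-up counts: rows are genes (shared ortholog IDs), columns are samples.
genes_a = ["G1", "G2", "G3", "G4", "G5"]
counts_a = np.array([[100, 120], [50, 40], [5, 3], [300, 280], [20, 25]])
genes_b = ["G1", "G2", "G4", "G5", "G6"]
counts_b = np.array([[90, 110], [60, 45], [250, 310], [15, 30], [500, 400]])
shared, corr = compare_species(genes_a, counts_a, genes_b, counts_b)
print(shared)         # ['G1', 'G2', 'G4', 'G5']: G3 is filtered out, G6 has no match in A
print(corr.round(2))  # 4 x 4 matrix: 2 samples from each species
```

Filtering each species before merging means a gene must pass the count threshold in both species to survive, which matches the answer's remark that many genes are lost at this stage.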

For microarray data I have already done this, and it was published in this paper; just read the methods section where the cross-species and cross-platform expression analysis is described. It is also in this publication.

HTH

Updated 3.1 years ago by Ram 45k • written 10.3 years ago by Manvendra Singh ★ 2.2k

Thanks, that is useful. I will need to try this and think about it and perhaps continue the discussion here.

Updated 3.1 years ago by Ram 45k • written 10.3 years ago by Mikael Huss 4.8k

Yes, I am very happy to be involved in discussion :)

Updated 3.1 years ago by Ram 45k • written 10.3 years ago by Manvendra Singh ★ 2.2k

What if you have differential expression profiles (treated vs. untreated) for each of, say, 9 species? Could this method be extended to that case?

Updated 3.1 years ago by Ram 45k • written 9.6 years ago by rleach ▴ 180

@Manvendra, what do you mean in step 4 by assigning the mean of counts over different transcripts to their respective gene and transforming it to log scale?

Updated 5.4 years ago by Ram 45k • written 9.5 years ago by ifudontmind_plzz ▴ 200

It's just the mean over all transcripts, used to assign one value to each gene.

PS: This is quite an old post; nowadays this job is done by featureCounts, which counts reads on gene_id rather than on transcript_id as given in GTF files.
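As a concrete reading of the older approach, collapsing transcript-level counts to one value per gene by their mean and then log-transforming could look like the sketch below. The transcript and gene IDs are made up, and `log2(x + 1)` is an assumed choice of log transform. featureCounts works differently: it counts reads directly against gene_id features, which this sketch does not reproduce.

```python
import math
from collections import defaultdict

def collapse_to_genes(transcript_counts, transcript_to_gene):
    """Mean count over each gene's transcripts, then log2(mean + 1)."""
    grouped = defaultdict(list)
    for tx, count in transcript_counts.items():
        grouped[transcript_to_gene[tx]].append(count)
    return {gene: math.log2(sum(c) / len(c) + 1) for gene, c in grouped.items()}

# Hypothetical IDs: geneA has two transcripts, geneB has one.
tx_counts = {"tx1": 10, "tx2": 20, "tx3": 7}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
print(collapse_to_genes(tx_counts, tx2gene))  # {'geneA': 4.0, 'geneB': 3.0}
```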

Updated 3.1 years ago by Ram 45k • written 8.7 years ago by Manvendra Singh ★ 2.2k