Differential expression with two reference genomes and one condition
Asked 9.5 years ago by MiguelMorard

Hello,

I have two new Saccharomyces cerevisiae genome assemblies and I would like to re-analyse some RNA-seq data using these new assemblies. I ran TopHat/Bowtie/Cufflinks and have the FPKM for each gene in each strain. My problem is that, because I used two different reference genomes, I don't think I can run Cuffdiff to analyse differential expression. Googling a bit, I found people working with two species, but all of them had at least two conditions for each species... and I only have one.

My question is: can I compare the FPKMs directly? Is there an R package I could use to work with FPKMs directly, or with some normalization of them (log2, for example)?
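For context on why two sets of FPKMs produced against different references are hard to compare, here is a minimal Python sketch of how an FPKM value is built from raw read counts. It is a simplification, assuming the library total is just the sum of the counts passed in and that lengths are plain gene lengths in bp (Cufflinks itself works with fragments and effective transcript lengths). The function names and toy data are hypothetical.

```python
import math


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of feature per million mapped fragments.

    Simplified: the library total is taken as the sum of the counts given here.
    """
    total = sum(counts.values())
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total)
        for gene in counts
    }


def log2_fpkm(values, pseudocount=1.0):
    """log2(FPKM + pseudocount); the pseudocount avoids log2(0)."""
    return {gene: math.log2(v + pseudocount) for gene, v in values.items()}


# Hypothetical toy data: same reads per kb, so both genes get the same FPKM.
example = fpkm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 3000})
print(example)  # {'geneA': 250000.0, 'geneB': 250000.0}
print(log2_fpkm(example))
```

Both the gene length and the library total come from whichever reference and annotation the reads were mapped to, so changing a gene's annotated length changes its FPKM even when the reads are identical; a log2 transform on top does not remove that dependence.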

Thanks in advance for your suggestions.

Tags: RNA-seq, R

So, just to clarify: you are trying to do differential expression between two different species?


Yes. (I already know which genes are orthologs.)

9.5 years ago

tl;dr: No, you can't directly compare FPKMs.

Doing differential expression between species is riddled with difficulties. I would strongly dissuade anyone from attempting such a comparison unless they are very familiar with analysing RNA-seq data and have thought long and hard about all the biases and batch effects that need to be compensated for. For comparison, failing to do this is what invalidated part of the mouse ENCODE paper (specifically, its false claim that samples cluster by species rather than by tissue).

Below is an incomplete list of things that need to be dealt with in such an analysis:

  1. Only orthologs can be compared.
  2. GC differences between orthologs make simple direct comparisons improper.
  3. Differences in transcript/gene/UTR length need to be accounted for.
  4. Are extraction efficiencies and biases the same between the species?
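Points 1 and 3 can be made concrete with a small Python sketch that builds a joint table containing only orthologs present in both annotations, and sets aside pairs whose annotated lengths differ by more than a chosen factor. The function name, the gene identifiers and the 1.1 length-ratio cutoff are all hypothetical; GC content (point 2) and extraction biases (point 4) are not handled here.

```python
def ortholog_table(orthologs, counts_a, counts_b, len_a, len_b, max_len_ratio=1.1):
    """Pair raw counts for 1:1 orthologs mapped against two different references.

    orthologs: iterable of (gene_in_ref_a, gene_in_ref_b) tuples.
    Returns (kept, flagged): kept maps "geneA|geneB" to (count_a, count_b, length_ratio);
    flagged lists pairs whose length ratio exceeds max_len_ratio in either direction.
    """
    kept, flagged = {}, []
    for gene_a, gene_b in orthologs:
        # Point 1: only orthologs observed in both references are comparable.
        if gene_a not in counts_a or gene_b not in counts_b:
            continue
        pair = f"{gene_a}|{gene_b}"
        ratio = len_a[gene_a] / len_b[gene_b]
        # Point 3: set aside pairs whose lengths differ too much for a naive comparison.
        if max(ratio, 1 / ratio) > max_len_ratio:
            flagged.append(pair)
            continue
        kept[pair] = (counts_a[gene_a], counts_b[gene_b], ratio)
    return kept, flagged


kept, flagged = ortholog_table(
    orthologs=[("a1", "b1"), ("a2", "b2"), ("a3", "b3")],
    counts_a={"a1": 10, "a2": 20, "a3": 5},
    counts_b={"b1": 12, "b2": 30},
    len_a={"a1": 1000, "a2": 2000, "a3": 500},
    len_b={"b1": 1000, "b2": 1500},
)
print(kept)     # {'a1|b1': (10, 12, 1.0)}
print(flagged)  # ['a2|b2']
```

Flagging pairs rather than silently dropping them keeps the exclusions visible, which helps when the filtering later has to be justified.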

I'm sure I could think of other issues if I thought about this a bit longer. Ensuring that the results aren't biased by anything above (or by the many things I likely didn't list) is going to be very difficult.

Having written all of that, since you're only dealing with yeast, you have a good shot at actually compensating for everything properly. Have a look at sequence similarity and related metrics. I suspect that if the various metrics are really close, you might be OK (though you'd need to demonstrate that in any publication).
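One way to follow the suggestion to look at sequence similarity is to compute, for each ortholog pair, the GC difference, the length ratio and the percent identity. The Python sketch below assumes the ortholog sequences have already been extracted and aligned elsewhere (gaps written as "-"); it does not perform the alignment itself. Function names and the toy alignments are hypothetical.

```python
from statistics import median


def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def aligned_identity(aln_a, aln_b):
    """Percent identity over columns where neither pre-aligned sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper()) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def pair_metrics(aln_a, aln_b):
    """GC difference, length ratio and identity for one aligned ortholog pair."""
    seq_a, seq_b = aln_a.replace("-", ""), aln_b.replace("-", "")
    return {
        "gc_diff": abs(gc_fraction(seq_a) - gc_fraction(seq_b)),
        "len_ratio": len(seq_a) / len(seq_b),
        "identity": aligned_identity(aln_a, aln_b),
    }


# Hypothetical toy alignments for two ortholog pairs.
pairs = {"gene1": ("ACGT-A", "ACGTTT"), "gene2": ("GGCCAA", "GGCCAT")}
metrics = {name: pair_metrics(a, b) for name, (a, b) in pairs.items()}
print(median(m["identity"] for m in metrics.values()))
```

Summarizing the distribution of each metric (median and tails) across all orthologs is the kind of evidence you would need to show that the strains are close enough.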


Thanks Devon for your answer. Yes, I know those kinds of issues could be a problem, but my species are really close, and I will begin by working on strains of the same species, so differences in length, GC, etc. shouldn't be large. I'll look into all of that, though. If they turn out to be close enough, which programs or R/Bioconductor packages would you recommend for comparing the data?


DESeq2, edgeR, or limma/voom. Don't use FPKMs for statistics.
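As background for the advice not to use FPKMs: DESeq and DESeq2 normalize raw counts with one size factor per sample, estimated by the median-of-ratios method. The Python function below is a minimal re-implementation of that idea for illustration only, assuming raw counts for the same genes in the same order in every sample; in practice DESeq2's estimateSizeFactors does this for you. A single factor per sample corrects for sequencing depth only, not for the per-gene length or GC differences between the two references discussed above.

```python
import math
from statistics import median


def size_factors(samples):
    """Median-of-ratios size factors.

    samples: list of samples, each a list of raw counts over the same genes in the same order.
    Genes with a zero count in any sample are skipped, since their geometric mean is zero.
    """
    n_genes = len(samples[0])
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in samples]
        if min(values) == 0:
            log_geo_means.append(None)
        else:
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))

    factors = []
    for sample in samples:
        # Per-gene log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [
            math.log(sample[g]) - lg
            for g, lg in enumerate(log_geo_means)
            if lg is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy data: sample 2 is sequenced exactly twice as deeply as sample 1.
factors = size_factors([[10, 20, 30, 0], [20, 40, 60, 5]])
print(factors)  # roughly [0.707, 1.414]
```

DESeq2, edgeR and limma/voom all take raw counts as input and apply their own normalization internally (edgeR, for example, defaults to TMM rather than median-of-ratios).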


So I should use counts? Or z-scores, or something like that?
