I have a difficult experiment and I need some advice on how to proceed.
I have a data set of ~400 human RNA-seq samples from different risk groups, with various abnormalities as subgroups. I'm interested in finding out whether there is a significant difference in the expression of two specific transcripts of the same gene between two conditions. That is, I would like to know whether the difference in expression of TX1 between condition1 and condition2 is significantly higher or lower than the difference in expression of TX2 between the same two conditions.
I don't know how to run this analysis!
What I have tried so far is as follows: I have run DESeq2, limma-voom and DEXSeq, but all they give me is the significance of a specific gene between two conditions. Even if I run them at the transcript level, I can only find out whether TX1 is differentially expressed between condition1 and condition2.
I have already asked a few questions about this experiment previously (e.g. here, or here), but the answers didn't really help me much. I think my problem was not explained clearly, so I hope I have made it clearer here.
For completeness, I have also run Kallisto (with bootstrapping) & sleuth and Salmon on the complete data set. I now have a list of the counts from Kallisto at transcript level and at gene level (imported by tximport) and the TPM values calculated by Kallisto and by Salmon. But I can still only compare one transcript between two conditions.
I can't seem to find a way to analyse the data in the way I need it and get the results for a comparison of two transcripts over two conditions.
I would appreciate your advice on how to analyse the data. Is there a statistically robust way to analyse this kind of data?
thanks
Assa
The other approach would be the cufflinks approach: calculate the Jensen-Shannon divergence between the transcript distributions of the two conditions.
I know cufflinks. What does this value give me?
The mixture of different isoform fractions for a gene forms a distribution. If you have two different conditions, you can calculate how different these distributions are using JS divergence. Thus if one isoform is dominant in one condition and a different isoform is dominant in the other condition, there will be a large JS divergence.
AFAIK there hasn't been a lot of benchmarking of this approach, but if it's anything like cufflinks differential calling, I might be a bit wary of it. But you could possibly reimplement it to use TPMs or counts in R; I don't remember it being particularly difficult to calculate.
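To make the calculation concrete, here is a minimal sketch of the JS divergence between two isoform-fraction distributions. It is written in Python rather than R, it is not the cufflinks implementation, and the TPM values and function names are hypothetical. It only gives an effect size; it does not give a p-value.

```python
import math

def isoform_fractions(expression):
    """Convert per-isoform expression (e.g. TPM) into fractions that sum to 1."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("gene has no expression")
    return [x / total for x in expression]

def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: H(M) - (H(P) + H(Q)) / 2, with M the midpoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2

# Hypothetical mean TPMs for the two isoforms (TX1, TX2) of one gene.
cond1 = isoform_fractions([80.0, 20.0])   # TX1 dominant
cond2 = isoform_fractions([20.0, 80.0])   # TX2 dominant

same = js_divergence(cond1, cond1)    # identical distributions
switch = js_divergence(cond1, cond2)  # dominant isoform switches
print(same, switch)
```

With base-2 logarithms the value lies between 0 (identical isoform mixtures) and 1 (completely non-overlapping mixtures), so a switch in the dominant isoform shows up as a clearly non-zero value.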
Plot two box plots of
You don't need to normalise the data here, as you are looking at the FC within the same sample.
to get an idea of whether the mean FC changes at all across the different conditions?
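One way to read this suggestion, as a sketch only: compute log2(TX1/TX2) within each sample, so that sample-level scaling cancels, and then compare those per-sample ratios between the two conditions. The counts below are made up, the pseudocount of 0.5 is an arbitrary assumption, and the permutation test is just one simple choice that ignores covariates and the uncertainty of the quantification.

```python
import math
import random
import statistics

def log_ratio(tx1, tx2, pseudo=0.5):
    """Per-sample log2(TX1/TX2); the pseudocount avoids taking the log of zero."""
    return [math.log2((a + pseudo) / (b + pseudo)) for a, b in zip(tx1, tx2)]

def permutation_pvalue(x, y, n_perm=5000, seed=1):
    """Two-sided permutation test on the difference in mean log-ratio."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to conditions
        diff = abs(statistics.mean(pooled[:len(x)]) - statistics.mean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical counts; position i in TX1 and TX2 is the same sample.
cond1_tx1 = [120, 150, 110, 140, 130, 160]
cond1_tx2 = [60, 70, 50, 75, 65, 80]
cond2_tx1 = [100, 90, 120, 95, 110, 105]
cond2_tx2 = [200, 190, 230, 180, 220, 210]

r1 = log_ratio(cond1_tx1, cond1_tx2)  # the values you would box-plot for condition1
r2 = log_ratio(cond2_tx1, cond2_tx2)  # the values you would box-plot for condition2
p = permutation_pvalue(r1, r2)
print(statistics.median(r1), statistics.median(r2), p)
```

If the two sets of ratios overlap heavily, the relative usage of TX1 versus TX2 is probably not changing between the conditions, whatever the gene as a whole is doing.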
Basically what you want is a model which asks if the fold change in one transcript is predicted by the other.
Or you want to test the contrast (Trans1CondA - Trans1CondB) - (Trans2CondA - Trans2CondB) ≠ 0
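As a rough illustration of that contrast, the sketch below estimates the difference of differences from the four group means on the log2 scale. The counts, the pseudocount and the function names are hypothetical, and this only gives the point estimate: a real analysis would fit a model with a transcript-by-condition interaction term and take the standard error from there.

```python
import math
import statistics

def mean_log2(values, pseudo=0.5):
    """Mean log2 expression of one transcript in one condition."""
    return statistics.mean(math.log2(v + pseudo) for v in values)

def interaction_contrast(t1_a, t1_b, t2_a, t2_b):
    """(Trans1CondA - Trans1CondB) - (Trans2CondA - Trans2CondB) on the log2 scale."""
    return (mean_log2(t1_a) - mean_log2(t1_b)) - (mean_log2(t2_a) - mean_log2(t2_b))

# Hypothetical case 1: the whole gene doubles, so both transcripts double.
gene_wide = interaction_contrast([200, 220], [100, 110], [40, 44], [20, 22])

# Hypothetical case 2: TX1 doubles in condition A while TX2 halves.
switch = interaction_contrast([200, 220], [100, 110], [20, 22], [40, 44])

print(gene_wide, switch)
```

A contrast near zero means both transcripts change by the same fold between conditions; a value far from zero means their fold changes differ. When both transcripts are measured in the same samples, this estimate equals the difference between conditions in the mean per-sample log2(TX1/TX2).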
Have a look at DEXSeq, it might help you with what you want - it basically asks if the fold change in one exon of a gene is predicted by the fold change of other exons for the gene.
Hi, thanks for the help. I have already had a look at DEXSeq, but as far as I can tell it doesn't really help me. It shows me whether an exon (and these are DEXSeq-specific exons, which AFAIK can be only part of a biological exon) changes between two conditions, but not its relationship to a different transcript in the same condition, or the changes between two comparisons as I need here. Am I missing something in DEXSeq?
When DEXSeq calls an exon as differential, it is not saying that the exon is different between conditions; it's saying that it's different relative to the other exons in the gene. Thus if the whole gene goes up or down, none of the exons will be significant. However, if only a single exon changes while the rest of the exons in the gene don't, or conversely if an exon stays the same while the rest of the exons in the gene change, then that exon will be called significant.
You might be able to put counts for transcripts instead of counts for exons into DEXSeq to get what you want. But I'd check with the authors on support.bioconductor.org first.
Is there a way to apply this kind of contrast matrix to either DESeq2 or limma? I have not yet found a way to add the transcript to this kind of analysis :-(
There is certainly no mechanism intended to allow you to do this.
Have you heard of RATS? It might be worth taking a look:
https://github.com/bartongroup/RATS