Hi everyone,
The problem I have isn't really a bioinformatics problem, but I came across a hard nut to crack when analyzing my RNA-seq results. I had 3 conditions in triplicate:
- 1 = control
- 2 = expressing 2 isoforms of a protein
- 3 = expressing 1 isoform of the protein
Sequencing generated 50 million 75 bp single-end reads per sample.
The DE analysis I ran with cuffdiff gave me the following results:
- 1 vs. 2 = 400 DEG
- 1 vs. 3 = 50 DEG
- 2 vs. 3 = 800 DEG
The same .BAM files analyzed with DESeq2 gave me more "marked" results:
- 1 vs. 2 = 700 DEG
- 1 vs. 3 = 10 DEG
- 2 vs. 3 = 2300 DEG
Here both analyses show negligible DEG between conditions 1 and 3. I know that the purpose of DE analysis is to highlight significantly deregulated genes, but having so many DEG in 1 vs. 2 and in 2 vs. 3 means there is something going on in my third condition, right? Even though there are very few DEG between the control and the third condition? I don't know how to put these results into words.
Has anyone encountered a similar situation?
Thanks for any help you can provide!
Hi,
Is the isoform in your third condition the same as one of the isoforms in your second condition?
And regardless of my first question, why are two isoforms clubbed in your second condition?
"1 vs. 2 = 400 DEG" might have been different if the isoforms were kept as two different conditions.

Hi, indeed, the isoform in the third condition is the same as one of the isoforms in the second condition. Long story short, the two isoforms arise from alternative splicing. Isoform2, the one in the third condition, has been reported in the literature to have elusive effects, yet it accounts for 90% of the mRNA from the gene. In contrast, isoform1 and its effects are very well characterized. We were unable to generate cellular clones expressing only the first, unspliced isoform, because inhibiting splicing increases the expression of the unspliced isoform 10-fold, which is unfortunately lethal for the cells.
As you suggest, the best experimental setup would be one condition for each isoform. But we had to choose this one, so we have:
- control
- 10% isoform1 / 90% isoform2 (closest to in vivo)
- isoform2 (our interest)
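With this design, conditions 2 and 3 differ mainly by the presence of isoform1, so one way to put the three comparisons into words is to check how many of the 1 vs. 2 genes also show up in 2 vs. 3. A large overlap would be consistent with most changes tracking isoform1, with isoform2 alone staying close to control. The following is only a minimal sketch: it assumes the significant gene IDs of each comparison have been exported as plain lists, the function name is hypothetical, and the gene names are made up for illustration.

```python
def summarize_contrasts(deg_1v2, deg_1v3, deg_2v3):
    """Compare the DEG lists of the three pairwise comparisons."""
    a, b, c = set(deg_1v2), set(deg_1v3), set(deg_2v3)
    shared = a & c  # genes changed whenever condition 2 is involved
    return {
        "in_1v2_and_2v3": len(shared),
        "fraction_of_1v2_also_in_2v3": len(shared) / len(a) if a else 0.0,
        "only_in_2v3": len(c - a),
        "in_1v3": len(b),
    }


# Made-up gene IDs, for illustration only (not real results)
toy = summarize_contrasts(
    deg_1v2=["geneA", "geneB", "geneC"],
    deg_1v3=["geneD"],
    deg_2v3=["geneA", "geneB", "geneE", "geneF"],
)
print(toy)
```

The same function could be run on the cuffdiff lists and on the DESeq2 lists separately, to see whether both tools tell the same story despite the different DEG counts.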
Hi,
I am not fully convinced by the method employed, but I am an amateur in the field myself. Still, looking at the number of DEG across your conditions, I would like to know what quality filter (Phred score) was applied when selecting the reads, and whether the number of reads obtained from all three sets is similar.
Edit: Another option would have been to add a 4th condition - isoform 1
Hi, the average Phred score was 33 (40 million reads above 32). We obtained about 50 million reads for each replicate (46 million minimum, 57 million maximum).
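The spread in sequencing depth can be quantified quickly. This is a minimal sketch using only the minimum and maximum quoted above (46 and 57 million reads), since the per-sample values are not listed here; the function name is hypothetical, and what counts as "similar enough" is a judgment call rather than a fixed rule.

```python
def depth_spread(read_counts_millions):
    """Return the ratio between the deepest and the shallowest library."""
    lo, hi = min(read_counts_millions), max(read_counts_millions)
    return hi / lo


# Only the two extremes are known from the thread
ratio = depth_spread([46, 57])
print(f"max/min depth ratio: {ratio:.2f}")
```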
This fourth condition would have been nice indeed.
Hi,
The number of reads in the three sets is similar, with a good Phred score cut-off. I'm sorry, but I can't think of a conclusive reason for your results, although I would suggest repeating your analysis using another pipeline.
I'm starting to think that DEG analysis can't answer the question I'm asking.
Thank you very much for your time!
Maybe not, but now that you have already invested time in it, you should try the analysis with another pipeline. You would at least know if there was an error in the pipeline, or in the analysis.
Can you give me any recommendations? I'm fairly new to bioinformatics :)
Sure. You could try the hisat2 protocol (new tuxedo protocol), and also other aligners and mergers.
https://www.nature.com/articles/nprot.2016.095
https://www.nature.com/articles/nprot.2013.099#procedure
These are two established pipelines, and you could try analyzing the example data to be sure you understand what's going on.
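As a rough idea of what the HISAT2 protocol involves for one single-end sample, here is a minimal sketch that only builds the command lines and prints them, without running anything. The sample name, file names, index prefix and the helper function are all hypothetical placeholders; check each tool's manual before running the commands for real.

```python
def tuxedo_commands(sample, fastq, index_prefix, annotation_gtf, threads=4):
    """Build the per-sample commands (as argument lists) for single-end reads."""
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # Align single-end reads (-U); --dta reports alignments suited to
        # downstream transcript assemblers such as StringTie
        ["hisat2", "-p", str(threads), "--dta", "-x", index_prefix,
         "-U", fastq, "-S", sam],
        # Sort the alignments by coordinate and write a BAM file
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Estimate abundances for annotated transcripts only (-e) and
        # write Ballgown input tables (-B)
        ["stringtie", bam, "-e", "-B", "-p", str(threads),
         "-G", annotation_gtf, "-o", f"{sample}/{sample}.gtf"],
    ]


if __name__ == "__main__":
    # Placeholder names, not files from this thread
    for cmd in tuxedo_commands("ctrl_rep1", "ctrl_rep1.fastq.gz",
                               "genome_index", "genes.gtf"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check them by eye, and the same function can be looped over all nine samples once the paths are right.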
Good luck :)
I will try these. Thanks again !