Hi everyone,
I have RNA-seq data from a group of disease samples and a group of healthy (control) samples, and I would like to look for isoform switching events between the disease and healthy groups.
However, QC showed that the distribution of relative fragment coverage along transcripts is biased toward (enriched at) the 3' end in the disease group but not in the healthy group (this is due to a batch difference), indicating that the mRNA in the disease samples was partially degraded from the 5' end.
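If you want a per-sample number for this 3' bias, here is a minimal Python sketch. It assumes you already have per-base coverage along a transcript, ordered 5' to 3' (e.g., exported from a coverage tool); the function name and the 20% window are my own choices, not part of any package. It compares mean coverage in the 3'-most and 5'-most windows, so values well above 1 point to 3' bias.

```python
from statistics import mean

def three_prime_bias(coverage, frac=0.2):
    """Ratio of mean coverage in the 3'-most `frac` of a transcript to the
    mean coverage in its 5'-most `frac`. Coverage must be ordered 5' -> 3'.
    Values well above 1 suggest 3' bias (e.g. from 5' degradation)."""
    n = len(coverage)
    if n == 0:
        raise ValueError("empty coverage vector")
    k = max(1, int(n * frac))          # number of bases in each end window
    five_prime = mean(coverage[:k])
    three_prime = mean(coverage[-k:])
    if five_prime == 0:
        return float("inf")            # no 5' coverage at all
    return three_prime / five_prime

# Toy coverage profiles (made-up numbers, not real data)
intact = [10] * 100                            # flat coverage
degraded = [i / 10 for i in range(1, 101)]     # coverage rises toward the 3' end

print(three_prime_bias(intact))    # 1.0
print(three_prime_bias(degraded))
```

Comparing this ratio across samples (or for a set of housekeeping transcripts) makes the batch-linked degradation visible per sample rather than per group.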
I tried the IsoformSwitchAnalyzeR package to detect isoform switching events, using isoform quantifications from kallisto. The results show ~25 events in total (with functional consequences), and most of them show higher usage of a shorter isoform in the disease group. Because these shorter isoforms are truncated at the 5' end (the beginning of the transcript), I suspect that the degradation pattern described above affects both the isoform quantification and the switch detection. (Please refer to the example of the GAPDH gene.)
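For context, IsoformSwitchAnalyzeR works with isoform fractions (IF: an isoform's expression divided by the total expression of its gene) and the difference in mean IF between conditions (dIF). A minimal sketch of that calculation on made-up TPM values (the isoform names are hypothetical) shows what a switch toward a 5'-truncated isoform looks like in these terms. It only illustrates the metric; it is not the package's code or its statistical test.

```python
from statistics import mean

def isoform_fractions(tpm_by_isoform):
    """Per-sample isoform fraction: isoform TPM / summed TPM of all isoforms of the gene."""
    isoforms = list(tpm_by_isoform)
    n_samples = len(tpm_by_isoform[isoforms[0]])
    fractions = {iso: [] for iso in isoforms}
    for s in range(n_samples):
        gene_total = sum(tpm_by_isoform[iso][s] for iso in isoforms)
        for iso in isoforms:
            # Samples where the gene is not expressed get NaN (IF is undefined)
            value = tpm_by_isoform[iso][s] / gene_total if gene_total > 0 else float("nan")
            fractions[iso].append(value)
    return fractions

def delta_if(control, disease):
    """dIF per isoform: mean IF in disease minus mean IF in control."""
    if_control = isoform_fractions(control)
    if_disease = isoform_fractions(disease)
    return {iso: mean(if_disease[iso]) - mean(if_control[iso]) for iso in if_control}

# Made-up TPMs for one gene with a full-length and a 5'-truncated isoform (3 samples per group)
control = {"full_length": [90, 80, 85], "short_5p": [10, 20, 15]}
disease = {"full_length": [50, 60, 55], "short_5p": [50, 40, 45]}

dif = delta_if(control, disease)
print({iso: round(v, 3) for iso, v in dif.items()})  # {'full_length': -0.3, 'short_5p': 0.3}
```

A quick sanity check on real results is to ask whether the isoform with positive dIF is systematically the one missing 5' sequence; if nearly all switches go that way, degradation is a more likely explanation than biology.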
Is there any method that could help correct this bias in relative fragment coverage? I read about the Alpine package (https://bioconductor.org/packages/release/bioc/html/alpine.html), but I am not sure how to apply it to kallisto/Salmon isoform quantifications, since it seems to return a normalized FPKM matrix after modeling. Has anyone applied this correction method in a similar case?
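To see why a positional correction has to enter the isoform-level model itself (rather than being applied to a finished expression matrix), here is a toy two-isoform EM in Python. It assumes a hypothetical exponential 3' bias, two isoforms that share their 3' end (the short one lacks the long one's 5'-most 400 bases), ignores fragment length, and uses expected rather than sampled read counts. It is not alpine's model or kallisto/Salmon's code; it only illustrates the principle that bias-aware quantifiers rely on: reweighting each isoform's effective length by the positional bias.

```python
import math

LONG, SHORT = 1000, 600  # hypothetical isoform lengths; both share the same 3' end

def degraded_weight(d, decay=200.0):
    """Toy 3'-bias model: relative chance that a fragment starts d bases
    upstream of the 3' end (exponential decay is an assumption)."""
    return math.exp(-d / decay)

def uniform_weight(d):
    """The usual assumption of uniform coverage along the transcript."""
    return 1.0

def weighted_length(length, weight_fn):
    # Bias-weighted effective length: sum of per-position start weights
    return sum(weight_fn(d) for d in range(length))

def expected_read_fractions(theta_long, weight_fn):
    """Expected fractions of reads unique to the long isoform (u) and shared (s)."""
    unique = sum(weight_fn(d) for d in range(SHORT, LONG))  # long isoform's 5' extension
    shared = weighted_length(SHORT, weight_fn)
    total = theta_long * (shared + unique) + (1 - theta_long) * shared
    u = theta_long * unique / total
    return u, 1 - u

def em_long_fraction(u, s, weight_fn, iters=10_000, tol=1e-12):
    """EM estimate of the fraction of transcripts from the long isoform,
    using effective lengths computed with weight_fn."""
    w_long = weighted_length(LONG, weight_fn)
    w_short = weighted_length(SHORT, weight_fn)
    p = 0.5
    for _ in range(iters):
        # E-step: shared reads sit at the same distance from the 3' end in both
        # isoforms, so they are split in proportion to transcript abundance
        alpha = u + s * p                 # fraction of reads assigned to the long isoform
        # M-step: turn read fractions into transcript fractions via effective lengths
        a, b = alpha / w_long, (1 - alpha) / w_short
        p_new = a / (a + b)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# Reads generated from a 50:50 mix of the two isoforms under the degraded profile
u, s = expected_read_fractions(0.5, degraded_weight)
print(round(em_long_fraction(u, s, degraded_weight), 3))  # bias-aware model: 0.5
print(round(em_long_fraction(u, s, uniform_weight), 3))   # uniform-coverage model
```

With data generated from a 50:50 mix, the bias-aware fit recovers about 0.5, whereas the uniform-coverage model assigns most of the abundance to the short isoform, the same direction as the switches described above. The catch is that the bias model has to be estimated per sample, and with full confounding it is hard to tell whether it removed only the technical part.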
Thanks in advance for any suggestions!
Are all the disease samples in one batch and all the controls in the other (i.e., the worst possible experimental design)?
Unfortunately, yes. Sometimes, as with rare diseases, there are no controls from the same cohort.
I see. That is of course a problem in such a scenario, but I fear you then cannot meaningfully compare the samples: any difference you see could be purely technical.
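To make the confounding point concrete: in a linear-model design matrix with an intercept, a disease indicator and a batch indicator, full confounding makes the last two columns identical, so the matrix loses rank and the two effects cannot be estimated separately. A minimal sketch with a hypothetical six-sample layout (the helper names are my own):

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank, n_cols = 0, len(m[0])
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def design(condition, batch):
    # Columns: intercept, disease indicator, batch-2 indicator
    return [[1, c, b] for c, b in zip(condition, batch)]

condition = [0, 0, 0, 1, 1, 1]          # 0 = control, 1 = disease
confounded_batch = [0, 0, 0, 1, 1, 1]   # every disease sample in batch 2
balanced_batch = [0, 1, 0, 1, 0, 1]     # batches split across conditions

print(matrix_rank(design(condition, confounded_batch)))  # 2
print(matrix_rank(design(condition, balanced_batch)))    # 3
```

A rank below the number of columns means no model fitted to these samples can attribute a difference to disease rather than batch, and any bias correction cannot be checked against a batch effect that varies independently of disease status.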