Update: problem solved. Let me restate my question. I have tumor-normal paired RNA-seq data from multiple patients, but no replicates for any tissue within a patient. I'd like to perform a tumor-vs-normal comparison for each individual and still get some measure of statistical confidence, since I do have tumor and healthy tissue samples from several patients. I turned to Dr. Smyth, the author of the edgeR package. It turned out that this analysis can be done.
The preparation work is described in Section 4.1 of the edgeR User's Guide, where an overall comparison is made between the tumor and healthy samples from 3 patients. The BCV (biological coefficient of variation) estimated from all patients together is what gets applied to each patient. This matters because, with one sample per tissue per patient, the patient-specific design below has as many coefficients as samples and leaves no residual degrees of freedom for estimating dispersion on its own. So after the global BCV has been calculated, one can simply build a new design matrix with
design.patientspecific <- model.matrix(~Patient+Patient:Tissue)
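To check which coefficient belongs to which patient, the columns of this design can be listed in base R. This is only a minimal sketch, assuming three patients with one normal (N) and one tumor (T) sample each; the factor levels and sample order are illustrative assumptions, not the real sample sheet.

```r
# Hypothetical sample layout: 3 patients, normal (N) then tumor (T) for each.
Patient <- factor(rep(1:3, each = 2))
Tissue  <- factor(rep(c("N", "T"), times = 3), levels = c("N", "T"))

# Patient baseline plus a separate tumor effect nested within each patient.
design.patientspecific <- model.matrix(~Patient + Patient:Tissue)

# Columns 4 to 6 are the per-patient tumor-vs-normal effects.
print(colnames(design.patientspecific))
# "(Intercept)" "Patient2" "Patient3"
# "Patient1:TissueT" "Patient2:TissueT" "Patient3:TissueT"
```

With more patients the per-patient tumor columns shift further right, so it is safer to look up the coefficient index by column name than to hard-code it.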
The fit and the per-patient comparisons can then be made with
fit.patientspecific <- glmFit(y, design.patientspecific)
lrt.patient1 <- glmLRT(fit.patientspecific, coef=4)
lrt.patient2 <- glmLRT(fit.patientspecific, coef=5)
lrt.patient3 <- glmLRT(fit.patientspecific, coef=6)
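Putting the pieces together, the whole procedure can be sketched end to end as follows. This is a sketch under stated assumptions rather than the exact code I ran: the count matrix is simulated toy data, the names `counts` and `design.paired` are made up for illustration, and the dispersion step uses `estimateDisp` with the additive `~Patient+Tissue` design from Section 4.1.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 200 genes, 3 patients, one normal (N) and one tumor (T) each.
Patient <- factor(rep(1:3, each = 2))
Tissue  <- factor(rep(c("N", "T"), times = 3), levels = c("N", "T"))
counts  <- matrix(rnbinom(200 * 6, mu = 100, size = 10), nrow = 200,
                  dimnames = list(paste0("gene", 1:200),
                                  paste0("P", Patient, Tissue)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# Step 1: estimate dispersions (BCV) across all patients with the additive
# paired design, which leaves residual degrees of freedom.
design.paired <- model.matrix(~Patient + Tissue)
y <- estimateDisp(y, design.paired)

# Step 2: refit with patient-specific tumor effects. glmFit picks up the
# dispersions already stored in y, so the saturated design is not a problem.
design.patientspecific <- model.matrix(~Patient + Patient:Tissue)
fit.patientspecific <- glmFit(y, design.patientspecific)

# Tumor vs normal within patient 1 (column "Patient1:TissueT").
lrt.patient1 <- glmLRT(fit.patientspecific, coef = 4)
print(topTags(lrt.patient1, n = 5))
```

The per-patient p-values obtained this way rest on the assumption that the gene-wise variability seen across the whole cohort also applies to each individual patient.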
The original question:
Hi folks. I apologize if this question has been asked before. I have RNA-seq data from paired tumor and adjacent tissue samples taken from several individuals. I thought the DGE analysis would be straightforward, but it's not. Almost all the pipelines I have looked at require biological replicates, which I don't have. Of course I have data from other individuals, but my focus is not on comparing across individuals. I'm interested in the DGE within one individual only, in order to validate the MS/MS data acquired from the same person.
I have found only a few publications with similar applications. Can you recommend a pipeline for this kind of data analysis? I'm not sure, at this point, whether I have enough data for a Student's t-test and a p-value.
Thank you so much. Field
Update: I wonder if I can take advantage of the data from multiple patients and apply the combined tagwise dispersion to a single patient. I will post this query on Bioconductor as well. If a clear solution is found, I will update this thread.
@ATpoint @dariober
I'm not sure why your comments are now invisible. Thank you both for your kind replies. It took me a while to familiarize myself with edgeR. I went through part of the manual, particularly Section 2.12 (no replicates) and Section 4.1 (the example on carcinoma patients). I understand the parameters conceptually but not always mathematically.
Let me know if my reasoning is valid. I'd like to take advantage of having samples from multiple patients so that I can estimate dispersions more accurately. The more sample pairs I include, the better the tagwise dispersion estimates should be. I wonder if I can keep the tagwise dispersion table (I don't see why not) and apply it to each patient's data.
Also, I don't fully understand how the qCML and GLM modes work. Are they interchangeable alternatives? Should I prefer one over the other for my application?
Thank you so much. Field