6.3 years ago
Biologist
Hi,
I have downloaded TCGA data for lung cancer. It is counts data. I want to make a survival plot for a specific gene, comparing high-expression and low-expression samples. I want both overall survival (OS) and disease-free survival (DFS) plots. It should look something like this figure: "Kaplan-Meier survival analysis of OS (P < 0.001, log-rank) and DFS (P < 0.001, log-rank) rate in 144 patients based on the SNHG20 expression level".
How do I divide the samples into high- and low-expression groups based on the expression of a single gene? I have counts data. Is there a cutoff for that?
You could plot the data, see how it spreads, and use this to define the cutoff (e.g. if you notice a clear separation into 2 groups). Another approach might be to use z-scores and take samples beyond +1 or +2 as high and samples beyond -1 or -2 as low.
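A minimal sketch of the z-score approach, assuming the gene's expression is already on a log scale. The function name `split_by_zscore`, the label "mid" for samples between the cutoffs, and the expression values are all made up for illustration.

```python
import numpy as np

def split_by_zscore(expr, cutoff=1.0):
    """Label each sample 'high', 'low' or 'mid' from the z-score of its expression."""
    expr = np.asarray(expr, dtype=float)
    # z-score: distance from the cohort mean in units of the sample standard deviation
    z = (expr - expr.mean()) / expr.std(ddof=1)
    labels = np.full(expr.shape, "mid", dtype=object)
    labels[z >= cutoff] = "high"
    labels[z <= -cutoff] = "low"
    return z, labels

# Hypothetical log-expression values for one gene across 8 tumor samples
expr = [2.0, 2.5, 3.0, 5.0, 5.5, 9.0, 0.5, 4.0]
z, labels = split_by_zscore(expr, cutoff=1.0)
for value, score, label in zip(expr, z, labels):
    print(f"{value:5.1f}  z={score:+.2f}  {label}")
```

Note that samples labelled "mid" are left out of a high-versus-low comparison, so a stricter cutoff means smaller groups.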
So based on the z-score, samples above +1 (or +2) are high and samples below -1 (or -2) are low? And for this survival analysis, do I also need to consider normal samples, or only tumor samples?
Yes, you can try a cutoff of 1 or 2 and see how it looks (for a two-sided test, the critical z-score values at a 95% confidence level are -1.96 and +1.96). This task could also be easily accomplished using cbioportal.org.
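To get a feel for how many samples each cutoff keeps, here is a small sketch that computes the fraction of values expected beyond ±cutoff. It assumes the z-scores are roughly normally distributed, which real expression data may not be. The helper name `fraction_outside` is made up for illustration.

```python
import math

def fraction_outside(cutoff):
    """Two-sided tail probability P(|Z| > cutoff) for a standard normal variable."""
    return 1.0 - math.erf(cutoff / math.sqrt(2.0))

for c in (1.0, 1.96, 2.0):
    print(f"|z| > {c}: about {fraction_outside(c):.3f} of samples")
```

Under that assumption a cutoff of 1.96 keeps only about 5% of the samples in total (roughly 2.5% in each of the high and low groups), while a cutoff of 1 keeps about 32%. With a small cohort the stricter cutoff may leave too few samples per group for a meaningful survival comparison.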
I would plot cancer and normal on separate charts (I am not sure there is any point in showing a survival chart for normal individuals).
Ok, thank you. Then I will convert the counts data to logCPM and then to z-scores. Samples with z-scores below -1.96 will be the low group and samples above +1.96 the high group, and then I will make a survival plot for those groups.
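The plan above could be sketched roughly as follows. This is only an illustration with made-up numbers: `log_cpm` is a simple log2(CPM + 1) transform (not the exact calculation of any particular package), and `kaplan_meier` is a bare-bones product-limit estimator without confidence intervals or a log-rank test. Both function names are hypothetical.

```python
import numpy as np

def log_cpm(counts):
    """Simple log2(CPM + 1) for a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)          # total counts per sample
    return np.log2(counts / lib_size * 1e6 + 1.0)

def kaplan_meier(time, event):
    """Kaplan-Meier estimate: returns (event_times, survival_probabilities).

    time  : follow-up time for each patient
    event : 1 if the event (e.g. death) was observed, 0 if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                  # still under observation at t
        n_events = np.sum((time == t) & (event == 1))
        s *= 1.0 - n_events / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Made-up 2 genes x 2 samples count matrix
lcpm = log_cpm([[10, 0], [90, 100]])

# Made-up follow-up data for one expression group
times, surv = kaplan_meier(time=[5, 8, 12, 12, 20], event=[1, 0, 1, 1, 0])
print(times, surv)
```

In practice you would take the gene's row from the logCPM matrix, convert it to z-scores, and run the estimator once on the high group and once on the low group. For OS the event is death; for DFS it is recurrence or death, so the two plots use different time and event columns from the clinical data.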