What cutoff is used to define high or low gene expression for survival analysis?
6.9 years ago
mygamefun3 • 0

Hi everyone

In RNA-seq analysis, we need to separate samples into two groups for survival analysis. How can I define a high or low expression level for a gene from counts or FPKM? Should I use the median, the mean, or a quantile?

How do TCGA or Oncomine define the cutoff for a gene?

Thanks.

RNA-Seq • 8.2k views
6.9 years ago

There's no definitive answer to your question. I would not advise going by the median or average. Quantiles are a reasonable idea, for example tertiles, with the upper third being regarded as "high expression".

An even better idea would be to convert your data to the Z scale, i.e., standard deviations from the mean, and then try an absolute Z-score of 3, 4, 5, or 6 as potential cut-offs. I trust that you have QC'd your data already and that low-count transcripts have been removed.
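As a rough sketch of the two ideas above, assuming a numeric vector of expression values for a single gene (the vector `expr` and the cut-off `z_cut` are made up for illustration, and the data are simulated), tertile and Z-score groups could be derived like this:

```r
# Simulated, purely illustrative expression values for one gene (not real data)
set.seed(1)
expr <- rnorm(90, mean = 8, sd = 2)

# Tertiles: cut at the 1/3 and 2/3 quantiles; the top third is "High"
tertile_breaks <- quantile(expr, probs = c(0, 1/3, 2/3, 1))
tertile_group <- cut(expr, breaks = tertile_breaks,
                     labels = c("Low", "Mid", "High"),
                     include.lowest = TRUE)

# Z scale: standard deviations from this gene's mean
z <- as.numeric(scale(expr))

# Hypothetical cut-off; 3, 4, 5, or 6 could be tried in turn
z_cut <- 3
z_group <- ifelse(z > z_cut, "High", ifelse(z < -z_cut, "Low", "Mid"))

table(tertile_group)
table(z_group)
```

With strict Z cut-offs and a modest number of samples, very few (possibly zero) samples may end up in the high or low group, so it is worth checking the group sizes before running any survival test.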


Thanks for your advice.


Hi Kevin,

I would like to compare survival between tumor samples with high and low expression of a specific gene. For this, I want to divide the tumor samples into high and low groups. I have count data and have transformed them into Z-scores.

Do you think the code below is the right way to divide the samples?

event_rna <- t(apply(z_rna, 1, function(x) ifelse(x > 1.96,1,ifelse(x < -1.96,2,0))))

Or should I follow this approach to separating high and low samples?

There, they used FPKM, took the median, and separated the samples into high and low on that basis.

Which one is the right solution?


Hey bro, there is no right or wrong. Your function looks okay!
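To see what that function produces, here is a small check on a made-up Z-score matrix (the values and the gene and sample names are invented for illustration; genes are assumed to be in rows and samples in columns):

```r
# Toy Z-score matrix (genes in rows, samples in columns); values are made up
z_rna <- matrix(c(2.5, -0.3,  -2.1,   0.8,
                  0.1,  1.97, -1.96, -3.0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("geneA", "geneB"), paste0("S", 1:4)))

# Same coding as in the question: 1 = high, 2 = low, 0 = neither
event_rna <- t(apply(z_rna, 1, function(x) ifelse(x > 1.96, 1, ifelse(x < -1.96, 2, 0))))

# ifelse() is vectorised, so the same coding works on the whole matrix at once
event_rna2 <- ifelse(z_rna > 1.96, 1, ifelse(z_rna < -1.96, 2, 0))

event_rna
```

Samples coded 0 fall into neither group, so for a two-group survival comparison they would have to be dropped or treated as a separate middle group.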


Thank you very much Kevin.


Dear Kevin, I have used a median cutoff via the code below, but how can I change the code to use quantiles?

median_value = median(clin_df$gene_value)
clin_df$gene = ifelse(clin_df$gene_value >= median_value, "High expression", "Low expression")

It did run with quantile(clin_df$gene_value), but with warnings, and I don't think this is the right way; adding a "Mid" group also gives an error. Thank you kindly.
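One possible way to do this, sketched here on simulated data: quantile() returns five values by default (0%, 25%, 50%, 75%, 100%), so comparing gene_value against the whole result recycles those values across samples, which is a likely source of the warnings. Picking explicit cut points and passing them to cut() avoids that and gives a "Mid" group directly. The 25% and 75% cut points below are an arbitrary choice for illustration, and clin_df here is a stand-in, not the real data.

```r
# Hypothetical clinical data frame; gene_value is simulated, not real data
set.seed(42)
clin_df <- data.frame(gene_value = rexp(100, rate = 0.1))

# Pick the specific cut points needed instead of using all default quantiles
q <- quantile(clin_df$gene_value, probs = c(0.25, 0.75))

# Three groups: bottom quartile, middle half, top quartile
clin_df$gene <- cut(clin_df$gene_value,
                    breaks = c(-Inf, q, Inf),
                    labels = c("Low expression", "Mid", "High expression"))

table(clin_df$gene)
```

For tertiles, probs = c(1/3, 2/3) could be used instead; for a plain two-group split, keeping only the "Low expression" and "High expression" rows is one option.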
