Question

Survival analysis: Why categorize age?

1

4.8 years ago

english.server ▴ 300

Hi there!

I wondered why scientists usually categorize age in survival analysis. For example, a 51-year-old patient is in the same category as a 60-year-old, but a 61-year-old is in a different category from the 60-year-old (assuming the data are divided into 10-year intervals). Why is age not treated as continuous? A reference (textbook) would be appreciated as well.

survival • 1.5k views

ADD COMMENT • link updated 4.8 years ago by Kevin Blighe 88k • written 4.8 years ago by english.server ▴ 300

1

The only reason may be that the function response(age) is complex and cannot be easily modeled. Otherwise it is a "mistake" to categorize anything, since it leads to a loss of power.

OK, another reason: clinicians want a clear, simple cut-off, so that below age x the risk is low and above it the risk is high.
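To make the loss-of-power point concrete, here is a minimal simulation sketch. It is not a survival model: it assumes a made-up risk score that rises linearly with age plus noise, and simply compares how strongly the score correlates with continuous age, with 10-year categories, and with a single cut-off at 60. The effect size, noise level, age range, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, seed=1):
    """Simulate ages and a hypothetical risk score that is linear in age."""
    rng = random.Random(seed)
    ages = [rng.uniform(30, 90) for _ in range(n)]
    # Assumed effect size (0.05 per year) and unit Gaussian noise.
    risk = [0.05 * age + rng.gauss(0, 1) for age in ages]
    return ages, risk


ages, risk = simulate()

# Two common ways of categorizing the same ages.
age_decade = [10 * (age // 10) for age in ages]            # 10-year bins
age_over_60 = [1.0 if age > 60 else 0.0 for age in ages]   # single cut-off

r_continuous = pearson(ages, risk)
r_decade = pearson(age_decade, risk)
r_binary = pearson(age_over_60, risk)

print(f"continuous age : r = {r_continuous:.3f}")
print(f"10-year bins   : r = {r_decade:.3f}")
print(f"cut-off at 60  : r = {r_binary:.3f}")
```

Under these assumptions the association weakens as the categories get coarser, with the single cut-off losing the most. If the true response(age) were strongly non-linear, the comparison could look different, which is the exception mentioned above.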

ADD REPLY • link 4.8 years ago by German.M.Demidov ★ 2.9k

2

This was my "feeling".

I also thought that if something mysterious leads statisticians/bioinformaticians to categorize data, one could use "sliding windows":

0-9; 10-19; ...

1-10; 11-20; ...

...

9-18; 19-28; ...

and then recheck the results.
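The shifted windows above can be sketched in a few lines of Python. The helper `age_bin` and its `offset` and `width` parameters are hypothetical names introduced here; the sketch only shows how the bin assigned to ages 51, 60, and 61 changes with the starting offset, and leaves the "recheck the results" step to whatever survival model is being used.

```python
def age_bin(age, offset=0, width=10):
    """Return the (low, high) bounds of the bin containing an integer age.

    Bins start at `offset` and are `width` years wide, so offset=1 gives
    1-10, 11-20, and so on.
    """
    low = offset + ((age - offset) // width) * width
    return low, low + width - 1


# Show which bin each example age falls into for every possible offset.
for offset in range(10):
    bins = {age: age_bin(age, offset) for age in (51, 60, 61)}
    print(offset, bins)
```

With offset 0, ages 60 and 61 share a bin and 51 is separate; with offset 1, ages 51 and 60 share a bin and 61 is separate. If conclusions change noticeably across offsets, that is a sign the result depends on where the cut-offs happen to fall.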

ADD REPLY • link 4.8 years ago by english.server ▴ 300

3

If you want to know what statisticians think, you may read this thread https://stats.stackexchange.com/questions/16565/what-is-the-effect-of-dichotomising-variables . However, medical people think differently :)

ADD REPLY • link 4.8 years ago by German.M.Demidov ★ 2.9k

0

Useful link. Thanks a lot.

ADD REPLY • link 4.8 years ago by english.server ▴ 300

3

Hey, don't forget me A: Why quantitative design are preferred GWAS approach

;)

ADD REPLY • link 4.8 years ago by Kevin Blighe 88k

1

Thank you. Valuable information.

ADD REPLY • link 4.8 years ago by english.server ▴ 300

score 1 · Answer 1 · 2020-01-30

1

4.8 years ago

Kevin Blighe 88k

It will depend on the disease under study - many are age-related / confounded by age, including cancer.

Regarding the specifics of the difference between 60 and 61, it's simply a cut-off / threshold. One could just as easily make the same argument about p<0.05 and p<0.051.

In other disease areas, 55 may be a single cut-off for age, as it is regarded as the age at which menopause commences, but this obviously differs from individual to individual.

Kevin

ADD COMMENT • link 4.8 years ago by Kevin Blighe 88k

0

Thank you. It seems I still have to study more.

ADD REPLY • link 4.8 years ago by english.server ▴ 300

2

Not sure that you need to study more, as such... in certain diseases, there are just well-established relationships between age and disease. This is therefore more a matter of medicine and epidemiology than of bioinformatics.

What German says is correct, too, i.e., that using strict cut-offs and creating categorical variables from continuous variables can result in lost power.

There are no standards for how to deal with this.

ADD REPLY • link 4.8 years ago by Kevin Blighe 88k