Feature Selection for DNA methylation data

5 weeks ago

Ahmed

Hi all, I am building a supervised model on DNA methylation data for liver cancer. A common feature-selection practice is to remove highly correlated columns. However, I am concerned that doing this with methylation data could discard important features, which may lead to incorrect or biased predictions.
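One way to address the concern about losing features is to make the correlation filter transparent. Keep a log of which cg site was dropped and which retained site it was correlated with, so that a retained site can be read as a proxy for its dropped neighbours and the dropped sites can be revisited later. Below is a minimal sketch in Python with pandas. It assumes the data are a samples x cg-sites DataFrame of beta values. The function name `drop_correlated`, the 0.9 threshold, and the toy `cg_toy*` IDs are hypothetical. The greedy "keep the first, drop later sites correlated with it" rule is one common variant of correlation filtering, not the only one.

```python
import numpy as np
import pandas as pd


def drop_correlated(betas: pd.DataFrame, threshold: float = 0.9):
    """Greedily drop cg sites whose |Pearson r| with an already-kept site exceeds `threshold`.

    `betas` is assumed to be samples x cg sites (beta values). Returns the
    filtered frame and a dict mapping each dropped site to the kept site it
    was most correlated with, so dropped sites are not silently lost.
    """
    corr = betas.corr().abs()
    kept, dropped = [], {}
    for site in corr.columns:
        if kept:
            # Compare this site only against sites that have already been kept
            r_with_kept = corr.loc[kept, site]
            if (r_with_kept > threshold).any():
                # Record which retained site "stands in" for this one
                dropped[site] = r_with_kept.idxmax()
                continue
        kept.append(site)
    return betas[kept], dropped


# Toy example: cg_toy2 is a near-copy of cg_toy1, cg_toy3 is independent
rng = np.random.default_rng(0)
base = rng.uniform(0.1, 0.9, size=50)
betas = pd.DataFrame({
    "cg_toy1": base,
    "cg_toy2": np.clip(base + rng.normal(0, 0.01, size=50), 0, 1),
    "cg_toy3": rng.uniform(0, 1, size=50),
})
filtered, dropped_log = drop_correlated(betas, threshold=0.9)
print(list(filtered.columns), dropped_log)
```

The greedy pass is order-dependent: whichever site of a correlated pair appears first is the one kept. Ordering the columns beforehand (for example by standard deviation, most variable first) is one way to control which site of each pair survives.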

Cancer Feature-selection methylation

You will need to provide a bit more information about what you are doing and what specifically you want help with.

What platform is your data from: WGBS, methylation array, etc.?

What structure is your data in, i.e. what are the rows and columns?

What are you correlating and why?

What is your model supposed to be used for?

5 weeks ago by yura.grabovska

I am training two models to identify the top cg sites associated with age and race, using Illumina 450K methylation array data (beta values). The data used for feature selection has samples as rows and cg sites (methylation values) as columns.
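Because many export formats store beta values with cg sites as rows and samples as columns, it can be worth checking the orientation and value range before any filtering. Here is a minimal sketch in Python with pandas, assuming cg IDs start with "cg". The function name `to_samples_by_sites`, the "more cg-like labels" heuristic, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def to_samples_by_sites(beta_matrix: pd.DataFrame) -> pd.DataFrame:
    """Return beta values as a samples x cg-sites frame.

    Heuristic (an assumption): whichever axis has more labels starting with
    "cg" is treated as the site axis; if that is the index, transpose.
    """
    index_cg = np.asarray(beta_matrix.index.astype(str).str.startswith("cg"), dtype=bool).mean()
    column_cg = np.asarray(beta_matrix.columns.astype(str).str.startswith("cg"), dtype=bool).mean()
    if index_cg > column_cg:
        beta_matrix = beta_matrix.T
    values = beta_matrix.to_numpy(dtype=float)
    observed = values[~np.isnan(values)]
    # Beta is (approximately) methylated / total signal, so it must lie in [0, 1]
    if ((observed < 0) | (observed > 1)).any():
        raise ValueError("values outside [0, 1]: these may be M-values rather than betas")
    return beta_matrix


# Toy sites x samples matrix, as it might come out of a preprocessing pipeline
sites_by_samples = pd.DataFrame(
    [[0.12, 0.85], [0.50, 0.47], [0.91, 0.08]],
    index=["cg_toy1", "cg_toy2", "cg_toy3"],
    columns=["sample_A", "sample_B"],
)
betas = to_samples_by_sites(sites_by_samples)
print(betas.shape)  # (2, 3): 2 samples, 3 cg sites
```

The range check is deliberately strict: values outside [0, 1] usually mean the matrix holds a transformed scale (such as M-values) rather than beta values, and the downstream thresholds would then need to be reconsidered.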

I filtered by standard deviation to reduce the 450K+ cg sites to the 5,000 most variable.
Model 1 will predict the cg sites involved in causing cancer in the middle and older age groups.
Model 2 will predict the cg sites involved in cancer in two races (Asian and White).
These will be used to identify potential biomarkers.
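The standard-deviation step described above can be sketched as follows, assuming a samples x cg-sites DataFrame of beta values. The function name `top_variable_sites` and the toy data are hypothetical; `n_sites=5000` mirrors the number in the post, and sample SD (pandas' default `ddof=1`) is an assumption.

```python
import numpy as np
import pandas as pd


def top_variable_sites(betas: pd.DataFrame, n_sites: int = 5000) -> pd.DataFrame:
    """Keep the `n_sites` cg sites (columns) with the highest standard deviation.

    `betas` is assumed to be samples x cg sites (beta values).
    """
    site_sd = betas.std(axis=0)  # per-site SD across samples (ddof=1)
    keep = site_sd.nlargest(n_sites).index  # sorted from most to least variable
    return betas[keep]


# Toy data: 10 samples x 20 sites whose spread increases from cg_toy0 to cg_toy19
rng = np.random.default_rng(1)
spreads = np.linspace(0.01, 0.2, 20)
toy = pd.DataFrame(
    np.clip(0.5 + rng.normal(0, spreads, size=(10, 20)), 0, 1),
    columns=[f"cg_toy{i}" for i in range(20)],
)
top = top_variable_sites(toy, n_sites=5)
print(top.shape)  # (10, 5)
```

This filter does not use the outcome labels (age group or race), so it does not leak label information into model evaluation. Any supervised selection step, by contrast, is usually refit inside each cross-validation training fold to avoid optimistically biased results.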

5 weeks ago by Ahmed