Question

How to binarize genomic data

5.2 years ago

mirko.hu • 0

I have a table of genomic data. The rows are patients, the columns are genes, and the entries are gene abundances. I need to binarize the data so that entries above a threshold are replaced by 1 and entries below it are replaced by 0. What is the best way to do this? How do I find the correct threshold?

I need to binarize the data to obtain the corresponding biadjacency matrix of an unweighted network. Thank you for your help.

genome binarization • 1.5k views

updated 5.0 years ago by venu 7.1k • written 5.2 years ago by mirko.hu • 0

score 0 · Answer 1 · 2020-05-10

5.2 years ago

Mensur Dlakic ★ 29k

One way to do it is with sklearn (scikit-learn). I don't know if that's the best way.
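As a minimal sketch of the sklearn route, assuming a single global cutoff is acceptable: `Binarizer` from `sklearn.preprocessing` maps values strictly greater than `threshold` to 1 and everything else to 0. The toy table and the cutoff of 3.0 are made up for illustration and are not from the question.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical abundance table: 3 patients (rows) x 4 genes (columns).
abundance = np.array([
    [0.0, 2.5, 7.1, 0.3],
    [1.2, 0.0, 4.4, 9.8],
    [3.3, 3.0, 0.1, 5.6],
])

# Placeholder cutoff (an assumption); values strictly greater than it
# become 1, values less than or equal to it become 0.
threshold = 3.0

binary = Binarizer(threshold=threshold).fit_transform(abundance).astype(int)
print(binary)
```

Note that a value exactly equal to the threshold ends up as 0, so the choice of cutoff matters at the boundary.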

score 0 · Answer 2 · 2020-07-19

How do I find the correct threshold?

This can vary from patient to patient, depending on the heterogeneity in your cohort. IMO, it's hard to find one threshold for all samples/genes. What you can do is:

Assign each gene to a quartile within each patient
Set genes in the 3rd and 4th quartiles to 1 and the rest to 0 for each patient (that is, values above that patient's median become 1)

You can do it with basic R/Python skills.
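One possible reading of the quartile steps above, sketched in Python with pandas: rank each gene within its own patient as a percentile, then mark the top two quartiles (percentile rank above 0.5) as 1. The toy table and the helper name `binarize_by_patient_quartile` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical abundance table: rows are patients, columns are genes.
abundance = pd.DataFrame(
    {
        "geneA": [1.0, 8.0, 0.5],
        "geneB": [4.0, 2.0, 0.1],
        "geneC": [9.0, 6.0, 3.0],
        "geneD": [2.0, 7.0, 5.0],
    },
    index=["patient1", "patient2", "patient3"],
)


def binarize_by_patient_quartile(table):
    """Return 1 for genes in a patient's top two quartiles, else 0."""
    # Percentile rank of each gene within its own patient (row).
    ranks = table.rank(axis=1, pct=True)
    # 3rd and 4th quartiles correspond to a percentile rank above 0.5.
    return (ranks > 0.5).astype(int)


binary = binarize_by_patient_quartile(abundance)
print(binary)
```

Because the ranking is done row by row, each patient effectively gets its own threshold. Tied abundances share an average rank, so heavily tied data (for example many zeros) may need a different tie-handling choice.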

P.S.: I'm not sure what your goals are; this answer is just to show you how to do it. How much sense it makes biologically is up to you.