How to binarize genomic data
4.6 years ago
mirko.hu • 0

I have a table of genomic data. The rows are the patients, the columns are the genes and the entries are the gene abundances. I need to binarize the data so that above a threshold the entries are substituted by 1's and below the threshold, the entries are substituted by 0's. Which is the best way to do this? How do I find the correct threshold?

I need to binarize the data to obtain the corresponding biadjacency matrix of an unweighted network. Thank you for your help.

genome binarization • 1.1k views
4.6 years ago
Mensur Dlakic ★ 28k

One way to do it is with sklearn (for example, sklearn.preprocessing.Binarizer). I don't know if that's the best way.
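A minimal sketch of the sklearn route, assuming the table is already loaded as a numeric array with patients as rows and genes as columns. The values and the cutoff of 1.0 are made up for illustration; the thread does not say which threshold is appropriate.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Hypothetical abundance table: rows are patients, columns are genes.
abundance = np.array([
    [0.0, 2.5, 7.1],
    [1.2, 0.4, 3.3],
    [5.0, 0.0, 0.9],
])

# Assumed cutoff, chosen only for illustration.
threshold = 1.0

# Binarizer maps values strictly greater than the threshold to 1
# and all other values to 0.
binary = Binarizer(threshold=threshold).fit_transform(abundance).astype(int)
print(binary)
```

The result has the same shape as the input, so it can be used directly as a patient-by-gene biadjacency matrix. Note that this applies one global threshold to every entry.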

4.4 years ago
venu 7.1k

How do I find the correct threshold?

This can vary for each patient, depending on the heterogeneity in your cohort. IMO, it's hard to find one threshold that works for all samples/genes. What you can do is:

  • Assign every gene to a quartile within each patient
  • For each patient, set genes in the 3rd and 4th quartiles to 1 and the rest to 0

You can do it with basic R/Python skills.
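The per-patient quartile idea above could be sketched with pandas roughly as follows. The table, the gene and patient names, and the helper quartile_labels are all hypothetical. Ranking before binning is an added choice so that tied values (for example many zero abundances) do not break the quartile cut.

```python
import pandas as pd

# Hypothetical abundance table: rows are patients, columns are genes.
abundance = pd.DataFrame(
    {
        "geneA": [1.0, 8.0, 3.0],
        "geneB": [4.0, 2.0, 9.0],
        "geneC": [6.0, 5.0, 1.0],
        "geneD": [9.0, 7.0, 4.0],
    },
    index=["patient1", "patient2", "patient3"],
)


def quartile_labels(row):
    """Label each gene in one patient's row with its quartile (1 to 4)."""
    # Rank first so that ties cannot produce duplicate bin edges in qcut.
    ranks = row.rank(method="first")
    return pd.qcut(ranks, 4, labels=False) + 1


# Quartiles are computed separately for each patient (row-wise).
quartiles = abundance.apply(quartile_labels, axis=1)

# Genes in the 3rd and 4th quartile become 1, the rest 0.
binary = (quartiles >= 3).astype(int)
print(binary)
```

With this rule each patient ends up with roughly the top half of their genes set to 1, so it behaves like a per-patient median cutoff rather than a single global threshold.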

P.S.: I'm not sure what your goals are; this answer just shows how to do it. How much sense it makes biologically is up to you.
