8.1 years ago
murphy.charlesj
I am looking at some TCGA data and want to compare the expression profiles of tumors with a certain fusion (4 samples) vs. those without it (~500 samples). I don't think current algorithms for differential expression (e.g. DESeq2) will work with such unbalanced classes. It seems there may be some strategies I can borrow from the machine learning community (e.g. undersampling the larger class). Do you have any recommendations, or can you point me to any papers?
Thanks!
One way could be to take the mean expression value of each gene across all ~500 samples without the fusion, giving a single mean expression value per gene. Similarly, take the mean across the 4 samples with the fusion, then divide the fusion mean by the non-fusion mean and take log2 of the ratio to get a log2 fold change for each gene.
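A minimal sketch of that calculation in Python with NumPy, in case it helps. The function name `log2_fold_change`, the `pseudocount` parameter, and the toy matrix are my own assumptions for illustration, not part of any TCGA tooling. It also assumes the expression values are already normalized and on a linear (not log) scale.

```python
import numpy as np

def log2_fold_change(expr, is_fusion, pseudocount=1.0):
    """Per-gene log2 ratio of group means (fusion / non-fusion).

    expr        : genes x samples array of normalized, linear-scale expression
    is_fusion   : boolean mask over samples, True for fusion-positive samples
    pseudocount : assumed small constant added to both means to avoid
                  division by zero for unexpressed genes
    """
    expr = np.asarray(expr, dtype=float)
    is_fusion = np.asarray(is_fusion, dtype=bool)

    # One mean per gene in each group (average over the sample axis).
    mean_fusion = expr[:, is_fusion].mean(axis=1)
    mean_other = expr[:, ~is_fusion].mean(axis=1)

    return np.log2((mean_fusion + pseudocount) / (mean_other + pseudocount))

# Toy example: 2 genes, 2 fusion samples, 4 non-fusion samples.
expr = np.array([
    [30.0, 32.0, 7.0, 7.0, 7.0, 7.0],  # gene higher in fusion samples
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],    # gene unchanged between groups
])
is_fusion = np.array([True, True, False, False, False, False])

lfc = log2_fold_change(expr, is_fusion)
print(lfc)
```

Keep in mind this only gives an effect size per gene, not a p-value, and with 4 samples the fusion mean can be driven by a single outlier.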
My advisor suggested something similar, so I'll definitely give it a try. Thanks.