4 months ago
Qiang
Hi all,
I am new to the field of gene expression. I am currently working on testing different drugs on human liver cells.
From the RNA-seq analysis I obtained three types of data: raw read counts, FPKM values, and log2 fold changes (log2 of treated FPKM / untreated FPKM). I aim to build a machine learning classifier that links genes to the various drugs.
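For reference, here is a minimal sketch of how the three data types relate, assuming the standard FPKM definition. The `pseudocount` parameter is an assumption, not something mentioned in this thread; it is commonly added so that genes with zero expression in one condition do not produce log2(0) or a division by zero.

```python
import math


def fpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Fragments (reads, for single-end data) per kilobase per million mapped reads."""
    # Normalize by gene length (per 1e3 bp) and library size (per 1e6 reads): 1e9 overall.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(treated_fpkm: float, untreated_fpkm: float,
                     pseudocount: float = 1.0) -> float:
    """log2(treated / untreated), with a pseudocount added to both values."""
    # Assumed pseudocount guards against log2(0) and division by zero.
    return math.log2((treated_fpkm + pseudocount) / (untreated_fpkm + pseudocount))


# Hypothetical gene: 2,000 bp long, 500 reads in a library of 20 million mapped reads.
print(fpkm(500, 2_000, 20_000_000))               # 12.5
print(log2_fold_change(8.0, 2.0, pseudocount=0))  # 2.0
```

Raw counts still carry library-size and gene-length effects, FPKM removes them, and log2FC additionally expresses each gene relative to the untreated control.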
My question is: which of these data types (raw counts, FPKM, or log2 fold change) would be most suitable for building a machine learning model?
Thank you.
I wouldn't use raw counts, since they haven't been normalized for sequencing depth or gene length. Either FPKM or log2 fold change could work. Why not build a separate model for each and see which one does the best job of predicting the classes?
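A minimal sketch of that comparison, assuming each sample is one treated culture labelled with the drug it received and described by a vector of per-gene values (either FPKM or log2FC). A nearest-centroid classifier scored by leave-one-out cross-validation stands in for whatever model you actually use (for example a random forest or SVM), and the toy data are made up for illustration.

```python
import math
from collections import defaultdict

# (feature vector over genes, drug label); an assumed sample layout.
Sample = tuple[list[float], str]


def nearest_centroid_predict(train: list[Sample], x: list[float]) -> str:
    """Predict the label whose class-mean vector is closest (Euclidean) to x."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = defaultdict(int)
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    best_label, best_dist = "", math.inf
    for label, total in sums.items():
        centroid = [t / counts[label] for t in total]
        dist = math.dist(centroid, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(samples: list[Sample]) -> float:
    """Leave-one-out accuracy: hold out each sample, train on the rest, predict it."""
    correct = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid_predict(train, x) == y:
            correct += 1
    return correct / len(samples)


# Hypothetical toy data: two genes, two replicates per drug.
fpkm_samples = [([10.0, 1.0], "drugA"), ([12.0, 1.5], "drugA"),
                ([1.0, 9.0], "drugB"), ([1.5, 11.0], "drugB")]
log2fc_samples = [([2.0, -0.5], "drugA"), ([1.8, -0.2], "drugA"),
                  ([-1.0, 2.2], "drugB"), ([-0.7, 1.9], "drugB")]

for name, data in [("FPKM", fpkm_samples), ("log2FC", log2fc_samples)]:
    print(name, loocv_accuracy(data))
```

Scoring both feature sets with the same model and the same cross-validation scheme keeps the comparison about the data type rather than the model. With only a few replicates per drug, leave-one-out estimates will be noisy, so differences between the two should be interpreted cautiously.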
Thank you, Jeremy! Yes, we will compare the results of ML models trained on FPKM and on log2FC.