CPM and FPKM for machine learning
3.5 years ago
bioinfo ▴ 60

Hi, I have a training RNA-seq dataset consisting of log2CPM values, which I am using to develop a prediction model. My independent test dataset, however, only has FPKM values, since raw counts are not available for it. Since the training and test datasets are not in the same format, do I need to convert log2CPM to FPKM? Can I test my prediction model on a test dataset in a different format? Please suggest how to convert log2CPM to FPKM.

RNA-seq • 1.0k views

Already answered here


Not really answered completely there. The CPM-to-FPKM calculation is easy to look up, and if you can develop a prediction model with machine learning, I'm sure you can figure out how to convert between two sets of numbers with different units. The larger question is whether conversion is necessary. The difference between CPM and FPKM is gene length: FPKM is CPM divided by the gene length in kilobases, and gene length differs between almost all genes you measure. So you've trained on a data set that doesn't take gene length into account, whereas in your test data set every gene has been scaled by its own gene-specific factor (on the log2 scale, shifted by its own constant offset). It would be like testing your model after first dividing each input feature by a different arbitrary number (the gene lengths). Your test and training data should have the same units. Since FPKM incorporates a length normalization not found in CPM, it wouldn't make any sense to test a CPM-based model with FPKM values as they are.
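For what it's worth, here is a minimal sketch of the conversion in plain Python. It rests on several assumptions that are not stated in the question: the log2CPM values were computed as plain `log2(CPM + pseudocount)` (tools such as edgeR use a library-size-scaled prior count, so back-transforming is then only approximate), the CPM values were computed from the same kind of counts as the FPKM data (fragments, not individual reads of a pair), and you have a gene length in base pairs for every gene. The function name `log2cpm_to_fpkm` and the gene names are made up for illustration.

```python
def log2cpm_to_fpkm(log2_cpm, gene_length_bp, pseudocount=0.0):
    """Convert log2CPM values for one sample to FPKM.

    log2_cpm       -- dict mapping gene ID to its log2CPM value
    gene_length_bp -- dict mapping gene ID to its length in base pairs
    pseudocount    -- value that was added to CPM before taking log2
                      (assumption: a plain additive pseudocount)
    """
    fpkm = {}
    for gene, value in log2_cpm.items():
        length = gene_length_bp[gene]
        if length <= 0:
            raise ValueError(f"gene length must be positive: {gene}")
        # Undo the log2 transform and the pseudocount; clamp at zero
        # so rounding cannot produce a negative expression value.
        cpm = max(2.0 ** value - pseudocount, 0.0)
        # FPKM = CPM per kilobase of gene length.
        fpkm[gene] = cpm * 1000.0 / length
    return fpkm


# Hypothetical example values, not real data.
sample = {"geneA": 3.0, "geneB": 5.0}
lengths = {"geneA": 2000, "geneB": 500}
print(log2cpm_to_fpkm(sample, lengths))
# → {'geneA': 4.0, 'geneB': 64.0}
```

Whichever direction you convert in, apply the same transform (including any log and pseudocount) to both the training and the test data before fitting or predicting, so the model sees features on one scale.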
