Dear all,
I am trying to identify lncRNAs in dairy cows using CPAT. To determine the coding probability cutoff, I followed the "How to choose cutoff" instructions to generate the training dataset.
Here is what I did:
Step 1:
make_hexamer_tab.py -c /storage/users/xdai/ref/cow/Bos_taurus.ARS-UCD1.2.cds.all.fa -n /storage/users/xdai/ref/cow/Bos_taurus.ARS-UCD1.2.ncrna.fa > Bos_taurus_Hexamer.tsv
Step 2:
make_logitModel.py -x Bos_taurus_Hexamer.tsv -c /storage/users/xdai/ref/cow/Bos_taurus.ARS-UCD1.2.cdna.all.fa.gz -n /storage/users/xdai/ref/cow/Bos_taurus.ARS-UCD1.2.ncrna.fa -o Bos_taurus
I downloaded the CDS file from Ensembl. The protein-coding transcripts (cdna) and the non-coding transcripts (ncrna) were also downloaded from Ensembl. From the previous step, I obtained the required training dataset with the following header (output of names(data)): [1] "ID" "mRNA" "ORF" "Fickett" "Hexamer" "Label" (the same as shown on the website).
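For anyone who wants to confirm the same thing on their own table, here is a minimal sketch of a header and label check. It assumes a tab-separated file with the six columns listed above; the three-row table and the function name summarize_training_table are made up for illustration and are not part of CPAT.

```python
import csv
import io
from collections import Counter

# Column names as reported by names(data) in the post.
EXPECTED = ["ID", "mRNA", "ORF", "Fickett", "Hexamer", "Label"]

def summarize_training_table(handle):
    """Return (header, label counts) for a tab-separated training table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    label_idx = header.index("Label")
    # Count how many rows carry each label value (kept as strings).
    counts = Counter(row[label_idx] for row in reader if row)
    return header, counts

# Hypothetical toy table standing in for the real training file.
toy = io.StringIO(
    "ID\tmRNA\tORF\tFickett\tHexamer\tLabel\n"
    "tx1\t1500\t900\t1.10\t0.35\t1\n"
    "tx2\t800\t120\t0.62\t-0.21\t0\n"
    "tx3\t2100\t1500\t1.25\t0.48\t1\n"
)
header, counts = summarize_training_table(toy)
print("header matches:", header == EXPECTED)
print("label counts:", dict(counts))
```

To run it on a real table, pass open(path, newline="") instead of the toy StringIO, where path is wherever the training table was written. Both "0" and "1" should appear in the counts.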
Then I used "10Fold_CrossValidation.r", which I downloaded from the CPAT website, to generate Figure 3 and decide on the coding potential cutoff. At the step "pred <- prediction(ROCR_data$predictions, ROCR_data$Labels)", it gave the following error:
Error in prediction(ROCR_data$predictions, ROCR_data$Labels): Number of classes is not equal to 2. ROCR currently supports only evaluation of binary classification tasks.
I opened the generated "test1.xls" and found that the labels are all equal to "1". The originally loaded data (the training dataset) has both "0" and "1". I did not change the code of "10Fold_CrossValidation.r". I have no idea what is going on. Could anyone please advise what is wrong with my steps and suggest how to fix this problem?
Many thanks.
Hi,
Have you found a solution for this problem: Error in stats::approxfun(x.values.1, y.values.1, method = "constant", : zero non-NA points
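One thing that may be worth ruling out, as an assumption rather than something confirmed from 10Fold_CrossValidation.r: the ROCR error in the original question says the test labels contain only one class, and that is what happens if the training table lists all coding rows first and all non-coding rows after them, and the folds are then cut as contiguous blocks of rows. The sketch below uses made-up labels (60 coding, 40 non-coding) and a hypothetical helper fold_label_counts to show the effect, and how shuffling the rows first changes the fold composition.

```python
import random
from collections import Counter

def fold_label_counts(labels, k=10):
    """Cut labels into k contiguous folds and count each class per fold."""
    n = len(labels)
    return [Counter(labels[i * n // k:(i + 1) * n // k]) for i in range(k)]

# Hypothetical ordering: all coding rows (1) first, then all non-coding rows (0).
labels = [1] * 60 + [0] * 40

contiguous = fold_label_counts(labels)
# Folds that contain only one class cannot be evaluated as a binary task.
one_class_folds = [i for i, c in enumerate(contiguous, start=1) if len(c) < 2]
print("contiguous folds with a single class:", one_class_folds)

# Shuffle a copy of the rows with a fixed seed, then cut the folds again.
shuffled = labels[:]
random.Random(0).shuffle(shuffled)
for i, c in enumerate(fold_label_counts(shuffled), start=1):
    print("shuffled fold", i, dict(c))
```

If the real table turns out to be ordered this way, shuffling its rows (keeping the header line in place) before running the cross-validation script would be a cheap thing to try. Whether this is the actual cause here still needs to be checked against the script itself.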