I want to train an SVM to distinguish piRNA clusters from clusters of miRNA, tRNA, or mRNA.
I have found a few thousand clusters which are almost certainly piRNA clusters. I want to represent those clusters as feature vectors, so that an SVM can find other likely piRNA cluster candidates. However, the only features I can think of including in such a vector are the length of the cluster (number of base pairs) and the number of reads in the cluster.
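For concreteness, here is a minimal sketch of the two-feature representation described above. The `Cluster` record, the `raw_features` and `standardize` helpers, and the example coordinates are all hypothetical and only serve as illustration. The standardization step is an assumption on my part: SVMs are sensitive to feature scale, and lengths and read counts can differ by orders of magnitude.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Cluster:
    """Hypothetical record for one read cluster (not from any real tool)."""
    start: int    # genomic start, 0-based and inclusive
    end: int      # genomic end, exclusive
    n_reads: int  # number of reads mapped inside the cluster


def raw_features(cluster):
    """Return [length in base pairs, number of reads] for one cluster."""
    return [float(cluster.end - cluster.start), float(cluster.n_reads)]


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up example clusters, for illustration only.
clusters = [
    Cluster(start=1_000, end=21_000, n_reads=5_000),
    Cluster(start=50_000, end=50_100, n_reads=40),
    Cluster(start=90_000, end=100_000, n_reads=1_200),
]

# One row per cluster, one column per feature.
X = standardize([raw_features(c) for c in clusters])
```

The resulting `X` (together with a label per cluster) is the kind of matrix an SVM implementation would take as training input. Any further property would just become an extra column in `raw_features`.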
Are there other properties I should consider adding?
Will have to look those up. Thanks!