Representing Short Read Clusters As Feature Vectors: Which Properties To Include?
12.0 years ago

I want to train an SVM to distinguish piRNA clusters from clusters of miRNA, tRNA, and mRNA.

I have found a few thousand clusters which are almost certainly piRNA clusters. I want to represent those clusters as feature vectors, so that an SVM can find other likely candidates for piRNA clusters. However, the only features I can think of including in such a vector are the length of the cluster (number of base pairs) and the number of reads in the cluster.

Are there other properties I should consider adding?

short read • 2.2k views
12.0 years ago
Pavel Senin ★ 1.9k

Maybe dinucleotide and trinucleotide frequencies (k-mer frequencies), and their ordering?
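To make the k-mer idea concrete, here is a minimal Python sketch of how a cluster could be turned into a fixed-length feature vector. It assumes each cluster is available as a single sequence string over A/C/G/T plus a read count; the function names `kmer_frequencies` and `cluster_features` are made up for illustration, and this is only one possible encoding, not an established method for piRNA clusters.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Relative k-mer frequencies in a fixed lexicographic order.

    k-mers containing characters outside `alphabet` (e.g. N) are ignored.
    """
    seq = seq.upper()
    # Fixed ordering so that every cluster yields a vector of the same layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def cluster_features(seq, n_reads):
    """Hypothetical feature vector: length, read count, 2-mer and 3-mer frequencies."""
    return ([len(seq), n_reads]
            + kmer_frequencies(seq, k=2)
            + kmer_frequencies(seq, k=3))


# Example: one vector of 2 + 16 + 64 = 82 features per cluster.
features = cluster_features("ACGTACGTTTGA", n_reads=10)
```

Since length and read count are on a very different scale from the frequencies, the features would probably need scaling before being passed to an SVM.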

Will have to look those up. Thanks!

12.0 years ago
JC 13k

Some time ago I wrote some sequence complexity and composition functions in Perl; see: https://github.com/caballero/SeqComplex
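For readers who want a feel for what composition and complexity features look like, here is a small Python sketch of two common ones: GC content and Shannon entropy over k-mers. This is not taken from SeqComplex and does not reproduce its functions; the names `gc_content` and `shannon_entropy` are illustrative, and the input is assumed to be a plain A/C/G/T string.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of G and C among the A/C/G/T bases of the sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def shannon_entropy(seq, k=1):
    """Shannon entropy (in bits) of the k-mer distribution of the sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Low entropy indicates a repetitive, low-complexity sequence, while values near the maximum (2 bits for k=1) indicate a roughly uniform base composition.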
