Resources For Extracting Information From A Sequence
14.7 years ago
Will

I'm running a competition on Kaggle.com on HIV-1 progression. Check it out if you're interested; there's a 500 USD prize for the winner! A number of machine-learning researchers with no biology background have been looking for a resource that can extract information from a nucleotide (NT) sequence, or a batch of sequences, that they can use as "feature sets" for their machine-learning algorithms.

So far I've suggested k-mers, multiple alignments, and known resistance mutations, and I've even provided code for counting all k-mers in a sequence. Does anyone have other suggestions, especially tools that can do the feature extraction?
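For readers new to k-mer features, here is a minimal sketch of turning a nucleotide sequence into a fixed-length count vector. This is not the code provided for the competition. It assumes an A/C/G/T alphabet and simply skips windows that contain ambiguity codes such as N; the function names `kmer_counts` and `kmer_feature_matrix` are hypothetical.

```python
from itertools import product
from collections import Counter

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count every possible k-mer over ACGT in seq, reporting zero for absent ones.

    Assumption: windows containing non-ACGT characters (N, IUPAC codes) are skipped.
    """
    seq = seq.upper()
    observed = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(ALPHABET)
    )
    # Enumerate all 4**k k-mers in a stable order so every sequence
    # yields a feature vector of the same length.
    return {"".join(p): observed.get("".join(p), 0) for p in product(ALPHABET, repeat=k)}


def kmer_feature_matrix(seqs, k):
    """Hypothetical helper: one row of k-mer counts per sequence in a batch."""
    return [list(kmer_counts(s, k).values()) for s in seqs]


if __name__ == "__main__":
    counts = kmer_counts("ACGTAC", 2)
    print(counts["AC"], len(counts))  # 2 occurrences of "AC", 16 possible 2-mers
```

Because the k-mer order is fixed, rows from `kmer_feature_matrix` line up column by column and can be fed straight into most machine-learning libraries; note that the vector length grows as 4^k.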

Thanks a bunch,
Will


Interesting competition.


Link to the competition: http://kaggle.com/hivprogression

14.7 years ago

You can do a lot of feature extraction with the EMBOSS tools, from GC content to finding palindromic sequences. Many of these tools could be used to build feature sets.
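As a rough illustration of the kinds of features mentioned above, the sketch below computes GC content and counts exact reverse-complement palindromes (sites that read the same on both strands, such as GAATTC). This is a simplified stand-in, not the EMBOSS algorithms: EMBOSS's `palindrome` program searches for inverted repeats more generally. The function names and the default palindrome length of 6 are hypothetical choices, and ambiguous bases are ignored by assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence (other characters pass through)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G+C among unambiguous A/C/G/T bases (assumption: N etc. ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def count_rc_palindromes(seq, length=6):
    """Count windows of the given (even) length that equal their own reverse complement."""
    seq = seq.upper()
    count = 0
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if set(window) <= set("ACGT") and window == reverse_complement(window):
            count += 1
    return count


if __name__ == "__main__":
    s = "GGAATTCC"
    print(gc_content(s), count_rc_palindromes(s))
```

Each function returns a single number per sequence, so their outputs can be appended as extra columns next to k-mer counts or other feature sets.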
