Question

Resources For Extracting Information From A Sequence

6

14.6 years ago

Will 4.6k

I'm running a competition on Kaggle.com about HIV-1 progression; check it out if you're interested. There's a 500 USD prize in it for the winner! A number of machine-learning researchers with no biology background have been looking for a resource that can extract information about a nucleotide (NT) sequence (or a batch of sequences) for use as feature sets in their machine-learning algorithms.

So far I've suggested k-mers, multiple alignments, and known resistance mutations. I've even provided code for counting all k-mers in a sequence. Does anyone have other suggestions, especially tools that can do the feature extraction?
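For readers unfamiliar with k-mer features, here is a minimal sketch of the idea in Python. It is not the code provided for the competition; the function names `kmer_counts` and `kmer_feature_vector` are hypothetical, and the sketch assumes a plain `ACGT` alphabet with no ambiguity codes.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_feature_vector(seq, k, alphabet="ACGT"):
    """Return counts for all possible k-mers in a fixed (lexicographic) order.

    A fixed order means every sequence maps to a vector of the same length
    (len(alphabet) ** k), which is what most learning algorithms expect.
    k-mers containing characters outside `alphabet` are ignored.
    """
    counts = kmer_counts(seq, k)
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts.get(kmer, 0) for kmer in all_kmers]


if __name__ == "__main__":
    example = "ACGTAC"  # toy sequence, not competition data
    print(kmer_counts(example, 2))
    print(kmer_feature_vector(example, 1))
```

The vector length grows as 4^k, so small values of k are usually the practical choice when each sequence becomes one row of a feature matrix.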

Thanks a bunch,
Will

sequence-prediction • 2.7k views

ADD COMMENT • link updated 17 months ago by Ram 44k • written 14.6 years ago by Will 4.6k

1

Interesting competition.

ADD REPLY • link 14.6 years ago by Istvan Albert 101k

0

link to the competition: http://kaggle.com/hivprogression

ADD REPLY • link 14.6 years ago by Giovanni M Dall'Olio 28k

score 5 · Answer 1 · 2010-04-29

5

14.6 years ago

Simon Cockell 7.4k

You can do a lot of feature extraction with EMBOSS tools, from computing GC content to finding palindromic sequences. Plenty of these tools could be used to build feature sets.
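To make those two features concrete for readers without a biology background, here is a rough pure-Python sketch of what they measure. It does not call EMBOSS and is not equivalent to its programs (EMBOSS `palindrome`, for instance, searches for inverted repeats with more options); the function names and the `min_len`/`max_len` parameters are hypothetical, and the sketch assumes unambiguous `ACGT` input.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=4, max_len=12):
    """Return (start, subsequence) for windows equal to their reverse complement.

    A DNA "palindrome" reads the same on both strands, e.g. GAATTC.
    Such windows always have even length, so min_len is assumed to be even
    and lengths are stepped by 2.
    """
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits


if __name__ == "__main__":
    example = "TTGAATTCAA"  # toy sequence containing the site GAATTC
    print(gc_content(example))
    print(find_palindromes(example, min_len=6, max_len=6))
```

Values such as the GC fraction or the number of palindromic windows per sequence can then be appended as extra columns in a feature set.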

ADD COMMENT • link 14.6 years ago by Simon Cockell 7.4k