Is there any composition program available that can classify a given protein sequence to a COG category without the need for alignment?
Because sequence data are so readily available and alignment methods are so robust, alignment is the preferred mechanism for COG category assignment. I know of no work that assigns a protein to a COG without aligning sequence data.
That said, it is possible to do without sequence alignments; it is just not routine.
I too am looking for such a program, though not for COG assignment. Has anyone tried to cluster protein sequences using compositional features?
Re: Leena. In short, yes. See, e.g., the paper http://www.pnas.org/content/105/40/15352.short
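To make "compositional features" concrete, here is a minimal sketch of one common alignment-free representation: a dipeptide (2-mer) frequency profile per sequence, compared with cosine distance. It is only an illustration under my own assumptions, not the method of the linked paper; the function names and the choice of cosine distance are hypothetical.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_SET = set(DIPEPTIDES)


def dipeptide_profile(seq):
    """Return a 400-element vector of dipeptide frequencies for seq.

    Pairs containing non-standard characters (X, B, gaps, ...) are skipped.
    """
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in DIPEPTIDE_SET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[d] / total for d in DIPEPTIDES]


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


if __name__ == "__main__":
    # Made-up peptides, used only to show the calls.
    a = dipeptide_profile("MKTAYIAKQRQISFVKSHFSRQ")
    b = dipeptide_profile("MKTAYIAKQRQISFVKAHFSRQ")
    print(cosine_distance(a, b))
```

A matrix of such pairwise distances could then be passed to whatever clustering routine you prefer. Whether the resulting clusters line up with COGs is a separate question.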
What's a COG category? Can you elaborate on the problem?
It's unlikely such a program could exist. Something as simple as amino acid composition would not be able to accurately discriminate millions of proteins into a few thousand categories.
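A small sketch of why plain amino acid composition carries so little information: it discards residue order, so any rearrangement of a sequence produces exactly the same 20-number vector. The peptide below is made up and `aa_composition` is a hypothetical helper, not part of any existing tool.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


if __name__ == "__main__":
    original = "MKTAYIAKQR"                # made-up peptide
    scrambled = "".join(sorted(original))  # same residues, different order
    # Both sequences map to the identical composition vector.
    print(aa_composition(original) == aa_composition(scrambled))
```

Two unrelated proteins can therefore land on nearly the same point in composition space, which is the core of the objection above.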
It's an old question, so you may no longer be looking, but why would you need an alignment-free algorithm?