Generate kmer profiles from a bunch of peptide sequences
10.3 years ago
Owen S. ▴ 370

Can anyone recommend a software solution to do this:

  • Input: about 100,000 short peptide sequences -- unaligned -- of varying lengths, but mostly under 20 residues.
  • Output: amino-acid profiles (e.g. sequence logos) describing groups of similar, over-represented kmers (say, 3-, 4-, or 5-mers).

I can think of ways to tackle this myself*, but why reinvent the wheel? I hope that my question and any discussion that follows will also help others.

Thanks!

PS. My approach would be something like this:

  1. count all unique kmers
  2. calculate pairwise distances
  3. select clusters (clades) of similar kmers
  4. use these kmers (and their counts) to build sequence logo maps
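
For anyone who wants to try the four steps above, here is a minimal sketch in plain Python. It is not an existing tool, only one way the outline could be read. The function names, the toy peptides, the Hamming-distance threshold (`max_dist`) and the greedy seed-based clustering are all assumptions for illustration; the greedy step stands in for a full pairwise-distance clustering, and Hamming distance ignores amino-acid similarity (a substitution matrix may suit real data better).

```python
from collections import Counter, defaultdict

def count_kmers(peptides, k):
    """Step 1: count every overlapping k-mer across all peptides."""
    counts = Counter()
    for seq in peptides:
        # Peptides shorter than k produce an empty range and are skipped.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def hamming(a, b):
    """Step 2: number of mismatched positions between two equal-length k-mers."""
    return sum(x != y for x, y in zip(a, b))

def greedy_clusters(counts, max_dist=1):
    """Step 3 (simplified): seed each cluster with the most frequent
    unassigned k-mer, then attach every unassigned k-mer within
    max_dist mismatches of that seed."""
    unassigned = [kmer for kmer, _ in counts.most_common()]
    clusters = []
    while unassigned:
        seed = unassigned[0]
        members = [m for m in unassigned if hamming(seed, m) <= max_dist]
        clusters.append(members)
        member_set = set(members)
        unassigned = [m for m in unassigned if m not in member_set]
    return clusters

def position_frequencies(members, counts):
    """Step 4: count-weighted residue frequencies at each position,
    i.e. the matrix a sequence logo would be drawn from."""
    k = len(members[0])
    total = sum(counts[m] for m in members)
    profile = [defaultdict(float) for _ in range(k)]
    for m in members:
        for pos, residue in enumerate(m):
            profile[pos][residue] += counts[m] / total
    return [dict(col) for col in profile]

# Toy input (hypothetical peptides, for illustration only).
peptides = ["ACDEF", "ACDEY", "GACDE", "WWACD", "ACEEF"]
counts = count_kmers(peptides, k=3)
clusters = greedy_clusters(counts, max_dist=1)
top = clusters[0]                      # cluster seeded by the most frequent 3-mer
profile = position_frequencies(top, counts)
```

Each cluster's `profile` is a list with one dict per position, mapping residue to frequency, which can be passed to whatever logo-drawing tool you prefer. The sketch does not test whether a kmer is over-represented relative to a background model; that would need a separate step.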
sequence hmm epitope • 3.0k views
10.3 years ago

The Biostrings Bioconductor package provides fast kmer counting through its oligonucleotideFrequency function. You can then take the resulting counts and run whatever statistics, clustering, and visualization you need.


Thanks, but my question relates to peptide, not nucleotide, sequences. (The Biostrings function you suggested only works with nucleotide seqs.)
