Question

How to find the occurrences of a given set of k-mers in a list of DNA sequences

11.4 years ago

wjlgatech • 0

I want to scan a list of DNA sequences against a list of given k-mers. Each element of the k-mer list is a set of similar k-mers of equal length; they look like:

myKmer1=c("TATGGGTTT", "TAAGGGTTT", ...,"CAAGGGTTT")
...
myKmer10=c("GGATTCCAG","CCATTCTTT",..., "CGATTCCTT")

What software or R scripts are available to obtain the occurrences of each k-mer set in each sequence? The outcome should be a table that looks like:

k-mer occurrence table 1: counts of each k-mer set in each sequence

          myKmer1   myKmer2   ...   myKmer10
seq1      2         0               3
seq2      1         3               0
...
seq1000   0         1               0

k-mer occurrence table 2: locations of each k-mer set in each sequence

          myKmer1    myKmer2         ...   myKmer10
seq1      111, 888   0                     123, 456, 3333
seq2      123        111, 223, 333         0
...
seq1000   0          1234                  0

sequence • 4.9k views

updated 11.4 years ago by Josh Herr 5.8k • written 11.4 years ago by wjlgatech • 0

score 2 · Answer 1 · 2013-08-25

I'd do that with DSK. It counts the k-mers in your reads (converted to FASTA) and writes the counts to a binary file. The DSK archive includes a Python script called parse_results.py, which prints the count for each k-mer. I think it shouldn't be too hard to modify that script to report counts per read as well.
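For an input the size of the one in the question (about 1000 sequences and 10 k-mer sets), a direct scan in plain Python may already be enough to build both tables. The following is only a minimal sketch and not part of DSK or parse_results.py: the function names `find_positions` and `occurrence_tables` and the toy data are made up for illustration. It assumes the sequences and k-mer sets are already loaded into dictionaries, that matching is exact and case-sensitive, and that only the forward strand is searched.

```python
def find_positions(sequence, kmer):
    """Return 1-based start positions of every match, overlapping ones included."""
    positions = []
    start = sequence.find(kmer)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        # Resume one base further on so overlapping matches are not skipped.
        start = sequence.find(kmer, start + 1)
    return positions


def occurrence_tables(sequences, kmer_sets):
    """Build the count table and the location table for every sequence.

    sequences: dict mapping sequence name -> DNA string
    kmer_sets: dict mapping set name -> list of equal-length k-mers
    Returns (counts, locations), both indexed as table[seq_name][set_name].
    """
    counts = {}
    locations = {}
    for seq_name, sequence in sequences.items():
        counts[seq_name] = {}
        locations[seq_name] = {}
        for set_name, kmers in kmer_sets.items():
            hits = set()
            for kmer in kmers:
                hits.update(find_positions(sequence, kmer))
            locations[seq_name][set_name] = sorted(hits)
            counts[seq_name][set_name] = len(hits)
    return counts, locations


# Hypothetical toy data, shaped like the vectors in the question.
sequences = {
    "seq1": "TATGGGTTTACAAGGGTTT",
    "seq2": "GGATTCCAG",
}
kmer_sets = {
    "myKmer1": ["TATGGGTTT", "TAAGGGTTT", "CAAGGGTTT"],
    "myKmer10": ["GGATTCCAG", "CCATTCTTT", "CGATTCCTT"],
}

counts, locations = occurrence_tables(sequences, kmer_sets)

# Print one tab-separated row per sequence: counts first, then locations.
for seq_name in sequences:
    count_row = "\t".join(str(counts[seq_name][s]) for s in kmer_sets)
    location_row = "\t".join(
        ",".join(map(str, locations[seq_name][s])) or "0" for s in kmer_sets
    )
    print(seq_name, count_row, location_row, sep="\t")
```

Empty location cells are printed as 0 to match the layout of table 2 in the question. For much larger inputs, a dedicated k-mer counter is probably the better route.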

score 1 · Answer 2 · 2013-08-26

11.4 years ago

Josh Herr 5.8k

To second Philipp, DSK is a good option. I would also try khmer and kmer-genie. I guess the choice of tool depends on the source of your sequences and on the next step of your analysis.
