I'd like to identify all k-mers in one set of transcriptomic data that are not present in the other set. I am dealing with large amounts of data, but it seems to me that sequence assembly routinely performs this kind of task, and that it could be accomplished fairly easily with suffix trees. The k I am thinking of using is 32.
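The set difference itself doesn't strictly need a suffix tree. Building a hash set of k-mers for each data set and subtracting one from the other is enough. Below is a minimal Python sketch. It assumes the reads are already available as plain sequence strings, and the function names (`kmers_2bit`, `kmer_set`, `decode`, `unique_to_first`) are hypothetical. Each k-mer is packed at 2 bits per base, which is why k = 32 is convenient: a 32-mer fits exactly in 64 bits. k-mers are taken as-is from each read, with no merging of reverse complements. Python ints and sets carry far more overhead than 64 bits per entry, so for genuinely large data a dedicated k-mer counter (Jellyfish is one widely used example) is likely the more practical route.

```python
from typing import Iterable, Iterator, Set

# 2-bit code per nucleotide; with k = 32 a k-mer fills exactly 64 bits.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmers_2bit(seq: str, k: int = 32) -> Iterator[int]:
    """Yield each k-mer of seq packed into a 2k-bit integer.

    Windows containing a non-ACGT character (e.g. N) are skipped.
    """
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases
    value = 0
    run = 0  # length of the current run of valid bases
    for base in seq.upper():
        code = _CODE.get(base)
        if code is None:  # ambiguous base: restart the rolling window
            value, run = 0, 0
            continue
        value = ((value << 2) | code) & mask
        run += 1
        if run >= k:
            yield value


def kmer_set(reads: Iterable[str], k: int = 32) -> Set[int]:
    """Collect the distinct packed k-mers over all reads."""
    result: Set[int] = set()
    for read in reads:
        result.update(kmers_2bit(read, k))
    return result


def decode(value: int, k: int = 32) -> str:
    """Turn a packed k-mer back into its ACGT string."""
    out = []
    for _ in range(k):
        out.append("ACGT"[value & 3])
        value >>= 2
    return "".join(reversed(out))


def unique_to_first(reads_a: Iterable[str], reads_b: Iterable[str], k: int = 32) -> Set[str]:
    """Return the k-mers present in reads_a but absent from reads_b."""
    only_a = kmer_set(reads_a, k) - kmer_set(reads_b, k)
    return {decode(v, k) for v in only_a}
```

As a small check with k = 3, `unique_to_first(["ACGTAC"], ["CGTA"], k=3)` returns `{"ACG", "TAC"}`, because `CGT` and `GTA` occur in both inputs.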
Are there any programs that can accomplish this for me? I'd rather not reinvent the wheel. I'd even settle for a program that just gives me a list of the k-mers present in a single data set.
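For the simpler case of listing (and counting) the k-mers in a single data set, a sketch that reads FASTA records might look like the following. The helper names are hypothetical. Windows containing anything other than A/C/G/T (e.g. N) are skipped. By default, each k-mer is collapsed with its reverse complement by keeping the lexicographically smaller of the two. That default assumes an unstranded library; pass `use_canonical=False` to keep the strands separate.

```python
from collections import Counter
from typing import Iterable, Iterator

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield one uppercase sequence per FASTA record (multi-line records are joined)."""
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):  # header: flush the previous record
            if chunks:
                yield "".join(chunks)
            chunks = []
        else:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(seqs: Iterable[str], k: int = 32, use_canonical: bool = True) -> Counter:
    """Count every all-ACGT k-mer across all sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if not _VALID.issuperset(kmer):  # skip windows with N or other symbols
                continue
            counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts
```

Typical use would be something like `count_kmers(read_fasta(open("sample.fa")), k=32)`, where `sample.fa` is a placeholder file name. The keys of the returned `Counter` are the distinct k-mers. As with the set-based approach, this keeps every k-mer as a Python string in memory, so it is only practical for modest inputs.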
Cheers!
This worked perfectly, and pretty fast as well. Thanks!