Can you elaborate on "didn't work"? Jellyfish was the first thing that sprang to mind when I read your title, so I would say it's probably worth persisting with, since it's one of, if not the, best tools for k-mer work.
I am not entirely sure Jellyfish can readily be used for this kind of comparative analysis. You may have to generate k-mer profiles for each sample/genome and then carry out the comparisons separately.
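As a toy illustration of "profile each genome, then compare separately", here is a minimal pure-Python sketch. The function names are hypothetical, it counts forward-strand k-mers only (no canonical k-mers, no handling of N), and Jaccard similarity on k-mer sets is just one possible comparison. For real genomes you would use Jellyfish for the counting step.

```python
from collections import Counter


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in one sequence (forward strand only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity of the two k-mer *sets*; counts are ignored."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


if __name__ == "__main__":
    p1 = kmer_profile("ACGTACGT", 3)
    p2 = kmer_profile("ACGTTTTT", 3)
    print(p1)
    print(jaccard(p1, p2))
```

Here the two sequences share 2 of 6 distinct 3-mers, so the similarity is 1/3.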
You're right that this tutorial is out of date. The --matrix option is no longer a valid option to jellyfish count. However, I don't think its original intent was to do what you want anyway: it doesn't write out a binary presence/absence matrix. Rather, it specified the binary matrix used to generate the universal hash function for hashing the k-mers. Jellyfish relies on a universal hash function, which can be generated from a random binary matrix; if you want to use the exact same hash function for other purposes, you need to know what that matrix is.
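To make the "binary matrix defines the hash function" point concrete, here is an illustrative Python sketch, not Jellyfish's actual code. It packs a k-mer into 2k bits (A=0, C=1, G=2, T=3, an assumed encoding) and multiplies that bit vector by a random binary matrix over GF(2). Jellyfish uses an invertible matrix so the mapping is a bijection; this sketch does not check invertibility. The point is that the same matrix (here, the same seed) always reproduces the same hash function, which is why you would need the matrix to reuse the hash elsewhere.

```python
import random

# Assumed 2-bit encoding per base, for illustration only.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into a 2k-bit integer, 2 bits per base."""
    x = 0
    for base in kmer.upper():
        x = (x << 2) | BASE_BITS[base]
    return x


def random_binary_matrix(n_bits: int, seed: int) -> list[int]:
    """Random n_bits x n_bits matrix over GF(2); row i is stored as an int bitmask."""
    rng = random.Random(seed)
    return [rng.getrandbits(n_bits) for _ in range(n_bits)]


def matrix_hash(matrix: list[int], x: int) -> int:
    """Compute y = M * x over GF(2): bit i of y is the parity of (row_i AND x)."""
    y = 0
    for i, row in enumerate(matrix):
        y |= (bin(row & x).count("1") & 1) << i
    return y


if __name__ == "__main__":
    k = 4
    M = random_binary_matrix(2 * k, seed=42)  # same seed -> same hash function
    print(matrix_hash(M, encode_kmer("ACGT")))
```

Because the hash is linear over GF(2), hashing the XOR of two encoded k-mers gives the XOR of their hashes, a property that a matrix-based hash family has by construction.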
Anyway, to achieve what you want, I'm afraid you'll need to take a different approach. Essentially, you want to count k-mers in a collection of different FASTA files/genomes and then determine which k-mers are present in each. With jellyfish, you could do this by running jellyfish count separately on each input genome, using the dump command to get each k-mer list in plain text, and then merging across the files to build the matrix. Alternatively, you could use a tool like mantis (disclosure: I'm a senior author of this method) or metagraph, which are designed explicitly to answer k-mer presence/absence queries over a large collection of k-mers coming from different sources (among other things).
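The merge step of the jellyfish route could be sketched in Python roughly as follows. This assumes each genome was counted separately with the same k (ideally with `-C` so canonical k-mers are comparable across genomes) and dumped in column format with `jellyfish dump -c`, giving one "KMER COUNT" line per k-mer. The script and file names are hypothetical. It holds every k-mer in memory, so it only suits modest collections; large collections are what mantis or metagraph are built for.

```python
import csv
import sys
from typing import Iterable


def read_dump(lines: Iterable[str]) -> set[str]:
    """Collect the k-mers from `jellyfish dump -c` output ("KMER COUNT" per line)."""
    kmers = set()
    for line in lines:
        fields = line.split()  # works for space- or tab-separated columns
        if fields:
            kmers.add(fields[0])
    return kmers


def presence_absence(samples: dict[str, set[str]]):
    """Return sample names, sorted k-mers, and a 0/1 row per k-mer."""
    names = list(samples)
    all_kmers = sorted(set().union(*samples.values()))
    matrix = [[int(kmer in samples[name]) for name in names] for kmer in all_kmers]
    return names, all_kmers, matrix


def write_tsv(names, kmers, matrix, out=sys.stdout) -> None:
    """Write the matrix as TSV: one row per k-mer, one column per sample."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["kmer", *names])
    for kmer, row in zip(kmers, matrix):
        writer.writerow([kmer, *row])


if __name__ == "__main__":
    # Hypothetical usage: python pa_matrix.py genome1.txt genome2.txt > matrix.tsv
    samples = {}
    for path in sys.argv[1:]:
        with open(path) as fh:
            samples[path] = read_dump(fh)
    write_tsv(*presence_absence(samples))
```

Sorting the union of k-mers keeps the row order reproducible between runs.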
The kmer-counter repo contains a script demonstrating Python integration for quick filtering/querying. You could easily write out a presence/absence matrix from its output.
For k-mers of length 32 and longer, a tool like Jellyfish would be appropriate.
Hi Joe, thanks for your answer. Sorry I didn't explain my problem in my first message.
I installed jellyfish 2.3.0 and ran the command:
jellyfish count -m 256 -o jellyoutput -c 1 -s 100000000 -t 32 --matrix file.fasta
This was the error:
count: unrecognized option '--matrix'
Use --usage or --help for some help
The tutorial probably corresponds to an older version of the program. Do you know the correct command to generate a matrix like the one I need?
Thanks for the answers!