Tool: Jellyfish - Fast, Parallel K-mer Counting for DNA

13.2 years ago

Istvan Albert 102k

JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers an order of magnitude faster, and with an order of magnitude less memory, than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
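To make the definition concrete, here is a minimal Python sketch of what "counting k-mers" means. It is a naive, single-threaded illustration only and not how JELLYFISH is implemented (there is no compact hash table or compare-and-swap here); the function name `count_kmers` is made up for this example.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping length-k substring of `sequence`."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers;
    # if the sequence is shorter than k the range is empty.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = count_kmers("ACGTACGT", 4)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A plain dictionary like this grows quickly for large k and long inputs, which is the memory problem a dedicated counter is designed to address.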

http://www.cbcb.umd.edu/software/jellyfish/

JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command.
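Once the counts have been dumped to text, they are easy to load into a script. The sketch below assumes a two-column layout with one "k-mer count" pair per line, separated by whitespace; that layout is an assumption on my part, so check the dump options of your JELLYFISH version before relying on it. The helper name `read_kmer_table` is hypothetical.

```python
import io
from typing import Dict, TextIO


def read_kmer_table(handle: TextIO) -> Dict[str, int]:
    """Parse lines of the assumed form '<kmer> <count>' into a dict."""
    table: Dict[str, int] = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        kmer, count = line.split()
        table[kmer] = int(count)
    return table


# Made-up example data standing in for a dumped text file.
example = io.StringIO("AAAA 12\nAAAC 7\n")
table = read_kmer_table(example)
print(table)
```

For a real file, pass an open file handle instead of the `io.StringIO` stand-in.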

JELLYFISH runs on 64-bit Intel-compatible processors running Linux or FreeBSD (including Intel Macs). It requires GNU GCC to compile.

jellyfish • 7.4k views

updated 2.1 years ago by Ram 45k • written 13.2 years ago by Istvan Albert 102k

Hi Istvan,

I'm trying to use Jellyfish to calculate 4-mer usage in my assembled contigs, to get a table of tetranucleotide frequencies. I'm using this command line:

jellyfish count -o aftercut.cotig -m 4 -t 2 -s 100000 --both-strands Contigs.fasta

but all I get is a list of warnings reading 'Warn: Bad character in sequence'. I've double-checked my FASTA file and it seems fine. Do you know how to fix this? Or maybe Jellyfish is not suited to calculating tetranucleotide frequencies. Looking forward to your reply.

Kylie
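For a k as small as 4, the table can also be computed directly, which gives a way to cross-check the result and to see which characters in the FASTA file are not plain A, C, G or T. The Python sketch below is an illustration under stated assumptions, not a description of what Jellyfish does: it merges each 4-mer with its reverse complement (my reading of what counting both strands means), skips any window containing a non-ACGT character, and reports those characters. I do not know which characters triggered the warning above; all function names here are made up for the example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an upper-case ACGT string."""
    return kmer.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()  # also drops a trailing '\r' from Windows files
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unexpected_characters(seq):
    """Count characters that are not A, C, G or T (case-insensitive)."""
    return Counter(c for c in seq.upper() if c not in "ACGT")


def canonical_kmer_frequencies(seq, k=4):
    """Relative frequency of each canonical k-mer, skipping non-ACGT windows."""
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            # Use the lexicographically smaller of the k-mer and its
            # reverse complement so both strands share one entry.
            counts[min(kmer, reverse_complement(kmer))] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Made-up record standing in for one contig of a FASTA file.
fasta_lines = [">contig1", "ACGTN", "ACGT"]
for name, seq in read_fasta(fasta_lines):
    print(name, dict(unexpected_characters(seq)))
    print(canonical_kmer_frequencies(seq, k=4))
```

To run it on a real file, replace `fasta_lines` with an open file handle, for example `open("Contigs.fasta")`.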

13.0 years ago by Shuixia100 ▴ 120

Please ask this as a new question, not as a comment on this post. You can find the link named New Post at the top right of the page.

13.0 years ago by Istvan Albert 102k