Hi is there any way to count the frequency of kmers from fasta files for different values of k? My objective is to produce a plot of the kmer distribution. Thank you so much.
Hi is there any way to count the frequency of kmers from fasta files for different values of k? My objective is to produce a plot of the kmer distribution. Thank you so much.
This one is fast and has lots of other functions in addition to k-mer counting:
From BBMap suite.
$ kmercountexact.sh
Written by Brian Bushnell
Last modified November 29, 2018
Description: Counts the number of unique kmers in a file.
Generates a kmer frequency histogram and genome size estimate (in peaks output),
and prints a file containing all kmers and their counts.
Supports K=1 to infinity, though not all values are allowed.
SEE ALSO: bbnorm.sh/khist.sh loglog.sh, and kmercountmulti.sh.
Usage: kmercountexact.sh in=<file> khist=<file> peaks=<file>
Input may be fasta or fastq, compressed or uncompressed.
Output may be stdout or a file. out, khist, and peaks are optional.
or
$ kmercountmulti.sh
Written by Brian Bushnell
Last modified December 30, 2016
Description: Estimates cardinality of unique kmers in sequence data.
Processes multiple kmer lengths simultaneously to produce a histogram.
Usage: kmercountmulti.sh in=<file> sweep=<20,100,8> out=<histogram output>
@Genomax I know this is not the proper post to ask this so I can delete it if needed but I am having this error when I tried to run kmercountermulti.sh on my linux server
/bbmap//calcmem.sh: line 75: [: -v: unary operator expected
java -ea -Xmx500m -cp /home/bbmap/current/ jgi.KmerCountMulti in=/home/eColi_LR_SL.fasta sweep=16,32,64 out=hist1_eColi_LR
Executing jgi.KmerCountMulti [in=/home/eColi_LR_SL.fasta, sweep=16,32,64, out=hist1_eColi_LR]
Exception in thread "main" java.lang.ArrayStoreException: cardinality.LogLog2
at cardinality.MultiLogLog.<init>(MultiLogLog.java:30)
at cardinality.MultiLogLog.<init>(MultiLogLog.java:11)
at jgi.KmerCountMulti.<init>(KmerCountMulti.java:149)
at jgi.KmerCountMulti.main(KmerCountMulti.java:49)
My java version is :
openjdk version "1.8.0_272"
OpenJDK Runtime Environment (build 1.8.0_272-b10)
OpenJDK 64-Bit Server VM (build 25.272-b10, mixed mode)
Just tried with E. coli genome.
$ kmercountmulti.sh in=GCF_000005845.2_ASM584v2_genomic.fna sweep=20,100,8
Input is being processed as unpaired
#K Count
20 4645838
28 4483466
36 4645838
44 4577093
52 4541564
60 4365048
66 4597298
75 4669509
84 4475877
90 4626988
100 4387290
Time: 1.603 seconds.
Reads Processed: 1 0.00k reads/sec
Bases Processed: 4641k 2.90m bases/sec
No I do not have any extra comma there:
./kmercountmulti.sh in=/home/eColi_LR_SL.fasta sweep=16,64,16 out=hist1_eColi_LR
This is the command I am using and getting the error for. This is a fasta file for long reads of e Coli that I have simulated. I am wondering if the error if because of the input then.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Finding Kmers From Fasta File
Counting k-mers in a multifasta file
specific k-mer count in multi fasta file
It is three times the same tools being suggested though :)
@ATpoint Thank you for these. I had seen these. For some reason I am failing to install Jellyfish on my linux server. it is saying command not found whenever I am giving jellyfish command. kmer-counter is working but it is pretty slow. For ecoli short reads it took more than 6 hours and the server where I run the program can run for 6 hours at a stretch so the job got killed before producing any result. That's why I was looking for some other tool or script maybe awk or anythig.