2.4 years ago
khq5801
Hi
I would like to remove duplicate reads from my miRNA next-generation sequencing data and obtain read counts. I used cd-hit-dup to remove the duplicates, but I keep getting this error: cd-hit-dup: cdhit-dup.cxx:193: int HashingDepth(int, int): Assertion 'len >= min' failed
I would really appreciate any suggestions on how to resolve this.
What kind of input are you using? FASTA? Are you simply interested in removing duplicates? Since your reads are only 16 to 40 bp long, that is most likely what makes CD-HIT fail with that assertion.
You can simply try dedupe.sh from the BBMap suite if you want to dedupe the data, or use clumpify.sh to get counts and do other things: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates.
The file is FASTA and I have already trimmed the sequences to 16-40 bp.
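For a small FASTA file, collapsing exact duplicates and counting them can also be sketched in plain Python, which avoids any minimum-length restriction. This is only an illustration, not what dedupe.sh or clumpify.sh do internally. The function names (read_fasta, collapse_reads, write_collapsed) and the `seqN_xCOUNT` header format are assumptions made up for this example, and reads are compared case-insensitively as exact matches only.

```python
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_reads(handle):
    """Return a Counter mapping each distinct sequence to its copy number."""
    return Counter(seq.upper() for _, seq in read_fasta(handle))


def write_collapsed(counts, out):
    """Write one record per distinct sequence, most abundant first.

    The header format seqN_xCOUNT is a hypothetical naming choice.
    """
    for i, (seq, n) in enumerate(counts.most_common(), start=1):
        out.write(f">seq{i}_x{n}\n{seq}\n")
```

To use it, open the trimmed FASTA file for reading, pass the handle to collapse_reads, and pass the resulting counts plus an output handle to write_collapsed. Because every distinct sequence is held in memory, this only suits modest files; for large datasets the BBMap tools mentioned above are the better fit.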