I have created clusters using CD-HIT for miRNA NGS data. The miRNAs are 16-40 nt long, and I would like to find the total number of reads and the number of distinct reads corresponding to each miRNA. Any suggestion or command that could help would be appreciated. Thanks.
No, I was not able to use CD-HIT-DUP successfully. However, I followed the suggestion in "cd hit for removing sequence redundancy" to generate non-redundant data. I now have two files: one contains the sequences and the other contains the clusters (.clstr). I would like to break down the total number of sequences in the clusters by miRNA length. For instance, length-16 miRNA has 5486 total reads and 4586 distinct reads. I would like to generate these counts for every length up to 40.
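One possible way to get the per-length counts from the .clstr file is sketched below in Python. It rests on several assumptions that are not confirmed in this thread: that the clustering was run at 100% identity, so each cluster stands for one distinct sequence; that member lines follow the usual CD-HIT layout (`0	22nt, >read_1... *`, with `*` marking the representative); and that reads are binned by the length of the cluster representative. The function name `length_summary` and its parameters are made up for illustration.

```python
import re

# Assumed CD-HIT member line layout: "<index>\t<length>nt, ><read id>... <status>"
# where <status> is "*" for the cluster representative.
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:nt|aa),\s+>.+?\.\.\.\s+(.*)$")


def length_summary(clstr_lines, min_len=16, max_len=40):
    """Return {length: (total_reads, distinct_reads)} from .clstr lines.

    Assumption: one cluster = one distinct sequence, and every read in a
    cluster is counted under the length of that cluster's representative.
    """
    clusters = []  # each entry: [representative_length, member_count]
    for line in clstr_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([None, 0])  # start a new cluster
            continue
        match = MEMBER_RE.match(line)
        if match is None or not clusters:
            continue  # skip lines that do not look like member lines
        clusters[-1][1] += 1
        if match.group(2).strip() == "*":
            clusters[-1][0] = int(match.group(1))

    summary = {}
    for rep_len, size in clusters:
        if rep_len is None or not (min_len <= rep_len <= max_len):
            continue
        total, distinct = summary.get(rep_len, (0, 0))
        summary[rep_len] = (total + size, distinct + 1)
    return summary
```

To use it on a real file, something like `with open("reads.clstr") as fh: summary = length_summary(fh)` should work, where `reads.clstr` is a placeholder name; looping over `sorted(summary)` then gives one line per length from 16 to 40. If the clustering used a lower identity threshold, the "distinct" column would be the number of clusters rather than the number of truly unique sequences.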
The less information you provide initially, the less useful the suggestions you get will be.