How to find the total number of reads using CD-HIT
2.4 years ago
khq5801 ▴ 10

I have clustered miRNA NGS reads with CD-HIT. The miRNAs range from 16 to 40 nt in length, and I would like to find the total number of reads and the number of distinct reads for each miRNA. Any suggestion or command that could help would be appreciated. Thanks.

Tags: perl NGS miRNA CD-HIT
2.4 years ago
Mensur Dlakic ★ 28k

Presumably this is related to your earlier inquiry about cd-hit-dup. If so, at the start the program prints out the total number of sequences, and at the end the total number of clusters. For example, in your previous screenshot you had 200000 sequences and 199988 clusters, meaning you had 12 duplicates.

As for the exact clusters, they are written to a file ending in .clstr. Assuming this was your command:

cd-hit-dup -i sequences.fas -o sequences_nodup.fas

The clusters will be in sequences_nodup.fas.clstr. Plain cd-hit creates the same kind of cluster file (the output name plus .clstr).
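As a rough sketch of how the .clstr file can be read, the Python below counts sequences and clusters and derives the number of redundant sequences (sequences minus clusters, as in the 200000 vs 199988 example). It assumes the usual CD-HIT member-line layout (index, length with an `nt` or `aa` suffix, `>ID...`, and `*` marking the representative); `parse_clstr` and `summarize` are hypothetical helper names, not part of CD-HIT.

```python
import re

# Assumed member-line layout in a CD-HIT .clstr file, e.g.
#   0	22nt, >read_1... *
#   1	22nt, >read_2... at +/100.00%
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:nt|aa),\s+>(.+?)\.\.\.")


def parse_clstr(lines):
    """Group member lines into clusters of (length, seq_id, is_representative)."""
    clusters = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">Cluster"):
            clusters.append([])  # a ">Cluster N" header starts a new cluster
            continue
        match = MEMBER_RE.match(line)
        if match and clusters:
            length, seq_id = int(match.group(1)), match.group(2)
            # The representative sequence of each cluster ends with "*"
            clusters[-1].append((length, seq_id, line.endswith("*")))
    return clusters


def summarize(clusters):
    """Return (total sequences, number of clusters, redundant sequences)."""
    total = sum(len(cluster) for cluster in clusters)
    return total, len(clusters), total - len(clusters)


if __name__ == "__main__":
    sample = [
        ">Cluster 0",
        "0\t22nt, >read_1... *",
        "1\t22nt, >read_2... at +/100.00%",
        ">Cluster 1",
        "0\t21nt, >read_3... *",
    ]
    print(summarize(parse_clstr(sample)))  # (3, 2, 1)
```

On a real run, an open file handle can be passed instead of the sample list, e.g. `parse_clstr(open("sequences_nodup.fas.clstr"))`.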


No, I was not able to run CD-HIT-DUP successfully. Instead, I followed the suggestions in the thread "cd hit for removing sequence redundancy" to generate non-redundant data. I now have two files: the sequence file and the cluster file (.clstr). Next, I would like to break down the sequences in the clusters by miRNA length. For instance, 16-nt miRNAs have 5486 reads in total and 4586 distinct reads. I would like to generate these numbers for every length up to 40.
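For a per-length table like this, one option that avoids interpreting the clusters at all is to count directly from the FASTA of all reads before redundancy removal: total reads per length is the number of records of that length, and distinct reads is the number of unique sequences of that length. This is a minimal Python sketch under that assumption; `read_fasta` and `length_stats` are hypothetical helpers, and the 16 to 40 nt bounds come from the question. Counting from the raw reads also sidesteps the fact that CD-HIT may place a shorter read in a cluster whose representative is longer, so a cluster's representative length does not always match its members' lengths.

```python
from collections import Counter, defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_stats(records, min_len=16, max_len=40):
    """Return {length: (total_reads, distinct_reads)} for lengths in [min_len, max_len]."""
    total = Counter()
    distinct = defaultdict(set)
    for _, seq in records:
        seq = seq.upper()  # treat upper/lower case as the same sequence
        n = len(seq)
        if min_len <= n <= max_len:
            total[n] += 1
            distinct[n].add(seq)
    return {n: (total[n], len(distinct[n])) for n in sorted(total)}
```

Calling `length_stats(read_fasta(fh))` on an open file handle returns one `(total, distinct)` pair per length, which can then be written out as a tab-separated table with one row per length from 16 to 40.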


The less information you provide up front, the less useful the suggestions you will get.

