Can BBTools Dedupe be used for transcriptome assembly clustering?
5.5 years ago

Dear Team, I am curious whether Dedupe (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/dedupe-guide/) can be used to cluster transcriptome assemblies. I need to cluster my assembly at n% identity. When I ran Dedupe as follows: dedupe.sh in=assembly.fa out=Clustered_Assembly.fa -Xmx100g minidentity=n threads=40, we got the results lightning fast compared with cd-hit-est. Please comment.

RNA-Seq clustering Assembly BBTools • 1.1k views
5.5 years ago
GenoMax 148k

While there is no specific reason why it can't be used (after all, it only looks at the sequence), you should compare the results before and after deduplication to make sure they look reasonable.
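One rough way to start that before-and-after comparison is to look at basic summary statistics for the two FASTA files. The Python sketch below is only an illustration: the helper names (read_lengths, n50, summarize) are hypothetical, the file names are the ones from the question, and matching statistics say nothing about whether the right transcripts were merged, so the surviving sequences still need a closer look.

```python
import sys


def read_lengths(lines):
    """Return the length of every record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def summarize(path):
    """Collect simple statistics for one FASTA file."""
    with open(path) as handle:
        lengths = read_lengths(handle)
    return {
        "sequences": len(lengths),
        "bases": sum(lengths),
        "n50": n50(lengths),
        "longest": max(lengths, default=0),
    }


# Assumed usage: python compare_fasta.py assembly.fa Clustered_Assembly.fa
if __name__ == "__main__" and len(sys.argv) == 3:
    before, after = summarize(sys.argv[1]), summarize(sys.argv[2])
    for key in before:
        print(f"{key}\t{before[key]}\t{after[key]}")
```

A large drop in sequence count with little change in the longest sequence is what clustering would normally produce. An unexpectedly small or very large drop is a hint to revisit the minidentity setting.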

You may also want to look at the concept of super transcripts (see: A: merge trinscripts id from results trinity ) if the aim is to collapse isoforms into a single representation.


This certainly is helping me :)
