Trinity assembly: way too many transcripts!
5.2 years ago
karsa692 • 0

I am hoping someone can help me troubleshoot an assembly that I would like to use as a reference for DE analysis. As the title states, I am using Trinity and getting far too many transcripts. Before going any further: I have read the article "There are too many transcripts, what do I do?", but I think my problem is beyond the scope of that post.

I am trying to build a de novo transcriptome assembly from pooled samples of sea urchin larvae (four samples, one per treatment level). I am getting around 800,000 transcripts, where I would normally expect 70,000 to 100,000 for similar data sets. I know that at least part of the issue is that I have many duplicate genes/isoforms because of the nature of the samples (many pooled individuals); however, this still seems extreme. I have tried collapsing my assemblies using Grouper and CD-HIT, but the assemblies are so large that collapsing still leaves very large assemblies. Does anyone have any suggestions?
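To see how much of a count like this is isoform inflation rather than extra "genes", it can help to tally both from the FASTA headers. The sketch below assumes the usual Trinity naming scheme (`TRINITY_DN<n>_c<n>_g<n>_i<n>`), where removing the trailing `_i<n>` gives the gene identifier; the function name and the example records are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Trinity-style IDs such as TRINITY_DN1000_c115_g5_i1, where the
# trailing _i<N> is the isoform number and the rest identifies the "gene".
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def summarize_trinity_headers(fasta_lines):
    """Return (n_transcripts, n_genes, max_isoforms_per_gene) from FASTA lines."""
    isoforms_per_gene = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            transcript_id = line[1:].split()[0]
            gene_id = ISOFORM_SUFFIX.sub("", transcript_id)
            isoforms_per_gene[gene_id] += 1
    n_transcripts = sum(isoforms_per_gene.values())
    n_genes = len(isoforms_per_gene)
    max_isoforms = max(isoforms_per_gene.values(), default=0)
    return n_transcripts, n_genes, max_isoforms


# Hypothetical records, only to show the call.
example = [
    ">TRINITY_DN1000_c115_g5_i1 len=247",
    "ACGT",
    ">TRINITY_DN1000_c115_g5_i2 len=300",
    "ACGTAC",
    ">TRINITY_DN2_c0_g1_i1 len=500",
    "ACGTACGT",
]
print(summarize_trinity_headers(example))  # (3, 2, 2)
```

On a real assembly you would pass the open file handle instead of a list. A gene count far below the transcript count points to isoform inflation; a gene count that is itself in the hundreds of thousands suggests fragmentation or contamination instead.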

Thanks.

RNA-Seq Assembly • 1.7k views

It would help if you told us how much data went into this analysis, the size of the genome, and how you pre-processed the data.

Out of curiosity, is there a specific reason to do a de novo assembly? The sea urchin genome has been available for over a decade and has a defined transcriptome (for S. purpuratus, available from Echinobase, NCBI or Ensembl). You could use this known transcriptome to weed out spurious transcripts from your assembly.
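One way to do that weeding, sketched here under assumptions: align the assembled transcripts against the reference transcriptome with BLAST, write the standard 12-column tabular output (`-outfmt 6`), and keep only the transcripts with a reasonable hit. The function name, the example rows, and both thresholds are placeholders rather than recommended values; for a species other than S. purpuratus the identity cutoff would likely need to be lower.

```python
def transcripts_with_reference_hit(blast_lines, min_identity=90.0, min_align_len=200):
    """Return the query IDs that have at least one hit passing both thresholds.

    Expects BLAST tabular rows (outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    supported = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        identity = float(fields[2])   # percent identity
        align_len = int(fields[3])    # alignment length
        if identity >= min_identity and align_len >= min_align_len:
            supported.add(query_id)
    return supported


# Hypothetical rows, only to show the call.
example_hits = [
    "TRINITY_DN1_c0_g1_i1\tref_tx_A\t98.5\t850\t12\t0\t1\t850\t10\t859\t0.0\t1500",
    "TRINITY_DN2_c0_g1_i1\tref_tx_B\t75.0\t120\t30\t2\t1\t120\t5\t124\t1e-10\t80",
]
print(sorted(transcripts_with_reference_hit(example_hits)))  # ['TRINITY_DN1_c0_g1_i1']
```

Transcripts with no supporting hit are not necessarily spurious (they may be novel or lineage-specific), so it is worth inspecting them before discarding them.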

5.2 years ago

Hi,

I had the same problem. My solution was to use the DRAP tool, which runs Trinity and then filters the result. The results are very good, and I get a number of transcripts close to the expected number.
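This thread does not say which filters DRAP applies, so the following is only an illustration of what a post-assembly filter can look like, not DRAP's implementation: drop transcripts whose estimated expression falls below a cutoff. The function name, the TPM values, and the cutoff of 1.0 are assumptions for the example.

```python
def filter_by_expression(tpm_by_transcript, min_tpm=1.0):
    """Keep only transcripts whose expression estimate is at least min_tpm."""
    return {
        transcript_id: tpm
        for transcript_id, tpm in tpm_by_transcript.items()
        if tpm >= min_tpm
    }


# Hypothetical expression estimates (transcript ID -> TPM).
example_tpm = {
    "TRINITY_DN1_c0_g1_i1": 25.3,
    "TRINITY_DN1_c0_g1_i2": 0.2,
    "TRINITY_DN2_c0_g1_i1": 1.0,
}
kept = filter_by_expression(example_tpm)
print(sorted(kept))  # ['TRINITY_DN1_c0_g1_i1', 'TRINITY_DN2_c0_g1_i1']
```

With many pooled individuals, a large share of the extra transcripts tends to be supported by very few reads, which is why a filter of this kind can shrink the assembly so much.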

First, you need to assemble the transcriptome of each of your conditions with the runDrap command. DRAP then provides a second command, runMeta, to merge the per-condition transcriptomes into a single complete transcriptome.

If you're interested, here is the link to the tool: http://www.sigenae.org/drap/

Best, Amandine


Thank you for your suggestion, I will give this a try.
