
Interpreting Trinity components



11.1 years ago

Damian Kao 16k

I am trying to interpret the component graphs from my Trinity run. I rendered a couple of graphs of the components (using c*.graph.out files) from a Trinity assembly and noticed that some components had a structure where the root node (-1 node) is in the middle of a "linear" sequence of nodes.

I uploaded my ipython notebook here with 3 of the graphs I rendered: http://nbviewer.ipython.org/github/damiankao/trinity-visualization/blob/master/trinity_vis.ipynb

The first component (c445) looks normal to me: a root node (in red) connects to one linear sequence, which eventually splits into two branches that merge again, possibly indicating isoforms.
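To make the "split into two branches, then merge" shape concrete, here is a minimal Python sketch that enumerates every path from the -1 root to a terminal node in a small, hand-made graph. The node IDs and edges are invented for illustration and are not read from a real c*.graph.out file. It assumes the graph is acyclic. Trinity itself also weighs read support when choosing paths, so this only shows why one bubble yields two candidate isoform paths.

```python
from typing import Dict, List

# Hypothetical toy component: each node ID maps to its successor node IDs.
# Node -1 is the virtual root, as in the question; other IDs are invented.
GRAPH: Dict[int, List[int]] = {
    -1: [1],
    1: [2],
    2: [3, 4],  # branch point
    3: [5],
    4: [5],     # both branches merge back at node 5
    5: [],
}


def all_paths(graph: Dict[int, List[int]], start: int = -1) -> List[List[int]]:
    """Return every path from `start` to a node with no successors (DFS).

    Assumes the graph has no cycles; a cyclic graph would recurse forever.
    """
    successors = graph.get(start, [])
    if not successors:
        return [[start]]
    paths: List[List[int]] = []
    for nxt in successors:
        for tail in all_paths(graph, nxt):
            paths.append([start] + tail)
    return paths


if __name__ == "__main__":
    for path in all_paths(GRAPH):
        # Drop the virtual root when printing each candidate path.
        print(" -> ".join(str(n) for n in path[1:]))
```

Run on the toy graph, this prints two paths, `1 -> 2 -> 3 -> 5` and `1 -> 2 -> 4 -> 5`, one per branch of the bubble.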

But the second and third component graphs showed the root node in the middle of a "linear" region. Furthermore, for the second component graph, the probable paths were the two "arms" of the root node.

There are no shared (k-1)-mers between the nodes on either side of the root node in the second and third components. How does the bundling of contigs work in this case? Is it putting these contigs together based on paired-end reads? And what exactly is the -1 root node?
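In a de Bruijn graph of order k, two nodes are adjacent only if the last (k-1) bases of one equal the first (k-1) bases of the other. The check described above can be sketched in Python as follows. The sequences and the value of k are made up for illustration, and the sketch compares forward strands only (it ignores reverse complements, which matter for unstranded data).

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_k1mers(a: str, b: str, k: int) -> Set[str]:
    """(k-1)-mers that occur in both sequences (forward strand only)."""
    return kmers(a, k - 1) & kmers(b, k - 1)


def is_debruijn_adjacent(left: str, right: str, k: int) -> bool:
    """True if `left` ends with the same (k-1)-mer that `right` starts with,
    i.e. an edge left -> right could exist in an order-k de Bruijn graph."""
    if len(left) < k - 1 or len(right) < k - 1:
        return False
    return left[-(k - 1):] == right[:k - 1]


if __name__ == "__main__":
    k = 5  # illustrative only; not Trinity's k-mer size
    a, b, c = "ACGTACGGA", "CGGATTCA", "TTTTGGGG"
    print(is_debruijn_adjacent(a, b, k), shared_k1mers(a, b, k))
    print(is_debruijn_adjacent(a, c, k), shared_k1mers(a, c, k))
```

Here `a` and `b` overlap on `CGGA` and would be joined, while `a` and `c` share no 4-mer, which is the situation observed on either side of the root node.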



Ram · Accepted Answer · 2014-06-25

I just got a response from Brian Haas via the Trinity mailing list:

The -1 is the root node for the de Bruijn graph. One way you can end up with multiple 'arms' in the graph like that is if there are multiple Inchworm contigs that are clustered together based on paired-read links (from the bowtie alignment step). This way, they end up having the same 'component' number in the accession string (i.e., the (c\d+) part of the c\d+_g\d+_i\d+ accession naming format). This often happens when transcripts for a given gene are fragmented.
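The clustering described in this reply can be pictured with a union-find sketch: contigs joined by enough read pairs fall into the same component, even if they share no (k-1)-mer. This is not Trinity's own implementation. The contig names, the `min_pairs` threshold and the helper names are hypothetical. The accession regex follows the c\d+_g\d+_i\d+ format quoted above and is searched rather than anchored at the start, in case real names carry a prefix.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest over string keys, with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_contigs(contigs: List[str],
                    pair_links: Iterable[Tuple[str, str]],
                    min_pairs: int = 2) -> Dict[str, List[str]]:
    """Group contigs linked by at least `min_pairs` read pairs (hypothetical rule).

    Each entry in `pair_links` stands for one read pair whose mates align to
    the two named contigs. Components are numbered c0, c1, ... in order of
    their first member's position in `contigs`.
    """
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for a, b in pair_links:
        if a != b:
            counts[tuple(sorted((a, b)))] += 1
    uf = UnionFind()
    for c in contigs:
        uf.find(c)
    for (a, b), n in counts.items():
        if n >= min_pairs:
            uf.union(a, b)
    groups: Dict[str, List[str]] = defaultdict(list)
    for c in contigs:
        groups[uf.find(c)].append(c)
    return {f"c{i}": members for i, members in enumerate(groups.values())}


ACCESSION = re.compile(r"(c\d+)_g\d+_i\d+$")


def component_of(accession: str) -> str:
    """Return the (c\\d+) component field of a c\\d+_g\\d+_i\\d+ accession."""
    m = ACCESSION.search(accession)
    if m is None:
        raise ValueError(f"unrecognized accession: {accession!r}")
    return m.group(1)
```

For example, with two read pairs linking `ctgA` and `ctgB` but only one linking `ctgA` and `ctgC`, `cluster_contigs(["ctgA", "ctgB", "ctgC"], links)` puts `ctgA` and `ctgB` in `c0` and leaves `ctgC` alone in `c1`; `component_of("c445_g1_i2")` returns `"c445"`.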