How to group transcripts by "gene" from different transcript assemblies?
3.3 years ago
O.rka ▴ 740

What is the recommended way to group transcripts by "gene"? For example, rnaSPAdes includes a predicted gene identifier in each transcript's identifier.
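As a rough illustration of grouping on identifiers alone, here is a minimal Python sketch. It assumes rnaSPAdes headers look like `NODE_1_length_2500_cov_12.5_g0_i0` and Trinity headers like `TRINITY_DN100_c0_g1_i2`; those patterns and the helper names `gene_id` and `group_by_gene` are assumptions for illustration, not something stated in this thread.

```python
import re
from collections import defaultdict

# Assumed header styles (check them against your own FASTA files):
#   rnaSPAdes: NODE_1_length_2500_cov_12.5_g0_i0  -> gene key "g0"
#   Trinity:   TRINITY_DN100_c0_g1_i2             -> gene key "TRINITY_DN100_c0_g1"
SPADES_GENE = re.compile(r"_(g\d+)_i\d+$")
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id(transcript_id):
    """Return a gene-level key for one transcript identifier."""
    if transcript_id.startswith("NODE_"):
        match = SPADES_GENE.search(transcript_id)
        if match:
            return match.group(1)
    # Trinity-style (or unknown) IDs: drop a trailing isoform suffix if present.
    return ISOFORM_SUFFIX.sub("", transcript_id)


def group_by_gene(transcript_ids):
    """Map each gene key to the transcripts that carry it."""
    groups = defaultdict(list)
    for tid in transcript_ids:
        groups[gene_id(tid)].append(tid)
    return dict(groups)
```

Note that these keys are only meaningful within one assembly: `g0` from one run has no relation to `g0` from another run, so this does not by itself link genes across assemblies.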

Is there a way to merge similar transcripts from different assemblies?

I'm sure this uses the graph files in the backend, so what I'm asking may be really out of scope. I could use something like CD-HIT, but I'm wondering if there is a better way. Maybe a way to use the de Bruijn graphs together?

assembly trinity rnaspades transcript • 1.4k views
3.3 years ago

Transcripts were clustered using the CD-HIT (Cluster Database at High Identity with Tolerance) package, which was used to remove shorter redundant transcripts when they were 100% covered by other transcripts with more than 90% identity. The non-redundant clustered transcripts were then designated as unigenes.
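To see which transcripts ended up under which representative after such a run, the `.clstr` file that CD-HIT writes can be parsed. Below is a minimal Python sketch; the line layout shown in the comment is what I assume the `.clstr` file looks like, and `parse_clstr` is a hypothetical helper name, so check both against your own output.

```python
import re

# Assumed .clstr layout:
#   >Cluster 0
#   0	2500nt, >tx_A... *
#   1	2100nt, >tx_B... at +/95.20%
# The line ending in "*" is the cluster representative.
MEMBER = re.compile(r">(\S+?)\.\.\.\s+(\*|at\b)")


def parse_clstr(lines):
    """Return {representative_id: [member_ids]} from the lines of a .clstr file."""
    clusters = {}
    members, representative = [], None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # A new cluster starts: store the previous one, if any.
            if members:
                clusters[representative or members[0]] = members
            members, representative = [], None
            continue
        match = MEMBER.search(line)
        if match is None:
            continue
        members.append(match.group(1))
        if match.group(2) == "*":
            representative = match.group(1)
    # Store the last cluster in the file.
    if members:
        clusters[representative or members[0]] = members
    return clusters
```

With the transcripts from several assemblies concatenated into one input (and sequence names prefixed by assembly), clusters whose members carry different prefixes are the candidates for "the same transcript in different assemblies". Sequence names in the `.clstr` file may be truncated depending on how CD-HIT was run, so it is worth checking that the IDs are still unique.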


Is that a quote?

3.3 years ago
ponganta ▴ 590

If you want a common "baseline" for several assemblies, the only way I could think of would be annotation with a common database. For instance, if your assemblies come from closely related species, you could annotate CDSs with a common reference (e.g. a closely related model species).
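A minimal Python sketch of that idea, assuming each assembly has already been searched against the same reference and the hits saved in BLAST tabular format (`-outfmt 6`, with the query in column 1, the subject in column 2 and the bit score in column 12). The function names and the choice of "best hit by bit score" are assumptions made for illustration, not an established method from this thread.

```python
from collections import defaultdict


def best_hits(tabular_lines):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bit score per query."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def group_by_reference(hits_per_assembly):
    """Group transcripts from several assemblies by their best reference hit.

    hits_per_assembly: {assembly_name: iterable of tabular hit lines}
    Returns {subject_id: [(assembly_name, query_id), ...]}.
    """
    groups = defaultdict(list)
    for assembly, lines in hits_per_assembly.items():
        for query, (subject, _score) in best_hits(lines).items():
            groups[subject].append((assembly, query))
    return dict(groups)
```

Each reference ID then acts as the shared "gene" key across assemblies. Transcripts without any hit are simply absent from the result, which is the main limitation of relying on a reference.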

For individual assemblies, if you would like to go from transcript-level to gene-level (which has advantages), you could also cluster transcripts based on shared read support, using Corset or Grouper.

Another way forward would be to combine both techniques. For each assembly, you could cluster assembled transcripts using one of the two previously mentioned programs, then construct SuperTranscripts using Lace. You could then try to annotate the SuperTranscripts and compare (likely) homologous genes with one another. Hope that helps.


A suggestion: if you could provide links for the programs mentioned, your answer would be more complete. Programs can have similar names, and searching with the names above is likely to lead to not-useful-for-science results.


Thanks for the suggestion, but I included links in my answer. They are but a click on the name away :) This is sadly hard to see on Biostars when you highlight the names of programs while also linking to a repository...


Agreed. Hard to see unless you hover over the name. Another suggestion: you can either add a separate (LINK) after the name to make the link clear, or skip the code tags for program names and simply include links.
