Question

Gene IDs for a Trinity-assembled de novo transcriptome

4.2 years ago

bry.th • 0

So the RNA-seq data is from a non-model organism, and the transcriptome was assembled with Trinity. However, Trinity labels the genes with its own made-up identifiers (the TRINITY_... names in the FASTA headers below).

>TRINITY_DN41182_c0_g1_i1 len=209 path=[1:0-208] [-1, 1, -2]
ATGGTGAGAACTGCCCATGTGATGGAGACTCAGTATGGCCATCTGTTTGAAAAGGTCATA
GTCAACGACGACCTCTCGACCGCCTTCAGCGAGCTGCGGTTGGCACTAAAGAAAGTGGAG
ACGGAGACTCACTGGGTTCCAGTCAGCTGGACCCACTCCTGAGATCCTCACAGACTGTAA
AGGGAGAAAAGGGAAGGACTTTGACAAAA

>TRINITY_DN41181_c0_g1_i1 len=207 path=[1:0-206] [-1, 1, -2]
TATGGACCCCCTCCTCCTCCCCCTGGCGAGTACGGCGGCCATGCTGAGTCTCCGGTTGTC
ATGGTGTACGGATTGGACCCCGTCAAGATGAACGCAGACCGTGTCTTCAACATCTTCTGT
CTCTATGGCAACGTAGAGCGGGTCAAGTTCATGAAGAGTAAGCCCGGAGCAGCCATGGTG
GAAATGGGAGACTGTTACGCGGTGGAT
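The Trinity identifiers do carry some structure. In a name like TRINITY_DN41182_c0_g1_i1, the part up to _g1 names a Trinity "gene" (a group of assembled transcripts that Trinity treats as isoforms of one gene), and _i1 names the isoform. Below is a minimal Python sketch that splits FASTA headers like the ones above into a gene-to-transcript map. It assumes exactly this naming pattern, and the function names are illustrative only. Trinity's own utilities include a script, get_Trinity_gene_to_trans_map.pl, for the same job.

```python
import re

# Assumed Trinity naming pattern: TRINITY_DN<n>_c<n>_g<n>_i<n>.
# Everything before "_i<n>" is treated as the Trinity gene id.
TRINITY_ID = re.compile(r"^(?P<gene>TRINITY_DN\d+_c\d+_g\d+)_i(?P<isoform>\d+)$")


def split_trinity_id(transcript_id: str) -> tuple[str, int]:
    """Return (gene_id, isoform_number) for a Trinity transcript identifier."""
    match = TRINITY_ID.match(transcript_id)
    if match is None:
        raise ValueError(f"not a Trinity transcript id: {transcript_id!r}")
    return match.group("gene"), int(match.group("isoform"))


def gene_trans_map(fasta_lines):
    """Build (gene_id, transcript_id) pairs from the '>' header lines of a Trinity FASTA."""
    pairs = []
    for line in fasta_lines:
        if line.startswith(">"):
            # The identifier is the first whitespace-separated token after '>'.
            transcript_id = line[1:].split()[0]
            gene_id, _ = split_trinity_id(transcript_id)
            pairs.append((gene_id, transcript_id))
    return pairs
```

Sequence lines are skipped, so the whole FASTA file can be passed in line by line.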

This means that when you map the reads to the assembled reference, you get:

target_id                 length    eff_length  est_counts        tpm
TRINITY_DN34124_c0_g1_i1     205        27.253           0          0
TRINITY_DN34120_c0_g1_i1     236       34.7816          15    14.2884
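Because the isoform is encoded in the identifier's suffix, transcript-level estimates like the table above can be collapsed to Trinity genes by summing over isoforms. This gives aggregated counts, not gene annotations. The sketch below assumes a tab-separated file with these column names. They look like kallisto's abundance.tsv, although the thread does not say which tool was used. gene_level_table is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def gene_level_table(abundance_tsv: str) -> dict[str, dict[str, float]]:
    """Sum est_counts and tpm over the isoforms of each Trinity gene.

    Assumes a tab-separated table with a header row containing
    target_id, est_counts and tpm (other columns are ignored).
    """
    totals = defaultdict(lambda: {"est_counts": 0.0, "tpm": 0.0})
    for row in csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t"):
        # Drop the trailing "_i<N>" isoform suffix to get the Trinity gene id.
        gene_id = row["target_id"].rsplit("_i", 1)[0]
        totals[gene_id]["est_counts"] += float(row["est_counts"])
        totals[gene_id]["tpm"] += float(row["tpm"])
    return dict(totals)
```

In R, the tximport package performs the same collapse through its tx2gene argument, which takes a two-column transcript-to-gene table.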

I need to use the sequences to look up gene IDs, but I don't know how to do this. The closest genomes I can find are in Ensembl, for S. orbicularis or A. percula, but I don't know how to use them to convert the Trinity output into something meaningful. I'd prefer to use R if possible, but obviously beggars can't be choosers.
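One common way to attach gene IDs from a related species is to search the Trinity transcripts against that species' Ensembl cDNA or protein set with BLAST, write tabular output (-outfmt 6), and keep the best-scoring hit for each transcript. This is an assumption on my part, not a method proposed in the thread. The sketch below covers only the last step. The e-value cutoff is an arbitrary illustrative choice, and best_hits is a hypothetical name.

```python
def best_hits(blast_tab_lines, max_evalue=1e-5):
    """Map each query (Trinity transcript) to its best subject by bitscore.

    Assumes BLAST tabular output with the 12 default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # ignore weak hits
        # Keep the hit with the highest bitscore seen so far for this query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```

The resulting subject IDs are Ensembl transcript or protein IDs. They can then be mapped to gene IDs and names with Ensembl BioMart (or the biomaRt package in R). A single best hit is only a rough stand-in for orthology, so treat the result as a first pass.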

RNA-Seq R Trinity

You need to annotate the transcripts yourself, using a program like MAKER (LINK) for a eukaryotic genome or Prokka (LINK) for a bacterial genome. Be sure to remove any redundancy (using something like CD-HIT) before you annotate.

4.2 years ago by GenoMax 147k
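CD-HIT (cd-hit-est for nucleotide sequences) removes redundancy by clustering sequences above an identity threshold. A cruder alternative that is sometimes used on Trinity assemblies is to keep only the longest isoform of each Trinity gene. The sketch below shows that simpler approach, not CD-HIT's clustering. It assumes the Trinity naming pattern shown in the question, and longest_isoform_per_gene is a hypothetical name.

```python
import re

# Assumed Trinity naming pattern; group 1 is the Trinity gene id.
TRINITY_ID = re.compile(r"^(TRINITY_DN\d+_c\d+_g\d+)_i\d+$")


def longest_isoform_per_gene(records):
    """Keep one transcript per Trinity gene: the longest isoform.

    `records` is an iterable of (transcript_id, sequence) pairs; the result
    maps each Trinity gene id to its longest (transcript_id, sequence).
    """
    longest = {}
    for transcript_id, sequence in records:
        match = TRINITY_ID.match(transcript_id)
        if match is None:
            raise ValueError(f"not a Trinity transcript id: {transcript_id!r}")
        gene_id = match.group(1)
        # Replace the stored isoform only if this one is strictly longer.
        if gene_id not in longest or len(sequence) > len(longest[gene_id][1]):
            longest[gene_id] = (transcript_id, sequence)
    return longest
```

Unlike identity-based clustering, this approach cannot merge near-identical transcripts that Trinity placed in different genes.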

Thanks! Trinotate sounds like what I need. I'll check it out.

4.2 years ago by bry.th

score 2 · Answer 1 · 2020-09-10

4.2 years ago

Dave Carlson ★ 2.0k

Note that MAKER is used for gene prediction from whole-genome assemblies (often using transcriptomes in the process). If all you have is the transcriptome assembly, something like Trinotate (LINK) might be a more appropriate choice.

I second Trinotate (+ TransDecoder), as it is tightly integrated with Trinity and does a good job overall. There are other transcriptome annotation pipelines, such as dammit, Annocript and Sma3s, but I have never used any of them, so I don't know how they perform.

4.2 years ago by h.mon 35k
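TransDecoder first finds long open reading frames in each transcript and then decides which ones are likely coding. The toy function below illustrates only the first step. It returns the longest ATG-to-stop ORF across all six reading frames. It is not TransDecoder's algorithm, and unlike TransDecoder it ignores partial ORFs that run off either end of a transcript. longest_orf and reverse_complement are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq: str):
    """Return (strand, start, end, orf_seq) for the longest ATG...stop ORF.

    Coordinates are 0-based and half-open on the strand's own sequence
    (for '-' they refer to the reverse complement). Returns None if no
    complete ORF exists.
    """
    best = None
    upper = seq.upper()
    for strand, s in (("+", upper), ("-", reverse_complement(upper))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if best is None or end - start > best[2] - best[1]:
                        best = (strand, start, end, s[start:end])
                    start = None
    return best
```

For example, longest_orf("CCATGAAATTTTAGCC") returns ("+", 2, 14, "ATGAAATTTTAG").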

I've used dammit before. It's very fast and easy to use. I think I prefer Trinotate, though, mostly because it provides a somewhat wider set of annotation types and because of the integration with Trinity that you mentioned.

4.2 years ago by Dave Carlson ★ 2.0k