So the RNA-seq data is for a non-model organism. The transcriptome was assembled using Trinity. However, Trinity has labelled the transcripts with its own made-up identifiers (the TRINITY_DN... names in the FASTA headers below).
>TRINITY_DN41182_c0_g1_i1 len=209 path=[1:0-208] [-1, 1, -2]
ATGGTGAGAACTGCCCATGTGATGGAGACTCAGTATGGCCATCTGTTTGAAAAGGTCATA
GTCAACGACGACCTCTCGACCGCCTTCAGCGAGCTGCGGTTGGCACTAAAGAAAGTGGAG
ACGGAGACTCACTGGGTTCCAGTCAGCTGGACCCACTCCTGAGATCCTCACAGACTGTAA
AGGGAGAAAAGGGAAGGACTTTGACAAAA
>TRINITY_DN41181_c0_g1_i1 len=207 path=[1:0-206] [-1, 1, -2]
TATGGACCCCCTCCTCCTCCCCCTGGCGAGTACGGCGGCCATGCTGAGTCTCCGGTTGTC
ATGGTGTACGGATTGGACCCCGTCAAGATGAACGCAGACCGTGTCTTCAACATCTTCTGT
CTCTATGGCAACGTAGAGCGGGTCAAGTTCATGAAGAGTAAGCCCGGAGCAGCCATGGTG
GAAATGGGAGACTGTTACGCGGTGGAT
This means that when you map the reads to the assembled reference, you get:
target_id length eff_length est_counts tpm
TRINITY_DN34124_c0_g1_i1 205 27.253 0 0
TRINITY_DN34120_c0_g1_i1 236 34.7816 15 14.2884
I need to use the sequences to look up gene IDs, but I don't know how to do this. The closest genomes I can find are in the Ensembl database, for S. orbicularis or A. percula, but I don't know how to use these to convert the Trinity output into something meaningful. I'm more comfortable using R, if possible, but obviously beggars can't be choosers.
You need to annotate the transcripts yourself using a program like MAKER (eukaryotic genome) or Prokka (bacterial genome). Be sure to remove any redundancy before you annotate (using something like CD-HIT).

Thanks! Trinotate sounds like what I would need. I'll check it out.
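For the ID lookup itself, one possible route (a sketch, not the only way) is to search the Trinity transcripts against the proteins of a related species and then attach the best hit per transcript to the abundance table. The Python below assumes the search has already been run separately and saved in BLAST tabular format (`-outfmt 6`). The subject IDs (`protA_placeholder`, `protB_placeholder`) and the hit values are made up for illustration, and the function names are hypothetical. It also derives the Trinity "gene" ID by stripping the trailing `_i<N>` isoform suffix from the transcript ID.

```python
import csv
import io

# Hypothetical BLAST tabular output (-outfmt 6). Subject IDs and scores are
# placeholders, not real search results.
BLAST_TSV = (
    "TRINITY_DN34120_c0_g1_i1\tprotA_placeholder\t92.1\t76\t6\t0\t3\t230\t10\t85\t1e-40\t150\n"
    "TRINITY_DN34120_c0_g1_i1\tprotB_placeholder\t70.0\t60\t18\t0\t3\t182\t5\t64\t1e-12\t62.4\n"
)

# Abundance table as shown in the question (tab-separated here).
ABUNDANCE_TSV = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "TRINITY_DN34124_c0_g1_i1\t205\t27.253\t0\t0\n"
    "TRINITY_DN34120_c0_g1_i1\t236\t34.7816\t15\t14.2884\n"
)

# Default column order of BLAST -outfmt 6.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(blast_text):
    """Return {transcript_id: subject_id}, keeping the highest-bitscore hit."""
    best = {}
    reader = csv.DictReader(
        io.StringIO(blast_text), fieldnames=BLAST_COLUMNS, delimiter="\t"
    )
    for row in reader:
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > current[1]:
            best[row["qseqid"]] = (row["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}


def trinity_gene_id(transcript_id):
    """Strip the trailing _i<N> isoform suffix from a Trinity transcript ID."""
    return transcript_id.rsplit("_i", 1)[0]


def annotate(abundance_text, hits):
    """Add 'trinity_gene' and 'best_hit' columns to each abundance row."""
    rows = []
    for row in csv.DictReader(io.StringIO(abundance_text), delimiter="\t"):
        row["trinity_gene"] = trinity_gene_id(row["target_id"])
        # Transcripts with no hit are marked NA rather than dropped.
        row["best_hit"] = hits.get(row["target_id"], "NA")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in annotate(ABUNDANCE_TSV, best_hits(BLAST_TSV)):
        print(row["target_id"], row["trinity_gene"], row["best_hit"], row["tpm"])
```

Short contigs like the ~200 bp ones above often have no significant hit, which is why unmatched transcripts are kept with NA. The resulting table can be written out as TSV and read into R for downstream work.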