8.4 years ago
nikelle.petrillo
Hello,
I performed a de novo assembly using Trinity, which produced about 493,000 contigs. After creating a count matrix, we only found 38,000 genes...
Does this mean about 93% of our transcripts are not represented in the count matrix?
Thanks so much for the help, Nikelle
Your 493K transcripts are grouped into clusters by ID. For example, the gene ID 'TRINITY_DN1000|c115_g5' includes the isoform ID 'TRINITY_DN1000|c115_g5_i1'. Any other transcripts in the same cluster will have a different value for '_i#' at the end of the FASTA header for that entry. What you have is 493K isoforms, representing 38K genes.
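To check this on your own assembly, here is a rough Python sketch that counts genes and isoforms from the FASTA headers. It assumes headers follow the pattern shown above (the transcript ID is the first whitespace-delimited token and ends in '_i' plus a number). The function names are made up for illustration and are not part of Trinity.

```python
import re
from collections import defaultdict

# Assumption: the isoform suffix is "_i" followed by digits at the end of the ID.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id_from_header(header):
    """Return the gene ID for a FASTA header by dropping the trailing _i# suffix."""
    transcript_id = header.lstrip(">").split()[0]
    return ISOFORM_SUFFIX.sub("", transcript_id)


def count_genes_and_isoforms(fasta_lines):
    """Return (number of genes, number of isoforms) found in the FASTA header lines."""
    isoforms_per_gene = defaultdict(set)
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split()
        if not parts:
            continue  # skip a bare ">" with no ID
        isoforms_per_gene[gene_id_from_header(line)].add(parts[0])
    n_isoforms = sum(len(ids) for ids in isoforms_per_gene.values())
    return len(isoforms_per_gene), n_isoforms
```

To run it on a file, something like `with open("Trinity.fasta") as fh: print(count_genes_and_isoforms(fh))` should work, where the file name is an assumption about where your assembly was written. If the two numbers come out near 38K and 493K, the count matrix is at the gene level and no transcripts are missing from it.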
Do you expect your genome to contain 493K genes?
Was an effort made to cluster the sequences with something like CD-HIT? Did you map your reads back to the assembly? Did that mapping look reasonable?
Are you counting genes or transcripts?