Number of contigs does not match number in count matrix --- de novo assembly
8.4 years ago

Hello,

I performed a de novo assembly using Trinity. The resulting assembly contains about 493,000 contigs. After creating a count matrix, we only found 38,000 genes...

Does this mean about 92% of our transcripts are not represented in the count matrix?

Thanks so much for the help, Nikelle

trinity rna-seq Assembly • 1.8k views

Your 493K transcripts are grouped into clusters by ID. For example, the gene ID 'TRINITY_DN1000|c115_g5' contains the isoform ID 'TRINITY_DN1000|c115_g5_i1'; any other transcripts in the same cluster will have a different value for '_i#' at the end of the FASTA header for that entry. What you have is 493K isoforms representing 38K genes.
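To check this on your own assembly, you can strip the trailing '_i#' from each FASTA header and count the distinct gene IDs that remain. The following is a minimal sketch, assuming every header starts with an isoform ID in the format shown above. The function names and the example headers (including the `len=` field) are made up for illustration and are not part of Trinity.

```python
import re
from collections import defaultdict

# Assumption: isoform IDs end in "_i<number>", as in the header format above.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id_from_isoform(isoform_id):
    """Drop the trailing '_i#' to get the gene-level ID."""
    return ISOFORM_SUFFIX.sub("", isoform_id)


def count_genes_and_isoforms(fasta_lines):
    """Return (number of genes, number of isoforms) from FASTA header lines."""
    isoforms_per_gene = defaultdict(int)
    for line in fasta_lines:
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            isoform_id = line[1:].split()[0]
            isoforms_per_gene[gene_id_from_isoform(isoform_id)] += 1
    return len(isoforms_per_gene), sum(isoforms_per_gene.values())


# Hypothetical headers: two isoforms of one gene, one isoform of another.
example = [
    ">TRINITY_DN1000|c115_g5_i1 len=300",
    "ACGT",
    ">TRINITY_DN1000|c115_g5_i2 len=280",
    "ACGT",
    ">TRINITY_DN1000|c115_g6_i1 len=500",
    "ACGT",
]
n_genes, n_isoforms = count_genes_and_isoforms(example)
print(n_genes, n_isoforms)  # prints "2 3"
```

For a real assembly, pass an open file handle instead of the list, e.g. `with open("Trinity.fasta") as fh: count_genes_and_isoforms(fh)` (the file name here is an assumption). If the two numbers come out near 38K and 493K, the count matrix is simply summarized at the gene level.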


Do you expect your genome to contain 493K genes?

Was an effort made to cluster the sequences with something like CD-HIT? Did you map your reads back to the assembly? Did that mapping look reasonable?


Are you counting genes or transcripts?
