Hi, I am working on marine invertebrates (a non-model organism) and am a beginner in bioinformatics. I built a transcriptome assembly with Trinity, then ran EviGENE (the tr2aacds4.pl pipeline), which created two folders, okayset1st and okayset. okayset1st contains a .tr file (57624 sequences) and okayset contains a .mrna file (78174 sequences). To check quality, I ran BUSCO on both files. The scores for the .tr file (especially duplicated and missing) were better than those for the .mrna file. I expected the .mrna file to be better because it has passed two stages of filtering, whereas the .tr file comes from the first stage only. Why would BUSCO give this result?
I also tried running BUSCO on only the sequences labelled "evgclass=main" in the .mrna file, but the scores were still poor.
Thank you!
I don't know enough about EviGENE to be certain of what you have done, so these are some opinions based on general knowledge of mRNA vs. transcript files.
The mRNA file is likely to be more accurate because the extra filtering makes it more stringent. So I would expect more missing genes in that file, but fewer duplicates.
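To put numbers on that expectation, the one-line BUSCO summaries for the two files can be compared directly. This is only a rough sketch: it assumes the usual `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` summary layout (some BUSCO versions may print extra fields), the function names are mine, and the two summary strings are made-up placeholders, not your results.

```python
import re

# Pattern for the one-line BUSCO summary, e.g.
# C:90.0%[S:80.0%,D:10.0%],F:4.0%,M:6.0%,n:954
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)

def parse_busco_summary(line):
    """Return the C/S/D/F/M percentages and the BUSCO count n as a dict."""
    match = BUSCO_LINE.search(line)
    if match is None:
        raise ValueError("not a recognised BUSCO summary line: %r" % line)
    scores = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    scores["n"] = int(match.group("n"))
    return scores

def compare(first, second):
    """Percentage-point change in each category going from first to second."""
    return {k: round(second[k] - first[k], 1) for k in "CSDFM"}

# Placeholder values for illustration only; paste your own summary lines here.
tr = parse_busco_summary("C:90.0%[S:80.0%,D:10.0%],F:4.0%,M:6.0%,n:954")
mrna = parse_busco_summary("C:85.0%[S:60.0%,D:25.0%],F:5.0%,M:10.0%,n:954")
delta = compare(tr, mrna)
print(delta)
```

A positive D or M in the result means the second file (here the .mrna one) has more duplicated or missing BUSCOs than the first.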
The transcript file will include some incomplete transcripts that may lack an RBS and a translation start, so they won't be in the mRNA file. Yet even incomplete transcripts should carry enough sequence signal to raise BUSCO completeness, so no surprise there. I can't explain how the transcript file would be better in terms of duplicates; I would expect it to have more duplicates than the mRNA file.
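One thing that might help narrow down the duplicates is to tally how many sequences in each file carry each evgclass label. The sketch below assumes the FASTA headers contain an `evgclass=<label>` tag followed by a comma, semicolon or space, as the "evgclass=main" in your question suggests; I have not checked this against real EviGENE output. The function name and the demo headers are made up for illustration.

```python
import re
from collections import Counter

# Assumed header tag: evgclass=<label>, ending at a comma, semicolon or space.
EVGCLASS = re.compile(r"evgclass=([^,;\s]+)")

def tally_evgclass(fasta_lines):
    """Count FASTA records per evgclass label; untagged headers go to 'unlabelled'."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            match = EVGCLASS.search(line)
            counts[match.group(1) if match else "unlabelled"] += 1
    return counts

# Made-up headers for illustration; real header layout may differ.
demo = [
    ">seq1 type=mRNA; aalen=300,90%,complete; evgclass=main,okay,match:seq9,pct:99;",
    "ATGGCC",
    ">seq2 type=mRNA; evgclass=althi,okay;",
    "ATGAAA",
    ">seq3 type=mRNA; evgclass=main,okay;",
    "ATGTTT",
    ">seq4 no class tag",
    "ATGCCC",
]
counts = tally_evgclass(demo)
print(counts)
```

On a real file, pass the open file handle instead of the list, for example `tally_evgclass(open("path/to/file.mrna"))`. If a large share of the records turn out not to be the main class, that would be worth looking at before comparing duplicate scores.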