I have a question about some of the Trinity QC information. I'll use a tutorial dataset (found here: https://github.com/trinityrnaseq/KrumlovTrinityWorkshopJan2018/wiki/Home/1a23eb56a8857c3ed9595f9224367e25129f8f4b) as an example to keep the question straightforward.
When TrinityStats.pl is run on the tutorial dataset, it reports 683 'genes' and 687 transcripts. Then, in the tutorial, under "Assess number of full-length coding transcripts," after BLASTing the transcripts and running analyze_blastPlus_topHit_coverage.pl on the results, a chart is generated showing bins of percent length coverage of the best-matching protein sequence, the count of proteins found in each bin, and a running total of proteins across all bins. It seems there are only 324 proteins in total. What happened to the rest? Why is there a discrepancy between the number of proteins that have BLAST hits and the number of genes in the assembly?
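
To make the comparison concrete, here is a minimal sketch of how one could count the distinct transcripts and distinct proteins in the BLAST results. It assumes the BLAST output is tab-delimited (as produced by `-outfmt 6`), with the query (transcript) ID in column 1 and the subject (protein) ID in column 2. The function name and the example IDs are hypothetical, not part of Trinity.

```python
def count_distinct_ids(blast_lines):
    """Count distinct query and subject IDs in tab-delimited BLAST output.

    Assumes column 1 is the query (transcript) ID and column 2 is the
    subject (protein) ID, as in BLAST+ tabular output.
    """
    queries = set()
    subjects = set()
    for line in blast_lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a valid hit line
        queries.add(fields[0])
        subjects.add(fields[1])
    return len(queries), len(subjects)


# Hypothetical hit lines for illustration only (not from the tutorial data).
example_lines = [
    "transcript_A\tprotein_X\t98.5",
    "transcript_B\tprotein_X\t91.0",
    "transcript_C\tprotein_Y\t88.2",
]
n_transcripts, n_proteins = count_distinct_ids(example_lines)
print(n_transcripts, "transcripts with hits;", n_proteins, "distinct proteins")
```

Running this on the real BLAST output file (passing an open file handle instead of the example list) would give the two numbers to set against the 687 transcripts from TrinityStats.pl and the 324 proteins in the coverage chart.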