Hi all,
This may be a stupid question, but I am curious why so many papers choose a contig cutoff of 300 bp when performing a Trinity de novo transcriptome assembly.
In my head it makes sense, since short contigs probably do not provide much information and are just "contaminating" the transcriptome. But this is only speculation, because I have not found any reliable source with a clear and precise explanation of why the cutoff is useful.
Does anyone have a good explanation, or even a paper/review you could recommend on the topic? I would be grateful. This question has been buzzing in my head for the last couple of days, and I would like to understand why exactly we choose a 300 bp cutoff in a Trinity de novo transcriptome assembly (probably in other assemblers as well, but I work with Trinity).
Cheers, Birgitte
Could you run a local similarity search on the shorter transcripts, just to see what you miss when you apply a 300 bp cutoff?
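A minimal sketch of the first step of that check, assuming the assembly is a plain FASTA file: split the contigs at the 300 bp threshold so the short set can be searched on its own (for example with BLAST, which is my assumption; the thread does not name a tool). The function names and output paths below are hypothetical, and only the Python standard library is used.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA lines; multi-line sequences are joined."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records: Iterable[Record], min_len: int = 300) -> Tuple[List[Record], List[Record]]:
    """Partition records into (kept, short): kept has length >= min_len, short has the rest."""
    kept: List[Record] = []
    short: List[Record] = []
    for header, seq in records:
        (kept if len(seq) >= min_len else short).append((header, seq))
    return kept, short


def write_fasta(records: Iterable[Record], path: str, width: int = 60) -> None:
    """Write records to a FASTA file, wrapping sequences at `width` characters."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")


def split_fasta_file(in_path: str, kept_path: str, short_path: str, min_len: int = 300) -> Tuple[int, int]:
    """Hypothetical helper: split one FASTA file into two by contig length; returns the counts."""
    with open(in_path) as fh:
        kept, short = split_by_length(read_fasta(fh), min_len)
    write_fasta(kept, kept_path)
    write_fasta(short, short_path)
    return len(kept), len(short)
```

Usage would be something like `split_fasta_file("assembly.fasta", "contigs_ge300.fasta", "contigs_lt300.fasta")`, where all three paths are placeholders. The short-contig file is then the query set for the similarity search, and comparing its hits with those of the kept set gives a rough idea of what the cutoff throws away.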