Hey everyone,
I want to produce a genome-guided transcriptome assembly with Trinity for a reptilian organism. Does anyone know what the best value is for the parameter --genome_guided_max_intron?
Usually you take the maximum intron length over all genes in the genome.
If the genes in the genome are not known, you make an educated guess, e.g. what is this value in a (closely) related organism that is annotated? Moreover, this parameter usually should not be taken too strictly; I mean that it does tolerate deviations from the true max. The key is the order of magnitude: are the introns ~100 nt in length or rather ~1000 nt?
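If a related species has an annotation, the intron length distribution can be pulled out of its GTF file. Below is a minimal sketch of that idea, assuming a standard 9-column GTF with `exon` features and a `transcript_id` attribute. The function names `intron_lengths` and `summarize` and the toy records are made up for illustration.

```python
import re
from collections import defaultdict

def intron_lengths(gtf_lines):
    """Return intron lengths (nt) implied by the exon records of each transcript."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        # Key on (sequence name, transcript) so exons are only compared within one transcript.
        exons[(fields[0], match.group(1))].append((int(fields[3]), int(fields[4])))
    lengths = []
    for coords in exons.values():
        coords.sort()
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            gap = next_start - prev_end - 1  # GTF coordinates are 1-based, inclusive
            if gap > 0:
                lengths.append(gap)
    return lengths

def summarize(lengths):
    """Report the maximum and an approximate 99th percentile of the intron lengths."""
    ordered = sorted(lengths)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return {"max": ordered[-1], "p99": p99}

# Toy records (hypothetical): one transcript with three exons.
toy_gtf = [
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\texon\t2401\t2500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
lengths = intron_lengths(toy_gtf)
print(summarize(lengths))
```

For a real annotation you would pass an open file handle instead of the toy list. A high percentile is often a more useful guide than the raw maximum, since a single mis-annotated gene can inflate the max.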
What this parameter will do (or should do) is limit the merging of transcript data from genes that lie next to each other in the genome.
See it as something like this: if the distance between two regions where RNA-seq data maps is < max intron size, there is a greater chance they are from the same gene; if it is above that threshold, they will more likely be split and seen as two distinct gene regions.
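To make that intuition concrete, here is a toy sketch of the thresholding idea. It is only an illustration of the reasoning above, not Trinity's actual implementation. The function name `group_regions` and the coordinates are made up.

```python
def group_regions(regions, max_intron):
    """Group read-covered (start, end) regions on one chromosome into candidate loci.

    Two neighbouring regions stay in the same locus when the gap between
    them is at most max_intron; a larger gap starts a new locus.
    """
    loci = []
    locus_end = None
    for start, end in sorted(regions):
        if locus_end is not None and start - locus_end - 1 <= max_intron:
            loci[-1].append((start, end))
            locus_end = max(locus_end, end)
        else:
            loci.append([(start, end)])
            locus_end = end
    return loci

# Hypothetical covered regions: gaps of 1499 nt and 36599 nt.
regions = [(1000, 1500), (3000, 3400), (40000, 40500)]
for threshold in (1000, 2000, 50000):
    print(threshold, len(group_regions(regions, threshold)))
```

With a small threshold every region becomes its own locus, while a very large one merges everything, which is the trade-off between splitting real genes and fusing neighbouring ones.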
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4550110/pdf/icn046.pdf Most reptilian introns fall in the 101 bp-2 kb range, compared to the usual mammalian intron size of 5-30 kb, though there is a small fraction of reptilian introns in the 5-30 kb range. So I'm thinking that fixing the parameter at 30 kb may reduce the accuracy (introducing many false-positive discoveries).
wild guess: the max intron size of the genes in that genome? :)
Can't find a solid answer. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4550110/pdf/icn046.pdf From that paper I assume that I should probably set this parameter to about 30 kb.
Yes, I think 30 kb will be more than enough.