8.3 years ago
Biogeek
Hey guys,
Any tips? I know the STAR aligner is optimised for mammalian genomes. I have a plant reference genome with a GFF3 file that annotates exons, CDS and UTRs, but not introns. Additionally, the genome is available only as scaffolds, not chromosomes. The default STAR settings allow a maximum intron length of around 500,000 nt, which is huge. Can anyone suggest a suitable maximum intron length (STAR's --alignIntronMax), or point me to literature on the matter? It seems this goes unreported in alignment methods most of the time.
Thanks.
If you map some reads with BBMap, you can produce a histogram of indel lengths with the "indelhist" flag, and use that to inform your decision. The distribution varies by plant species.
Mapping just the first million reads is enough for this.
If you already have a mapped SAM/BAM file, you can instead generate the indel length histogram with Reformat.
Thanks Brian, seems like a handy little tool :-)
I'm assuming around 10,000 would be appropriate?
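One way to check a guess like 10,000 against the data is to read a cutoff off the histogram itself, since long deletions in spliced RNA-seq alignments mostly correspond to introns. Below is a minimal Python sketch, not an official BBMap or STAR utility. It assumes the histogram is tab-separated with the length in the first column and the deletion count in the second, and that header lines start with `#`; check your actual file before relying on it. The function name `intron_cutoff`, the `quantile` default and the `min_len` filter are all hypothetical choices made for illustration.

```python
def intron_cutoff(hist_text, quantile=0.999, min_len=20):
    """Return the smallest length that covers `quantile` of long deletions.

    ASSUMPTION: each data line is "<length>\t<deletion count>[\t...]".
    Deletions shorter than `min_len` are ignored as ordinary small indels.
    """
    rows = []
    for line in hist_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a data row in the assumed layout
        length, count = int(fields[0]), int(fields[1])
        if length >= min_len and count > 0:
            rows.append((length, count))
    if not rows:
        raise ValueError("no deletions of at least min_len found")

    rows.sort()
    total = sum(count for _, count in rows)
    target = quantile * total

    # Walk up the lengths until the cumulative count reaches the target.
    running = 0
    for length, count in rows:
        running += count
        if running >= target:
            return length
    return rows[-1][0]
```

You would read the histogram file into a string, call `intron_cutoff(text)`, and then round the result up generously before passing it to `--alignIntronMax`. The quantile is a judgment call: a higher value keeps rare long introns at the cost of more spurious long-gap alignments.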
Hi, in our RNA-seq analysis we adapted the parameters for plant genomes by setting the minimum and maximum intron lengths to 60 and 6000, following what was described for splicing in Arabidopsis (Márquez et al., 2010).
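Since the GFF3 in the question lists exons but not introns, bounds like these can also be checked against the annotation itself: an intron is simply the gap between consecutive exons of the same transcript. The following is a rough Python sketch under some assumptions: exon rows use the feature type `exon`, each exon carries a `Parent=` attribute naming its transcript, and coordinates are 1-based and inclusive, as the GFF3 format defines. The function name `intron_lengths` is made up for this example.

```python
from collections import defaultdict


def intron_lengths(gff3_text):
    """Return the lengths of all gaps between consecutive exons per transcript.

    ASSUMPTION: exon features have a Parent attribute; exons with several
    parents (comma-separated) are counted once per parent transcript.
    """
    exons = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                # Key on (scaffold, transcript) in case IDs repeat across scaffolds.
                exons[(cols[0], parent)].append((start, end))

    lengths = []
    for spans in exons.values():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            gap = next_start - prev_end - 1  # inclusive coordinates
            if gap > 0:
                lengths.append(gap)
    return lengths
```

With the result in hand, `min(lengths)` and `max(lengths)` (or high percentiles, to ignore annotation outliers) give annotation-based candidates for STAR's `--alignIntronMin` and `--alignIntronMax`. Keep in mind that on a scaffold-level assembly some genes may be split across scaffolds, so the annotation can understate the true maximum.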