Hi everyone,
Sorry if this question is rather philosophical, but I find it important.
As one can read in the original STAR aligner paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/):
"Similarly to other RNA-seq aligners, STAR's default parameters are optimized for mammalian genomes. Other species may require significant modifications of some alignment parameters; in particular, the maximum and minimum intron sizes have to be reduced for organisms with smaller introns."
I was dealing with a pipeline I got from a collaborator that sets --alignIntronMax 500000 (500 kb). As I understand it, this value should be chosen according to the species we are working with (in this case Homo sapiens). Checking the paper https://pubmed.ncbi.nlm.nih.gov/26581719/ (Table 2), the maximum intron length reported is 1,160,411 bp; using my GFF file, R, and the GenomicFeatures package (https://support.bioconductor.org/p/103386/), I got the value 1,240,120 bp.
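For reference, I computed it along these lines (a minimal sketch based on the approach in that Bioconductor thread; "annotation.gff3" is just a placeholder for my actual file):

```r
library(GenomicFeatures)

# Build a transcript database from the annotation (placeholder file name)
txdb <- makeTxDbFromGFF("annotation.gff3", format = "gff3")

# Introns per transcript, flattened into a single GRanges object
introns <- unlist(intronsByTranscript(txdb, use.names = TRUE))

# Intron lengths in bp; the maximum is the value reported above
intron_len <- width(introns)
max(intron_len)
```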
Taking all of this into account, my main question is: should we adapt these default parameters in STAR to reflect the most recent values (~1.2 Mb), even though the default options were already optimized for mammalian genomes?
(Addition: I made a histogram of intron length frequencies using R.)
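The histogram was produced with something like the following, continuing from the sketch above (the log10 scale is a choice to keep the long tail visible):

```r
# Histogram of intron lengths on a log10 scale
hist(log10(intron_len), breaks = 100,
     xlab = "log10(intron length in bp)",
     main = "Intron length distribution")

# Mark the pipeline's current --alignIntronMax (500 kb) and the longest intron
abline(v = log10(5e5), col = "red", lty = 2)
abline(v = log10(max(intron_len)), col = "blue", lty = 2)
```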
Thanks in advance
My personal view is to never change settings in standard tools that have stood the test of time unless there is a good data-driven reason (i.e. you experience a clear problem). This histogram could be driven by a single intron of width 1,200,000 while 99.999% of the other widths are below the default threshold. Just leave it. The outlier could even be one of those obscure genes with fishy annotation that nobody knows or cares about. I would just do my analysis and move on. These strange corner cases are everywhere, at every analysis step, so it is best to ignore them unless there is a good reason not to.
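If you want to sanity-check how heavy that tail actually is, a couple of lines will do it (reusing the intron_len vector from the sketch above; the 500 kb cutoff is the value from your pipeline):

```r
# Fraction of annotated introns longer than the pipeline's 500 kb cutoff
mean(intron_len > 5e5)

# Upper tail of the intron length distribution
quantile(intron_len, probs = c(0.99, 0.999, 0.9999, 1))
```

If only a handful of introns exceed the cutoff, that supports leaving the default-style setting alone.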