Hi,
I want to run STAR in OmicsBox and I have some questions about its parameters. There aren't many references for these parameters, and not all of them should be left at their defaults, so I'm really struggling with them. I'd really appreciate it if someone could help me.
1) 'Maximum distance between mates'
This info is from https://wikibits.ugent.be/index.php/Parameters_of_STAR
- Maximum distance between the reads of a pair when mapped to the genome. If the reads map farther apart than this, the fragment is considered chimeric. The default value of 500000 is tuned for mammalian genomes; for plant and yeast genomes you will have to decrease it.
- Because STAR maps the reads to the genome, the maximum distance between the reads of a pair is equal to the intron size. For organisms with small introns, you should use intron size + maximum fragment length.
Is this info correct? My research involves Physcomitrella patens, so I'll have to decrease the input value for this parameter, but I don't know by how much. Where does that default value (500000) come from? Can I use the suggestion mentioned above (intron size + max fragment length)?
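The rule of thumb in the second bullet is just a sum. A minimal sketch of that arithmetic follows; the function name is hypothetical and the input numbers are placeholders for illustration, not measured Physcomitrella patens values.

```python
def max_mates_gap(max_intron_len: int, max_fragment_len: int) -> int:
    """Suggested 'maximum distance between mates' in bp, following the
    quoted rule of thumb: longest expected intron + longest fragment."""
    if max_intron_len < 0 or max_fragment_len <= 0:
        raise ValueError("intron length must be >= 0 and fragment length > 0")
    return max_intron_len + max_fragment_len


# Placeholder inputs for illustration only (hypothetical values).
print(max_mates_gap(max_intron_len=5_000, max_fragment_len=600))  # 5600
```

Because the parameter is a maximum, the intron length plugged in should be the longest intron you expect to see, not a typical one.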
2) ‘Include Chimeric Alignments’ checkbox
This info is from OmicsBox Manual http://manual.omicsbox.biobam.com/user-manual/module-transcriptomics/rna-seq-alignment/#RNA-SeqAlignment-RunRNA-SeqAlignment(STAR)
- This option includes the chimeric alignments together with the normal alignments in the main BAM file. The format of chimeric alignments follows the latest SAM/BAM specifications.
Is there a reason why one should or should not keep these two kinds of alignments separate?
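For reference, command-line STAR controls this with `--chimOutType`: `WithinBAM` writes chimeric records into the main BAM (this requires BAM output), while `Junctions` writes them to a separate `Chimeric.out.junction` file. Chimeric detection is only enabled when `--chimSegmentMin` is greater than 0. That the OmicsBox checkbox maps onto these options is an assumption, and the value 12 below is illustrative, not an OmicsBox default. The sketch simply builds the two alternative option sets so they can be compared.

```python
def star_chimeric_args(within_bam: bool, min_segment: int = 12):
    """Build STAR command-line options for chimeric output.

    min_segment=12 is an illustrative assumption; any value > 0
    switches on chimeric detection in STAR.
    """
    if min_segment <= 0:
        raise ValueError("--chimSegmentMin must be > 0 to detect chimeras")
    # WithinBAM: chimeras go into the main BAM next to normal alignments.
    # Junctions: chimeras go to a separate Chimeric.out.junction file.
    out_type = "WithinBAM" if within_bam else "Junctions"
    return ["--chimSegmentMin", str(min_segment), "--chimOutType", out_type]


print(" ".join(star_chimeric_args(within_bam=True)))
print(" ".join(star_chimeric_args(within_bam=False)))
```

One practical consideration: keeping chimeric records in a separate file leaves the main BAM containing only normal alignments, which some downstream tools may expect.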
3) 'Maximum Number of Mismatches'
This info is from https://wikibits.ugent.be/index.php/Parameters_of_STAR
- Maximum number of mismatches for a read (single-end) or a pair of reads (paired-end). The default is 10. The value you should choose depends on the read length; for short, quality-trimmed reads you typically allow 5% mismatches.
The default value for this parameter in OmicsBox's STAR is 999, which confuses me. My reads are 150 bp, which is not short, right? I'm not sure what to do with this parameter. Should I set it to the default mentioned above (10)?
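Applying the quoted 5% rule to 150 bp reads is simple arithmetic, with one catch: for paired-end data the limit applies to the whole pair, so the relevant length doubles. A minimal sketch, assuming the 5% figure from the quote (the function name is hypothetical):

```python
def mismatch_cap(read_len: int, paired: bool, percent: int = 5) -> int:
    """Absolute mismatch limit from a percent-of-length rule of thumb.

    The limit counts mismatches per read (single-end) or per pair
    (paired-end), so paired data uses the combined length of both mates.
    """
    total_len = read_len * (2 if paired else 1)
    return total_len * percent // 100  # integer floor, e.g. 7.5 -> 7


print(mismatch_cap(150, paired=False))  # 7
print(mismatch_cap(150, paired=True))   # 15
```

STAR also has a relative filter, `--outFilterMismatchNoverLmax`, that scales with mapped length; whether OmicsBox exposes it is worth checking, since a very high absolute cap such as 999 leaves any such relative limit as the effective constraint.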
Thank you.
You should try a run with all parameters left at their defaults. The only thing I would change is the "max distance between mates". If you know the average intron length in your organism, you can use that number instead of 500K, which is appropriate for human/mammalian genomes.