Question

Good practices for RNA-seq (2x50, 2x75, 2x100) alignment with STAR + RSEM


2.0 years ago

Rosario Distefano • 0

Hi All,

I am new to RNA-seq and am looking for good practices for aligning RNA-seq data with different read lengths (2x50, 2x75, 2x100). The plan is to use STAR 2-pass (2.7.10b) as the aligner and RSEM for gene- and transcript-level quantification, then tximport to extract read counts. I have read many forums and groups, the ENCODE and NCI GDC pipelines, and the STAR reference manual, but the number of tunable parameters is overwhelming. I know I should set --sjdbOverhang to read length - 1 for genome generation (it can be omitted in the two STAR passes), but apart from this parameter, is there anything else I should pay attention to?
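To make the read length - 1 rule concrete, here is a minimal Python sketch that builds one genome-generation command per read length. It only assembles the command line and does not run STAR. The index directory names, genome.fa, annotation.gtf, and the thread count are placeholders chosen for illustration, not values from any pipeline mentioned above.

```python
def sjdb_overhang(read_length: int) -> int:
    """Return read length - 1, the value discussed above for --sjdbOverhang."""
    if read_length < 2:
        raise ValueError("read length must be at least 2")
    return read_length - 1


def star_genome_generate_cmd(read_length, genome_dir, fasta, gtf, threads=8):
    """Build (but do not run) a STAR genomeGenerate command line.

    All paths and the thread count are hypothetical placeholders.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(sjdb_overhang(read_length)),
    ]


if __name__ == "__main__":
    # One index per read length in the experiment (2x50, 2x75, 2x100).
    for length in (50, 75, 100):
        cmd = star_genome_generate_cmd(
            length, f"star_index_{length}bp", "genome.fa", "annotation.gtf"
        )
        print(" ".join(cmd))
```

The list returned by star_genome_generate_cmd could be passed to subprocess.run once the placeholder paths point at real files.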

I appreciate any help you can provide.

RSEM RNA-seq STAR • 1.1k views


score 0 · Answer 1 · 2022-12-09

Rather than asking which parameter is better, you might want to think about what you are going to compare your data with (if you plan to do so). For example, if you are planning to compare with GTEx data in the future, you might use the GTEx parameters, gene model, STAR version, etc.; if you are planning to compare with GDC data, then use the exact GDC setup.

score 0 · Answer 2 · 2022-12-09


23 months ago

swbarnes2 14k

I think defaults for STAR will mostly be fine. The issue is that longer reads will map slightly better, so you'll have a bit of a bias in mapping. You might want to consider trimming all the reads to 50.
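As a rough illustration of trimming every read to 50 bases, here is a minimal Python sketch. It assumes plain, uncompressed FASTQ with exactly four lines per record; the function name trim_fastq and the max_len parameter are made up for this example. In practice a dedicated read trimmer would likely be a better choice.

```python
import io


def trim_fastq(src, dst, max_len=50):
    """Copy FASTQ records from src to dst, keeping the first max_len bases.

    Assumes four lines per record (header, sequence, '+', qualities).
    Returns the number of records written.
    """
    n = 0
    while True:
        header = src.readline()
        if not header:
            break  # end of input
        seq = src.readline().rstrip("\n")
        plus = src.readline().rstrip("\n")
        qual = src.readline().rstrip("\n")
        dst.write(header.rstrip("\n") + "\n")
        dst.write(seq[:max_len] + "\n")
        dst.write(plus + "\n")
        # Qualities must be cut to the same length as the sequence.
        dst.write(qual[:max_len] + "\n")
        n += 1
    return n


if __name__ == "__main__":
    # In-memory demo: one 75-base read and one 50-base read.
    demo = io.StringIO(
        "@r1\n" + "A" * 75 + "\n+\n" + "I" * 75 + "\n"
        "@r2\n" + "C" * 50 + "\n+\n" + "I" * 50 + "\n"
    )
    out = io.StringIO()
    count = trim_fastq(demo, out, max_len=50)
    print(count, "records written")
```

For paired-end data, both mate files need the same treatment, and the record order has to stay identical between them.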

I'm not sure that 2-pass will do much for you in a well-annotated genome.
