Good practices for RNA-seq (2x50, 2x75, 2x100) alignment with STAR + RSEM
2.0 years ago

Hi All,

I am new to RNA-seq and am looking for good practices for aligning RNA-seq data with different read lengths (2x50, 2x75, 2x100). The idea is to use STAR 2-pass (2.7.10b) as the aligner and RSEM for gene- and transcript-level quantification, eventually using tximport to extract read counts. I have read many forums and groups, the ENCODE and NCI GDC pipelines, and the STAR reference manual. However, the number of parameters that can be tuned is overwhelming. I know I should set --sjdbOverhang to read length - 1 for genome generation (which can be omitted in the two STAR passes), but other than this parameter, is there something else I should pay attention to?
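
To make the read length - 1 rule concrete, here is a minimal Python sketch that builds one STAR genome-generation command per read length. The file names (genome.fa, annotation.gtf), the index directory naming, and the helper names are hypothetical placeholders; only the STAR options themselves come from the STAR manual. For paired-end data, the read length is that of a single mate.

```python
from typing import List


def sjdb_overhang(read_length: int) -> int:
    """Return the --sjdbOverhang value for a given mate length (read length - 1)."""
    return read_length - 1


def genome_generate_cmd(read_length: int, fasta: str, gtf: str, threads: int = 8) -> List[str]:
    """Build the argument list for a STAR genomeGenerate run.

    The index directory name is an arbitrary convention chosen for this sketch.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", f"star_index_{read_length}bp",
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(sjdb_overhang(read_length)),
        "--runThreadN", str(threads),
    ]


if __name__ == "__main__":
    # One index per read length in the experiment (2x50, 2x75, 2x100).
    for length in (50, 75, 100):
        print(" ".join(genome_generate_cmd(length, "genome.fa", "annotation.gtf")))
```

The commands are only printed here; they could be handed to subprocess.run once the paths point at real files.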

I appreciate any help you can provide.

RSEM RNA-seq STAR
23 months ago
Zhenyu Zhang ★ 1.2k

Well, instead of asking which parameter is better, you might want to think about what you are going to compare your data with (if you plan to do so). For example, if you are planning to compare with GTEx data in the future, you might use the GTEx parameters as well as the same gene model, STAR version, etc.; if you are planning to compare with GDC data, then use the exact GDC setup.

23 months ago

I think the STAR defaults will mostly be fine. The issue is that longer reads will map slightly better, so you'll have a bit of a bias in mapping between the 50, 75, and 100 bp samples. You might want to consider trimming all the reads to 50 bases so that every sample is aligned at the same length.
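
As a rough illustration of the trimming idea, the following Python sketch truncates every read in a FASTQ stream to a fixed length. It assumes plain 4-line FASTQ records (no wrapped sequences), and the function name is made up for this example; established trimmers do the same job and would usually be preferred on real data. For paired-end libraries, both mate files would need to be trimmed the same way.

```python
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], max_len: int = 50) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality strings cut to max_len.

    Assumes strict 4-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Within each record, line index 1 is the sequence and 3 is the quality.
        if i % 4 in (1, 3):
            line = line[:max_len]
        yield line + "\n"


if __name__ == "__main__":
    # A single synthetic 100-base read, trimmed to 50 bases.
    record = ["@read1\n", "ACGT" * 25 + "\n", "+\n", "I" * 100 + "\n"]
    for out_line in trim_fastq(record, max_len=50):
        print(out_line, end="")
```

Reads already at or below the target length pass through unchanged, so the same call can be applied to the 2x50 samples without special handling.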

I'm not sure that 2-pass will do much for you in a well-annotated genome.
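
For comparing runs with and without the second pass, a minimal Python sketch of the alignment command might look like the following. The paths, output prefix, and helper name are hypothetical, and gzipped FASTQ input is assumed (hence zcat). The --quantMode TranscriptomeSAM option is included because RSEM needs alignments in transcript coordinates, which STAR writes to Aligned.toTranscriptome.out.bam.

```python
from typing import List


def star_align_cmd(genome_dir: str, fastq_r1: str, fastq_r2: str, prefix: str,
                   two_pass: bool = True, threads: int = 8) -> List[str]:
    """Build a STAR paired-end alignment command with optional 2-pass mode."""
    cmd = [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_r1, fastq_r2,
        "--readFilesCommand", "zcat",          # assumes .fastq.gz input
        "--outSAMtype", "BAM", "Unsorted",
        "--quantMode", "TranscriptomeSAM",     # transcriptome BAM for RSEM
        "--outFileNamePrefix", prefix,
        "--runThreadN", str(threads),
    ]
    if two_pass:
        # Built-in 2-pass: junctions found in pass 1 are re-used in pass 2.
        cmd += ["--twopassMode", "Basic"]
    return cmd


if __name__ == "__main__":
    for flag in (True, False):
        print(" ".join(star_align_cmd("star_index_50bp", "s1_R1.fastq.gz",
                                      "s1_R2.fastq.gz", "s1_", two_pass=flag)))
```

Running the same sample both ways and comparing the Log.final.out summaries is one way to check whether the second pass changes much for a given annotation.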
