Question

RSEM calculate-expression has only one sample in .isoforms.results file


21 months ago

Daniel ▴ 30

Hello,

I think I have a few misunderstandings about how to use RSEM, and have provided my script below.

I have three questions:

1. My script produced an .isoforms.results file, but with only one sample in it. As you can see from my script below, I call it on 4 samples. Was I supposed to run this script once for each file?
2. I call the STAR aligner, but only see an output BAM file. Why is there no .bai file?
3. Is it correct to say that the line /home/name/project/RSEM/AGP tells RSEM the reference directory? If so, how did RSEM know where the STAR index is? It is in the same directory, but it does not have the prefix "AGP", as the rsem-prepare-reference run I did previously only gave that prefix to the RSEM files.

Thanks!

rsem-calculate-expression --star \
-p 64 \
/home/name/project/RSEM/rnaseq_rawfastq/01-dmsoR01_S1.fastq,/home/name/project/RSEM/rnaseq_rawfastq/02_dmsoR02.fastq,/home/name/project/RSEM/rnaseq_rawfastq/03_dmsoR03.fastq,/home/name/project/RSEM/rnaseq_rawfastq/04-DOXR01_S4.fastq,/home/name/project/RSEM/rnaseq_rawfastq/05-DOXR02_S5.fastq,/home/name/project/RSEM/rnaseq_rawfastq/06-DOXR03_S6.fastq \
/home/name/project/RSEM/AGP \
dmsovDox

RSEM RNA-seq STAR • 1.3k views


score 2 · Accepted Answer · 2023-02-15


21 months ago

Ram 44k

1. RSEM cannot be run on multiple samples at the same time. The comma-separated FASTQ list is for the case where a single sample's reads are split across several files, such as multi-lane FASTQs. You need to run rsem-calculate-expression once per sample, i.e. once per FASTQ input set (one file for single-end, one pair for paired-end).
2. RSEM quantifies from the transcript-coordinate BAM. It derives the STAR genome BAM from that transcript BAM and doesn't really use the genome BAM itself, which is probably why you're not seeing an index.
3. Yes, the penultimate argument gives the reference location together with the prefix, referred to as the reference_name. As long as you ran rsem-prepare-reference with the same location and prefix, you don't need to worry about how rsem-calculate-expression finds the files it needs, as the RSEM utilities are internally consistent. For more information, read the source code; it's a not-super-complicated Perl script.
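To make the "once per sample" point concrete, here is a minimal sketch of a per-sample loop. It assumes each FASTQ in the directory is one single-end sample and that the sample name can be taken from the file name; FASTQ_DIR, REF, THREADS, DRY_RUN and run_sample are illustrative names, not part of RSEM. With DRY_RUN=1 (the default here) it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: one rsem-calculate-expression run per FASTQ file.
# Assumption: every *.fastq file is a separate single-end sample.
set -eu

FASTQ_DIR=${FASTQ_DIR:-/home/name/project/RSEM/rnaseq_rawfastq}
REF=${REF:-/home/name/project/RSEM/AGP}   # reference_name given to rsem-prepare-reference
THREADS=${THREADS:-64}
DRY_RUN=${DRY_RUN:-1}                     # hypothetical switch: 1 = print, 0 = execute

run_sample() {
  fastq=$1
  # Use the file name without its extension as the RSEM sample name,
  # so each sample gets its own <sample>.isoforms.results file.
  sample=$(basename "$fastq" .fastq)
  if [ "$DRY_RUN" = "1" ]; then
    echo rsem-calculate-expression --star -p "$THREADS" "$fastq" "$REF" "$sample"
  else
    rsem-calculate-expression --star -p "$THREADS" "$fastq" "$REF" "$sample"
  fi
}

for fastq in "$FASTQ_DIR"/*.fastq; do
  [ -e "$fastq" ] || continue   # skip the literal pattern when nothing matches
  run_sample "$fastq"
done
```

Once every sample has its own results file, rsem-generate-data-matrix can combine them into a single count matrix. If you want sorted and indexed BAM output, the RSEM documentation describes a --sort-bam-by-coordinate option for rsem-calculate-expression; check rsem-calculate-expression --help for your installed version before relying on it.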



Thank you so much. I have pretty limited programming experience compared to other bioinformaticians, so I'm worried it might be complicated for me!

ADD REPLY • link 21 months ago by Daniel ▴ 30


This is how we learn - we have specific questions and we try to understand what a tool we're using is doing under the hood. AFAIK, both rsem-prepare-reference and rsem-calculate-expression need to use the same aligner, so it boils down to the underlying aligner's naming convention.

Comparing STAR and RSEM runs, STAR needs a --genomeDir but not a prefix, and from your statement, only the RSEM-specific files use the prefix. So I can take an educated guess that the script splits the reference_name into two components, the directory and the prefix, and uses the directory for STAR-specific operations and the prefix (or both components, since RSEM also needs the directory location) for RSEM-specific operations.
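As a rough illustration of that guess (not a copy of what the RSEM script actually does), the split can be reproduced in the shell with dirname and basename. The variable names ref_dir and ref_prefix are made up for this example.

```shell
#!/bin/sh
# Sketch: split a reference_name into its directory and its prefix.
REF=/home/name/project/RSEM/AGP

ref_dir=$(dirname "$REF")       # directory part; where the STAR index would sit under this guess
ref_prefix=$(basename "$REF")   # prefix part; carried by the RSEM-specific files

echo "directory: $ref_dir"
echo "prefix:    $ref_prefix"
```

Listing the directory part should then show the STAR index files without the prefix alongside the RSEM files that start with it, which matches what you described.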

This is easier for me because I've spent a lot of time studying RSEM and STAR but you can do it too, and the more you try the easier it will get.