Does RSEM ignore STAR's splice-aware nature?
2.4 years ago
bioinfo2345 ▴ 40

I am currently doing an RNA-seq project with differential expression where I am using STAR as an aligner and RSEM for quantification. The project uses a reference genome with a GFF file containing information about the location of transcripts and introns.

From what I have read, RSEM cannot handle gapped alignments.

  1. How much of a problem is this?

  2. Does this mean that RSEM makes no use of the annotation describing where the introns are? That is, do the extra things that STAR does (since it is splice-aware) fail to benefit the analysis in the end?

  3. Should I be using a different quantification tool than RSEM to make the most of this annotation information? Are there any robust alternatives that also use EM to handle multi-mapping reads?

  4. Or is it just as good to use, e.g., HTSeq-count?

2.4 years ago

RSEM uses reads aligned to the transcriptome, not to the genome. As far as I'm aware, this is true of all alignment-based EM transcript quantification tools (it's definitely true of Salmon when quantifying pre-mapped reads). Since the transcriptome contains the sequences of transcripts after the introns have been removed, there should be no gapped reads in a transcriptome alignment (as long as the transcriptome annotation is sufficiently accurate).

 read sequence: ATGATGATGGGTGGTGGT

 alignment to g: MMMMMMMMMnnnnnnMMMMMMMMM - gapped alignment
 genome seq:     ATGATGATGCGACGAGGTGGTGGT
 transcript:     |>>>>>>>|------|>>>>>>>|
                          \    /
                           \  /
                            \/
                    |>>>>>>>||>>>>>>>|
transcript seq:     ATGATGATGGGTGGTGGT
alignment to trans: MMMMMMMMMMMMMMMMMM - ungapped alignment
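
To make the diagram concrete, here is a minimal Python sketch of the coordinate conversion it depicts. It is only an illustration, not how STAR or RSEM implement it: the function names and the exon layout (0-based, half-open intervals, single forward-strand transcript) are assumptions chosen to match the toy sequences above.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def genome_to_transcript_pos(pos, exons):
    """Map a 0-based genome position to a 0-based transcript offset.

    exons: sorted, non-overlapping, half-open (start, end) genome intervals.
    """
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start
    raise ValueError("position is not inside an exon")

def project_alignment(pos, cigar, exons):
    """Project a spliced genome alignment into transcript space.

    Every N gap must coincide exactly with an annotated intron;
    otherwise the read is not compatible with this transcript.
    """
    t_pos = genome_to_transcript_pos(pos, exons)
    # Introns are the gaps between consecutive exons.
    introns = {(a_end, b_start)
               for (_, a_end), (b_start, _) in zip(exons, exons[1:])}
    g = pos          # current genome coordinate
    merged = []      # transcript-space CIGAR as [length, op] pairs
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op == "N":
            if (g, g + n) not in introns:
                raise ValueError("gap does not match an annotated intron")
            g += n   # skip the intron; it does not exist in the transcript
            continue
        if op in "MD=X":
            g += n   # these operations consume reference bases
        if merged and merged[-1][1] == op:
            merged[-1][0] += n   # fuse the blocks on either side of the intron
        else:
            merged.append([n, op])
    return t_pos, "".join(f"{n}{op}" for n, op in merged)

# Toy example from the diagram: two 9 bp exons separated by a 6 bp intron.
genome = "ATGATGATGCGACGAGGTGGTGGT"
exons = [(0, 9), (15, 24)]
transcript = "".join(genome[s:e] for s, e in exons)
result = project_alignment(0, "9M6N9M", exons)
```

Here the gapped genome alignment `9M6N9M` collapses to a single ungapped block in transcript coordinates, because the `N` gap sits exactly on the annotated intron.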

Traditionally this is done by aligning reads with a non-splice-aware aligner to a FASTA file of transcript sequences. However, one of the many cool features of STAR is that it can do spliced (gapped) alignment of reads to the genome and then use the provided GFF to output the alignment coordinates in transcript space (where they should now be ungapped), ready for use by RSEM, Salmon, etc.
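
As a rough sketch of that workflow, the Python below only builds the two command lines as argument lists; it does not run anything. The paths, sample names, and helper function names are hypothetical. The sketch assumes the STAR index was generated with the annotation, that the RSEM reference was prepared from the same annotation, and that the reads are uncompressed paired-end FASTQ. `--quantMode TranscriptomeSAM` is the STAR option that writes the transcript-space alignments (to `<prefix>Aligned.toTranscriptome.out.bam`), and `--alignments` tells `rsem-calculate-expression` to take an existing alignment file instead of reads; check both against the manuals for your versions.

```python
def star_command(genome_dir, reads, prefix, threads=4):
    """Build a STAR call that also emits transcript-space alignments."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,          # index assumed built with the annotation
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "Unsorted",
        "--quantMode", "TranscriptomeSAM",  # write alignments in transcript coordinates
        "--outFileNamePrefix", prefix,
    ]

def rsem_command(transcriptome_bam, rsem_reference, sample_name,
                 paired_end=True, threads=4):
    """Build an RSEM call that quantifies pre-computed alignments."""
    cmd = ["rsem-calculate-expression", "--alignments", "-p", str(threads)]
    if paired_end:
        cmd.append("--paired-end")
    return cmd + [transcriptome_bam, rsem_reference, sample_name]

# Hypothetical file names, for illustration only.
prefix = "sample1_"
star = star_command("star_index", ["s1_R1.fastq", "s1_R2.fastq"], prefix)
rsem = rsem_command(prefix + "Aligned.toTranscriptome.out.bam",
                    "rsem_ref/ref", "sample1")
```

The lists can be passed to `subprocess.run(star, check=True)` and `subprocess.run(rsem, check=True)` once the paths point at real files.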


While it is true that many transcript quantification tools evaluate alignments directly against the spliced transcriptome (since this makes more sense from the perspective of the molecules whose abundances are being estimated), it is worth noting that RSEM has another limitation: it does not allow indels in the alignment. That is, regardless of splicing, if a reported alignment contains insertions or deletions, RSEM cannot process that read and it must be left out of the analysis. That limitation is not shared by many other tools, such as Salmon, eXpress, and TIGAR.
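
To see how many alignments would be affected by the indel restriction, one could screen the CIGAR strings before handing a file to RSEM. The following is a minimal sketch that works on plain SAM text lines; the function names and the toy records are made up for illustration, and it only checks the two cases discussed in this thread (spliced `N` gaps and `I`/`D` indels), not every requirement RSEM places on its input.

```python
import re
from collections import Counter

CIGAR_OP_RE = re.compile(r"\d+([MIDNSHP=X])")

def rsem_problem(cigar):
    """Return why a CIGAR is unsuitable per the discussion above, or None."""
    ops = set(CIGAR_OP_RE.findall(cigar))
    if "N" in ops:
        return "spliced"   # gapped alignment, i.e. still in genome space
    if ops & {"I", "D"}:
        return "indel"     # insertion or deletion relative to the transcript
    return None

def screen_sam_lines(lines):
    """Count alignments by category from an iterable of SAM text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue                      # skip header lines
        cigar = line.rstrip("\n").split("\t")[5]   # column 6 is the CIGAR
        if cigar == "*":
            counts["unaligned"] += 1
            continue
        counts[rsem_problem(cigar) or "ok"] += 1
    return dict(counts)

# Toy records (hypothetical), using the read from the diagram in the answer.
toy_sam = [
    "@SQ\tSN:tx1\tLN:18",
    "r1\t0\ttx1\t1\t255\t18M\t*\t0\t0\tATGATGATGGGTGGTGGT\t*",
    "r2\t0\tchr1\t1\t255\t9M6N9M\t*\t0\t0\tATGATGATGGGTGGTGGT\t*",
    "r3\t0\ttx1\t1\t255\t8M1I9M\t*\t0\t0\tATGATGATGGGTGGTGGT\t*",
]
summary = screen_sam_lines(toy_sam)
```

For BAM input, the same check could be applied to the text produced by `samtools view`. Depending on the version and settings, STAR may already exclude indel-containing alignments from its transcriptome output for RSEM compatibility, so it is worth checking the STAR manual before filtering again.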
