Question

RNAseq differential expression analysis results: Kallisto vs STAR

5.4 years ago

marcos.georgiades.18 ▴ 10

Hi all,

I am facing a minor dilemma, and I thought perhaps you could give me some advice. I have conducted differential expression analysis of a Control vs Mutant RNAseq dataset. I conducted the analysis twice, using different pipelines:

PipelineA: Kallisto --> DESeq2
PipelineB: STAR --> featureCounts --> DESeq2

I wanted to get a sense of how different the results would be when quantifying against a transcriptome versus aligning to a genome. PipelineA outputs ~2000 differentially expressed genes. PipelineB outputs ~1600, of which ~1400 are also identified as differentially expressed by PipelineA. The significance filters (e.g. FDR < 0.05) were kept the same for both.
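To put numbers on that overlap, here is a minimal sketch in Python. The gene IDs are made-up placeholders chosen only to reproduce the approximate set sizes quoted above (~2000, ~1600, ~1400 shared); in practice the two sets would be the significant gene IDs exported from each DESeq2 run. The function name `overlap_summary` is hypothetical.

```python
def overlap_summary(set_a, set_b):
    """Summarise the overlap between two sets of significant gene IDs.

    Assumes both sets are non-empty.
    """
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        "frac_of_a": len(shared) / len(set_a),  # share of A's hits also found by B
        "frac_of_b": len(shared) / len(set_b),  # share of B's hits also found by A
        "jaccard": len(shared) / len(union),    # intersection over union
    }


# Placeholder IDs (not real data): 2000 genes for PipelineA,
# 1600 for PipelineB, constructed so that 1400 are shared.
pipeline_a = {f"gene_{i}" for i in range(0, 2000)}
pipeline_b = {f"gene_{i}" for i in range(600, 2200)}

summary = overlap_summary(pipeline_a, pipeline_b)
for name, value in summary.items():
    print(name, value)
```

With these sizes, 87.5% of the PipelineB genes are also called by PipelineA, while 70% of the PipelineA genes are also called by PipelineB.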

My question is: which results should I trust? I have read that the transcriptome-based approach is usually more accurate, but perhaps it doesn't hurt to be a bit conservative?

Many thanks:), Marcos

RNA-Seq rna-seq kallisto STAR • 3.5k views

ADD COMMENT • link updated 5.3 years ago by Istvan Albert 102k • written 5.4 years ago by marcos.georgiades.18 ▴ 10

I suggest you read the papers describing the pseudoalignment tools such as kallisto and salmon, plus the recent papers that benchmark these different pipelines. The field currently seems to prefer the pseudoalignment methods; the details are in the papers.

ADD REPLY • link 5.4 years ago by ATpoint 88k

PipelineA probably gives you more total detected genes and more "counts" per gene, so that may explain the higher number of differentially expressed genes.

Regardless, your overlap is very high.
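One way to check this on your own data is to compare the per-gene totals from the two pipelines directly. Below is a rough sketch in Python; the gene names and counts are invented for illustration, and `compare_counts` and `min_count` are hypothetical names, not part of any of the tools mentioned.

```python
def compare_counts(counts_a, counts_b, min_count=1):
    """Compare per-gene counts from two pipelines.

    counts_a / counts_b map gene ID -> total count summed over samples.
    A gene is called "detected" if its count is at least min_count.
    """
    detected_a = {g for g, c in counts_a.items() if c >= min_count}
    detected_b = {g for g, c in counts_b.items() if c >= min_count}
    shared = detected_a & detected_b
    higher_in_a = sum(1 for g in shared if counts_a[g] > counts_b[g])
    return {
        "detected_a": len(detected_a),
        "detected_b": len(detected_b),
        "shared": len(shared),
        "higher_in_a": higher_in_a,
    }


# Invented toy values: estimated counts from pipeline A (may be fractional)
# and integer counts from pipeline B.
toy_a = {"geneA": 120.0, "geneB": 35.5, "geneC": 4.0, "geneD": 0.0}
toy_b = {"geneA": 100, "geneB": 30, "geneC": 0, "geneD": 0}

report = compare_counts(toy_a, toy_b)
print(report)
```

If most shared genes have higher counts in PipelineA and it detects more genes overall, that would be consistent with the explanation above.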

ADD REPLY • link 5.4 years ago by igor 13k

As an additional comment: I recommend checking out sleuth for performing differential gene expression analysis with kallisto.

ADD REPLY • link 5.3 years ago by dsull ★ 7.6k

score 5 · Answer 1 · 2020-03-20

The two methods are complementary, so you can't quite think of one as more "trustworthy" than the other. You are measuring different things.

Either one separately, or both, could be right or wrong, all at the same time.

There are various tradeoffs in each. As igor puts it, the overlap is already high.

I always recommend that people do both: run pseudoalignment with Salmon/Kallisto, then look at the genomic alignments for each transcript that turns out to be relevant. The genome-level alignments are more informative and can help you decide how much to trust the quantification.

The problem with both Salmon and Kallisto is that the read reassignment is somewhat of a black box: it is not easy to track why a multi-mapping read is assigned where it is, how strong the evidence was, how reliable the process was, or how big the errors are.
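To give a feel for what that reassignment involves, here is a heavily simplified toy sketch of an expectation-maximization (EM) style split of ambiguous reads. This is not the actual Salmon or Kallisto implementation: it ignores effective transcript lengths, bias models and everything else the real tools do, and the function name, input layout and numbers are all assumptions made for illustration.

```python
def em_assign(eq_classes, n_transcripts, n_iter=200):
    """Toy EM for splitting multi-mapping reads among transcripts.

    eq_classes: list of (transcript_indices, read_count) pairs, where each
    group of reads is compatible with every transcript in transcript_indices.
    Returns the estimated number of reads assigned to each transcript.
    """
    total = sum(count for _, count in eq_classes)
    theta = [1.0 / n_transcripts] * n_transcripts  # start from uniform abundances
    for _ in range(n_iter):
        assigned = [0.0] * n_transcripts
        for members, count in eq_classes:
            denom = sum(theta[t] for t in members)
            if denom == 0:
                continue  # no compatible transcript has any abundance left
            for t in members:
                # E-step: split reads in proportion to current abundances
                assigned[t] += count * theta[t] / denom
        # M-step: re-estimate relative abundances from the soft assignments
        theta = [a / total for a in assigned]
    return [th * total for th in theta]


# Invented example: 90 reads map only to transcript 0, 10 only to
# transcript 1, and 100 reads are compatible with both.
toy_classes = [((0,), 90), ((1,), 10), ((0, 1), 100)]
estimates = em_assign(toy_classes, n_transcripts=2)
print(estimates)
```

In this toy case the 100 ambiguous reads end up split roughly 90:10, because the split follows the evidence from the uniquely mapping reads. The final numbers do not tell you how much of each estimate rests on ambiguous reads, which is why checking the genomic alignments for the transcripts you care about is worthwhile.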