Question

Forum: Kallisto, a new RNA-seq quantification method (discussion)

9.2 years ago

morovatunc

Hi all,

I would like to get your opinion about this interesting tool, which might affect RNA-seq analysis. Its biggest advantage seems to be its tremendous speed, and it also performs quite well compared with its competitors. Have any of you had a chance to use it? If so, could you share some feedback with us? The article seems very interesting. If it is okay with the website policy, I would like to get a discussion going.

Thanks,
Tunc.

Problem: I cannot reach the paper's Nature page because it redirects me to the Nature home page, so I am sharing a news post about the paper instead:

https://liorpachter.wordpress.com/2015/05/10/near-optimal-rna-seq-quantification-with-kallisto/

Written by morovatunc (9.2 years ago); updated by Ram (2.2 years ago).

If you are interested in this class of tools, there are also Salmon and Sailfish.

Reply by GenoMax (9.2 years ago)

For those who don't know, Sailfish has recently been heavily updated, and many of the cooler aspects of Salmon have been integrated into it. Anyone reading the kallisto paper should note that its Sailfish comparisons are largely meaningless for judging the results of the current version (details here).

Reply by Devon Ryan (9.2 years ago)

It is true that Sailfish was updated after the kallisto paper was submitted. The current version (0.9.2) incorporates many of the key elements of kallisto (pseudoalignment, the kallisto bias correction and the kallisto effective length correction) so that it is now practically identical to kallisto in the underlying algorithm (and therefore in the results produced).

The comparisons to Sailfish in the kallisto paper are meaningful insofar as they show definitively that the Sailfish algorithm based on k-mer matching (published in Nature Biotechnology) is inferior to the read pseudoalignment that underlies kallisto (and now Sailfish).

Reply by Lior Pachter (9.2 years ago)

score 4 · Answer 1 · 2016-04-12

9.2 years ago

Devon Ryan

We've had a couple of projects that have given both kallisto and Salmon a try. We've generally gone with Salmon, which is not to say that kallisto is bad. Since tximport will be in the next R release, I expect we'll switch the majority of our "standard" RNA-seq analyses to either Salmon or kallisto in the next year or so.
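tximport is an R package; the snippet below is not tximport but a minimal Python sketch of its core step, collapsing transcript-level estimates to the gene level. It assumes kallisto's `abundance.tsv` layout (tab-separated, columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`) and a transcript-to-gene mapping you build yourself from the annotation used for the index; the `tx2gene` dict and the example values are hypothetical. tximport does more than this (for example, it also tracks an average transcript length per gene for use as an offset), which the sketch leaves out.

```python
import csv
import io
from collections import defaultdict

def gene_level_counts(abundance_tsv, tx2gene):
    """Sum kallisto est_counts over the transcripts of each gene.

    abundance_tsv: text of a kallisto abundance.tsv file (in practice, read
    it with open(path).read()).
    tx2gene: dict mapping transcript ID -> gene ID (hypothetical, user-built).
    """
    counts = defaultdict(float)
    reader = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            continue  # transcript missing from the mapping: skip it
        counts[gene] += float(row["est_counts"])
    return dict(counts)

# Made-up example file with three transcripts from two genes.
example = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1500\t1301\t100\t50\n"
    "tx2\t900\t701\t20\t18.5\n"
    "tx3\t2000\t1801\t5\t1.8\n"
)
print(gene_level_counts(example, {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}))
# {'geneA': 120.0, 'geneB': 5.0}
```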

I would like to add that two relevant papers have been out for some time this year: one on quantification, and the other on quantification and differential expression. Both can be used for benchmarking, and each comes with a website where you can add new methods to the benchmark. Take a look; they might be quite useful in selecting a quantification pipeline and a downstream DE tool for inferring differences in transcriptional programs.

rnaseqcomp

RNAontheBENCH

Reply by ivivek_ngs (9.0 years ago)

Is there any specific reason you tend to favor Salmon? I'll be starting on RNA-seq data soon, so I'm curious.

Reply by Sinji (9.2 years ago)

It happened to give more reliable results on a dataset of interest that we tested it on. There's the added benefit that Rob Patro is super responsive (and active on this site), though that wasn't the deciding factor.

Reply by Devon Ryan (9.2 years ago)

And maybe also the RAM requirement of STAR? It needs a large amount of RAM.

Reply by morovatunc (9.2 years ago)

You don't need STAR to use Salmon. You can certainly give Salmon a BAM file, but you can also just give it fastq files (as is the case with Kallisto).

Reply by Devon Ryan (9.2 years ago)
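To make the two Salmon input modes concrete, here is a hedged Python sketch that only assembles the command lines; it does not run anything. The flags follow Salmon's documented `quant` options as I understand them (`-i`, `-l`, `-1`/`-2` or `-r` for quantifying straight from FASTQ; `-t` and `-a` for quantifying from existing alignments), but check `salmon quant --help` for your version. All file names are placeholders. In alignment mode the BAM must contain alignments to the transcript sequences (for example, STAR run with `--quantMode TranscriptomeSAM`), not genome alignments.

```python
def salmon_reads_cmd(index, out_dir, reads1, reads2=None, lib_type="A"):
    """Mapping-based mode: quantify directly from FASTQ (no STAR step needed)."""
    cmd = ["salmon", "quant", "-i", index, "-l", lib_type]
    if reads2 is None:
        cmd += ["-r", reads1]                # single-end reads
    else:
        cmd += ["-1", reads1, "-2", reads2]  # paired-end reads
    return cmd + ["-o", out_dir]

def salmon_bam_cmd(transcripts_fa, bam, out_dir, lib_type="A"):
    """Alignment-based mode: quantify from reads already aligned to the transcriptome."""
    return ["salmon", "quant", "-t", transcripts_fa, "-l", lib_type,
            "-a", bam, "-o", out_dir]

print(" ".join(salmon_reads_cmd("salmon_index", "quant_out",
                                "reads_1.fq.gz", "reads_2.fq.gz")))
# salmon quant -i salmon_index -l A -1 reads_1.fq.gz -2 reads_2.fq.gz -o quant_out
```

Each list could then be passed to `subprocess.run(cmd, check=True)`; building argument lists rather than shell strings avoids quoting problems with unusual file names.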

There is an active user group for kallisto and sleuth where questions are answered quickly: https://groups.google.com/forum/#!forum/kallisto-sleuth-users

Reply by Lior Pachter (9.2 years ago)

I still need some time before I can start using kallisto. Its indexing method, and especially k-compatibility, is hard to comprehend. Thank you for your answer!

Reply by morovatunc (9.2 years ago)

While the details of how pseudoalignment is performed are slightly technical, what it means is explained in this blog post: https://liorpachter.wordpress.com/tag/pseudoalignment/

Reply by Lior Pachter (9.2 years ago)
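For readers who find k-compatibility hard to picture, here is a toy Python sketch of the idea: each k-mer is mapped to the set of transcripts that contain it, and a read pseudoaligns to the intersection of those sets over its k-mers. This is a simplification under stated assumptions, not kallisto's implementation: it uses a plain dict instead of a transcriptome de Bruijn graph, ignores reverse complements, and simply skips read k-mers that are absent from the index. The sequences are made up.

```python
def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index

def pseudoalign(read, index, k):
    """Intersect the transcript sets of the read's k-mers.

    Returns the transcripts compatible with the read; an empty set means the
    read does not pseudoalign. K-mers missing from the index are skipped
    (a simplifying assumption of this toy).
    """
    hits = [index[read[i:i + k]] for i in range(len(read) - k + 1)
            if read[i:i + k] in index]
    if not hits:
        return set()
    return set.intersection(*hits)  # returns a new set; the index is not modified

transcripts = {"t1": "ACGTACGGTC", "t2": "ACGTACGAAT", "t3": "TTTTGGGCCC"}
idx = build_kmer_index(transcripts, k=5)
print(sorted(pseudoalign("GTACGGT", idx, 5)))  # ['t1']
print(sorted(pseudoalign("ACGTAC", idx, 5)))   # ['t1', 't2']: shared sequence, ambiguous
```

No base-level alignment is computed; the read is only assigned a set of compatible transcripts. Reads that are compatible with several transcripts, like the second example, are what the subsequent EM step apportions between those transcripts.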

For differential analysis, it is strongly recommended to use sleuth; see this thread: Can Kallisto be followed by DESeq, EdgeR or Cuffdiff?

Reply by Lior Pachter (9.2 years ago)

score 4 · Answer 2 · 2016-04-12

9.2 years ago

lkmklsmn

I find the pseudoalignment approach (kallisto, Salmon, Sailfish) very innovative. However, I would like to point out that RNA-seq data carries a lot more information than just gene expression levels. In my opinion, the basic output of gene-level RNA-seq analysis is an alignment, not just an expression estimate. RNA-seq alignments carry information on allele-specific expression and alternative splicing (junction reads), and they give you the opportunity to visualize the raw data. Since you never know whether you may want to look at some of these aspects later, I have been hesitant to use these pseudoaligners in my "standard" workflow.
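As one concrete example of information that alignments keep and pseudoalignment output does not: spliced (junction) reads appear in SAM/BAM records as an `N` (skipped reference region) operation in the CIGAR string. Below is a minimal Python sketch that extracts intron coordinates from a read's 1-based alignment start and its CIGAR string. Reading the records from an actual BAM file (for example with pysam) is left out, and the positions are made-up values.

```python
import re

# CIGAR operations as defined in the SAM specification: (length, op) pairs.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # ops that advance along the reference

def junctions(pos, cigar):
    """Return introns (start, end), 1-based inclusive, implied by `N` operations.

    pos: 1-based leftmost mapping position of the read (the SAM POS field).
    """
    ref = pos
    introns = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns

print(junctions(100, "50M200N50M"))  # [(150, 349)]: a junction read
print(junctions(100, "76M"))         # []: unspliced read
```

Counting how many reads support each `(start, end)` pair is the basis of junction-level splicing analyses, which is the kind of question the comment above argues you may want to revisit later.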

Just to clarify: kallisto, Salmon, etc. work at the transcript level, not the gene level!

While it is true that this may be a limitation in some situations more than others, it is not nearly as worrisome as it would be if it were gene-level pseudoalignment.

Reply by Istvan Albert (9.2 years ago)

score 3 · Answer 3 · 2017-06-06

There is a 2017 BMC Bioinformatics paper that evaluates 219 combinatorial implementations of the most commonly used analysis tools for their impact on differential gene expression analysis by RNA-seq, including kallisto, and to me it looks very good:

"Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq"

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1457-z

They also share their scripts, which is very nice; for instance, the Perl script for aligning and modeling with kallisto, using the settings from the paper, is here:

https://github.com/cckim47/kimlab/blob/master/rnaseq/alignAndModel/alignAndModel_KaKa.pl

score 2 · Answer 4 · 2016-04-12

9.2 years ago

WouterDeCoster

I have used it. It is very easy to work with and very quick, and it works nicely together with sleuth. The reason I haven't dived further into it is that it automatically performs transcript-length-based normalization, which is neither applicable nor desired for my type of data (QuantSeq 3', Lexogen).
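A small Python sketch of why length normalization matters here, with made-up numbers; it is illustrative only, not kallisto's code. TPM divides each transcript's count by its (effective) length before rescaling to one million, which fits full-length protocols where longer transcripts yield more fragments. In 3' tag protocols such as QuantSeq, reads come from near the 3' end, so read counts are not expected to scale with transcript length, and a length-free scaling such as counts per million (CPM) is closer to what is wanted.

```python
def tpm(counts, eff_lengths):
    """Transcripts per million: divide by effective length, then rescale to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

def cpm(counts):
    """Counts per million: rescale by library size only (no length normalization)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]

counts = [100, 100]    # hypothetical: equal read counts for two transcripts
lengths = [500, 5000]  # hypothetical effective lengths differing tenfold
print(tpm(counts, lengths))  # about 909091 vs 90909: the short transcript looks 10x higher
print(cpm(counts))           # [500000.0, 500000.0]
```

With 3' tag data, the two transcripts above would plausibly be equally expressed, so the tenfold TPM difference would be an artifact of the length correction.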

Hi! Thanks for the comment. I am currently working with single-end 3' RNA-seq data and cannot decide on an alignment method. Could you please recommend something? It's human cancer data, and I will definitely run a STAR alignment, but I was wondering whether kallisto could work as well; however, in single-end mode it requires the fragment length mean and SD, which I don't know. I know that Salmon doesn't need them, so I plan to try it. I also have some doubts about trimming adapters and removing poly-A stretches, which lead to an A bias in the data. Could you tell me whether you do anything about this? I once read that it's better not to do either. My data is also QuantSeq 3' (Lexogen). Sorry for the many questions; it's my first time working with RNA data and I just want to make sure my analysis is sound. Many thanks!

Reply by dhlsl (5.6 years ago)
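On the fragment-length question: single-end reads do not reveal fragment length, which is why kallisto's single-end mode (`kallisto quant --single`) asks for the mean (`-l`) and standard deviation (`-s`) of the fragment length, whereas with paired-end reads it can estimate them from the data. These values feed into each transcript's effective length. The Python sketch below assumes a normal fragment-length distribution truncated at the transcript length and computes effective length = length - (truncated mean fragment length) + 1. This is in the spirit of kallisto's effective-length correction but is an illustrative assumption, not its exact code; the mean of 200 and SD of 20 are hypothetical values.

```python
import math

def truncated_mean_fragment_length(tx_len, mean, sd):
    """Mean of a discretized normal(mean, sd) fragment-length distribution
    restricted to lengths 1..tx_len (fragments cannot exceed the transcript)."""
    lengths = range(1, tx_len + 1)
    logw = [-0.5 * ((f - mean) / sd) ** 2 for f in lengths]
    top = max(logw)  # subtract the max before exponentiating to avoid underflow
    weights = [math.exp(lw - top) for lw in logw]
    return sum(f * w for f, w in zip(lengths, weights)) / sum(weights)

def effective_length(tx_len, mean, sd):
    """Approximate number of positions at which a fragment can start."""
    return tx_len - truncated_mean_fragment_length(tx_len, mean, sd) + 1

print(round(effective_length(2000, 200, 20)))    # 1801: long transcript, barely affected
print(round(effective_length(150, 200, 20), 1))  # small: shorter than most fragments
```

Because the truncated mean can never exceed the transcript length, the effective length stays at or above 1 even for very short transcripts, which avoids the negative values a plain "length - mean + 1" would give.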

score 1 · Answer 5 · 2016-04-13

I've used kallisto to compute RNA-seq expression values (both normalized counts and TPM values). I was particularly interested because I work on multiple closely related species (tomatoes), where alignment is imperfect: mapping rates fall as the genetic distance from the reference increases.

For my work I use a single reference, which causes problems since I'm working on multiple species that are more or less closely related to it. I'm also on the Ion Proton platform, which may lead to differences from Illumina users' results (especially due to the homopolymer-associated insertions/deletions that are frequent in Ion Proton reads).

With kallisto + reference transcriptome, the fraction of reads mapped ranged from 54 to 76%, which is pretty good in my opinion. With STAR + reference genome (no multimapping reads, 2 mismatches allowed), mapping rates ranged from 34 to 70% because of too many mismatches, of both technical origin (Ion Proton) and from the genetic distance between my species. Since I mapped to the genome rather than the transcriptome, and given the less mainstream Ion Proton reads, I guess the two are hard to compare. I'm currently trying other methods (the TMAP aligner) to compare results.

So far, looking at the expression of specific enzymes, most of them behave as expected (enzymes linked to metabolite production are expressed accordingly across genotypes).

I'm still in the process of comparing mainstream aligners to Kallisto pseudoalignment.

Looking forward to additional insights from this forum.