Question

Kallisto and Salmon give different results

2.5 years ago

mailard ▴ 30

Hello!

I ran an experiment to quantify transcript expression with three programs: kallisto, salmon (alignment-based and mapping-based modes), and eXpress. I get different results from kallisto and salmon: for example, different expression values for the same transcript and a different number of detected transcripts. Is this normal, or have I made a mistake somewhere?

The Salmon index was built following the recommendations in the manual. The commands were as follows. For the BAM file:

~/Soft/bcbio/anaconda/bin/hisat2 -p 5 -x /mnt/lapd/Index_hum/cdna/ensembl/release_103/hisat2_cdna/cdna_103 -U trim_g/"$i"_trimmed.fq.gz | samtools view -Sb - > bam/"$i".bam
~/Soft/bcbio/anaconda/bin/salmon quant --threads 2 -t /mnt/lapd/Index_hum/cdna/ensembl/release_103/Homo_sapiens.GRCh38.cdna.all.fa -l A -a bam/A549/"$i".bam -o exp/salmon_bam/A549/"$i"

For the FASTQ files:

~/Soft/bcbio/anaconda/bin/salmon quant --threads 2 --index /mnt/lapd/Index_hum/cdna/ensembl/release_103/salmon/ensembl_103/ --libType A -o exp/salmon_fasta/Hep3B/"$i" -1 trim_g/Hep3B/"$i"_1_val_1.fq.gz -2 trim_g/Hep3B/"$i"_2_val_2.fq.gz

Kallisto

~/Soft/bcbio/tools/bin/kallisto quant --index /mnt/lapd/Index_hum/cdna/ensembl/release_103/kallisto/tr_103 --output-dir exp/kallisto/Hep3B/"$i" trim_g/Hep3B/"$i"_1_val_1.fq.gz trim_g/Hep3B/"$i"_2_val_2.fq.gz

transcript expression salmon hisat2 kallisto • 2.5k views

ADD COMMENT • link updated 2.5 years ago by dsull ★ 6.9k • written 2.5 years ago by mailard ▴ 30

Hi! Since they are different tools, I would expect the results to differ to some extent, so it depends on how strict you are being in the comparison. Apart from this, kallisto does not infer the strandedness of the RNA-Seq library, so for a stranded library it has to be specified explicitly (Salmon detects it automatically, if I remember correctly). Maybe this could be a source of the discrepancy?

ADD REPLY • link 2.5 years ago by iraun 6.2k

Do you mean the --fr-stranded and --rf-stranded options? I ran a correlation test, and the correlation was less than 0.60.

ADD REPLY • link 2.5 years ago by mailard ▴ 30

Yes, you can use --fr-stranded or --rf-stranded depending on your library strandedness. I've never really seen a dramatic difference in results produced by stranded vs. unstranded pseudoalignment though.

ADD REPLY • link 2.5 years ago by dsull ★ 6.9k
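One way to choose between the two flags, as a rough sketch: Salmon run with `--libType A` records the library type it detected in `lib_format_counts.json` in its output directory (the `expected_format` key, as far as I know), and that value can be translated into the matching kallisto flag. The helper names `salmon_libtype` and `kallisto_strand_flag` are made up for this illustration, and only the paired-end types ISF/ISR are mapped; anything else is treated as unstranded.

```shell
# Print the library type Salmon detected (e.g. IU, ISF, ISR).
# $1 = Salmon output directory; assumes lib_format_counts.json has an
# "expected_format" entry on its own line.
salmon_libtype() {
    sed -n 's/.*"expected_format"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p' \
        "$1/lib_format_counts.json" | head -n 1
}

# Translate a Salmon paired-end library type into a kallisto strand flag.
# Prints nothing for unstranded or unrecognised types.
kallisto_strand_flag() {
    case "$1" in
        ISF) echo "--fr-stranded" ;;  # read 1 comes from the forward strand
        ISR) echo "--rf-stranded" ;;  # read 1 comes from the reverse strand
        *)   echo "" ;;
    esac
}

# Usage with the paths from the question ($flag is deliberately unquoted so
# that an empty value adds no argument):
# flag=$(kallisto_strand_flag "$(salmon_libtype exp/salmon_fasta/Hep3B/"$i")")
# kallisto quant $flag --index /mnt/lapd/Index_hum/cdna/ensembl/release_103/kallisto/tr_103 \
#     --output-dir exp/kallisto/Hep3B/"$i" \
#     trim_g/Hep3B/"$i"_1_val_1.fq.gz trim_g/Hep3B/"$i"_2_val_2.fq.gz
```

If Salmon reports IU, the library is unstranded and the kallisto command from the question is already appropriate as written.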

They should give very similar results (but not identical), especially for bulk RNA-seq. It would be helpful for you to post some plots of the counts produced by salmon vs. those produced by kallisto. The word "different" doesn't have much meaning (e.g. a correlation of 0.95 would still technically be "different").

So, can you show me what exactly you're observing that is "different"?

ADD REPLY • link 2.5 years ago by dsull ★ 6.9k
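To put a number on "different", here is a minimal sketch that joins Salmon's `quant.sf` and kallisto's `abundance.tsv` on transcript ID and reports the Pearson correlation of log2(TPM + 1). It assumes the standard column layouts (TPM in column 4 of `quant.sf`, `tpm` in column 5 of `abundance.tsv`) and that both indices were built from the same FASTA, so the IDs match. The function name `compare_tpm` is made up for this example.

```shell
# Pearson correlation of log2(TPM + 1) between Salmon and kallisto.
# $1 = Salmon quant.sf, $2 = kallisto abundance.tsv
compare_tpm() {
    awk -F '\t' '
        # First file (quant.sf): Name is column 1, TPM is column 4.
        NR == FNR { if (FNR > 1) salmon[$1] = $4; next }
        # Second file (abundance.tsv): target_id is column 1, tpm is column 5.
        FNR > 1 && ($1 in salmon) {
            x = log(salmon[$1] + 1) / log(2)
            y = log($5 + 1) / log(2)
            n++; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            if (n < 2) { print "too few shared transcripts" > "/dev/stderr"; exit 1 }
            den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
            if (den == 0) { print "zero variance in one of the inputs" > "/dev/stderr"; exit 1 }
            printf "n=%d pearson_log2tpm=%.4f\n", n, (n * sxy - sx * sy) / den
        }
    ' "$1" "$2"
}

# Usage with the output directories from the question:
# compare_tpm exp/salmon_fasta/Hep3B/"$i"/quant.sf exp/kallisto/Hep3B/"$i"/abundance.tsv
```

Working on the log scale keeps a handful of very highly expressed transcripts from dominating the correlation. If `n` is much smaller than the number of transcripts in either file, the two indices probably do not share the same transcript IDs.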

score 2 · Answer 1 · 2022-05-18

Back when they were first released, Kallisto and Salmon used much more similar algorithms, but those algorithms have diverged over time. In particular, Salmon now uses a selective alignment approach that is going to produce results closer to a full alignment than the results that Kallisto will produce.

The other big difference with modern Salmon is that it allows the construction of a reference with decoy sequences - that is, reads are not forced to map to a part of the transcriptome that they are similar to, when there is a part of the genome (that is not exonic sequence) that is an even better match. This is important, because somewhere between 30 and 50% of your reads probably come from non-exonic sequence.

Thus, I would expect these three methods to give different answers.
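For anyone who wants to try the decoy-aware reference mentioned above, this is roughly the procedure from the Salmon documentation, written as a sketch: the chromosome names of the genome become the decoy list, and the transcript FASTA is concatenated with the genome FASTA (transcripts first). The helper names and the file names in the usage comments are placeholders, not paths from the question.

```shell
# Print one decoy name per line: the sequence names of the genome FASTA.
# $1 = gzipped genome FASTA
make_decoy_list() {
    gunzip -c "$1" | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//'
}

# Build the combined "gentrome": transcripts first, then the genome.
# Concatenated gzip files are still valid gzip, so plain cat is enough.
# $1 = gzipped transcript FASTA, $2 = gzipped genome FASTA, $3 = output file
make_gentrome() {
    cat "$1" "$2" > "$3"
}

# Usage (placeholder file names):
# make_decoy_list genome.fa.gz > decoys.txt
# make_gentrome Homo_sapiens.GRCh38.cdna.all.fa.gz genome.fa.gz gentrome.fa.gz
# salmon index -t gentrome.fa.gz -d decoys.txt -i salmon_decoy_index -p 4
```

Quantifying against such an index and against the plain cDNA index is a direct way to see how much of the Salmon vs. kallisto difference comes from reads that the decoys absorb.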