Kallisto and Salmon give different results
2.5 years ago
mailard ▴ 30

Hello!

I ran an experiment to quantify transcript expression with three programs: kallisto, salmon (in both alignment-based and mapping-based modes) and eXpress. Kallisto and salmon give me different results: for example, the expression of the same transcript differs, and so does the number of transcripts detected. Is this normal, or have I made a mistake somewhere?

The Salmon index was created following the recommendations in the manual. The commands looked like this. For the BAM file:

~/Soft/bcbio/anaconda/bin/hisat2 -p 5 -x /mnt/lapd/Index_hum/cdna/ensembl/release_103/hisat2_cdna/cdna_103 -U trim_g/"$i"_trimmed.fq.gz | samtools view -Sb - > bam/"$i".bam
~/Soft/bcbio/anaconda/bin/salmon quant --threads 2 -t /mnt/lapd/Index_hum/cdna/ensembl/release_103/Homo_sapiens.GRCh38.cdna.all.fa -l A -a bam/A549/"$i".bam -o exp/salmon_bam/A549/"$i"

For the FASTQ files:

~/Soft/bcbio/anaconda/bin/salmon quant --threads 2 --index /mnt/lapd/Index_hum/cdna/ensembl/release_103/salmon/ensembl_103/ --libType A -o exp/salmon_fasta/Hep3B/"$i" -1 trim_g/Hep3B/"$i"_1_val_1.fq.gz -2 trim_g/Hep3B/"$i"_2_val_2.fq.gz

Kallisto

~/Soft/bcbio/tools/bin/kallisto quant --index /mnt/lapd/Index_hum/cdna/ensembl/release_103/kallisto/tr_103 --output-dir exp/kallisto/Hep3B/"$i" trim_g/Hep3B/"$i"_1_val_1.fq.gz trim_g/Hep3B/"$i"_2_val_2.fq.gz

Hi! Since these are different tools, I would expect the results to differ to some extent, so it depends on how strict you are being in the comparison. Apart from that, kallisto does not detect the strandedness of the RNA-Seq library on its own, so for a stranded library it has to be specified explicitly (Salmon infers it automatically with --libType A, if I remember correctly). Could this be a source of the discrepancy?


Do you mean these options: --fr-stranded and --rf-stranded? I ran a correlation test and the correlation was less than 0.60.


Yes, you can use --fr-stranded or --rf-stranded depending on your library strandedness. I've never really seen a dramatic difference in results produced by stranded vs. unstranded pseudoalignment though.
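
If Salmon has already been run with --libType A, the library type it settled on can be used to pick the kallisto flag. A rough sketch, assuming Salmon's output directory contains lib_format_counts.json with an "expected_format" field, and that only the paired-end stranded types ISF and ISR need handling. The function name kallisto_strand_flag is made up for illustration.

```shell
# Print the kallisto strand flag matching the library type Salmon detected.
# $1: path to the lib_format_counts.json file in a Salmon output directory.
# Prints nothing for unstranded or unrecognised library types.
kallisto_strand_flag() {
    # Pull the value of "expected_format" out of the JSON (e.g. ISR, ISF, IU).
    fmt=$(sed -n 's/.*"expected_format"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p' "$1")
    case "$fmt" in
        ISF) echo "--fr-stranded" ;;   # read 1 comes from the forward strand
        ISR) echo "--rf-stranded" ;;   # read 1 comes from the reverse strand
        *)   ;;                        # unstranded or unknown: no flag
    esac
}
```

For example, `kallisto_strand_flag exp/salmon_fasta/Hep3B/"$i"/lib_format_counts.json` would print the option to add to the kallisto quant command, or nothing if the library looks unstranded.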


They should give very similar results (but not identical), especially for bulk RNA-seq. It would be helpful for you to post some plots of the counts produced by salmon vs. those produced by kallisto. The word "different" doesn't have much meaning (e.g. a correlation of 0.95 would still technically be "different").

So, can you show me what exactly you're observing that is "different"?
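
One way to put a number on "different" is to correlate log-transformed TPMs between the two outputs. A minimal sketch, assuming the standard output layouts (quant.sf from Salmon with TPM in column 4, abundance.tsv from kallisto with tpm in column 5) and that both indices were built from the same FASTA, so the transcript IDs match. The function name tpm_correlation is made up for illustration.

```shell
# Pearson correlation of log2(TPM + 1) between a Salmon quant.sf and a
# kallisto abundance.tsv, matched on transcript ID.
# $1: quant.sf, $2: abundance.tsv
# Prints "<number of shared transcripts> <r>", or "NA" if r is undefined.
tpm_correlation() {
    LC_ALL=C awk -F'\t' '
        FNR == 1   { next }                    # skip the header of each file
        NR == FNR  { salmon[$1] = $4; next }   # first file: Name -> TPM
        ($1 in salmon) {                       # second file: target_id, tpm
            x = log(salmon[$1] + 1) / log(2)
            y = log($5 + 1) / log(2)
            n++
            sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            if (n < 2) { print "NA"; exit }
            den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
            if (den == 0) { print "NA"; exit }
            printf "%d %.4f\n", n, (n * sxy - sx * sy) / den
        }
    ' "$1" "$2"
}
```

With the output directories from the question, this could be called as `tpm_correlation exp/salmon_fasta/Hep3B/"$i"/quant.sf exp/kallisto/Hep3B/"$i"/abundance.tsv`. Pearson on log2(TPM + 1) is only one choice here. A rank-based correlation or a scatter plot would show more about where the two tools disagree.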

2.5 years ago

Back when they were first released, Kallisto and Salmon used much more similar algorithms, but those algorithms have diverged over time. In particular, Salmon now uses a selective alignment approach that is going to produce results closer to a full alignment than the results that Kallisto will produce.

The other big difference with modern Salmon is that it allows the construction of a reference with decoy sequences - that is, reads are not forced to map to a part of the transcriptome that they are similar to, when there is a part of the genome (that is not exonic sequence) that is an even better match. This is important, because somewhere between 30 and 50% of your reads probably come from non-exonic sequence.

Thus, I would expect these three methods to give different answers.
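
For reference, a decoy-aware Salmon index of the kind described above can be built roughly as follows. This is a sketch along the lines of the Salmon documentation, not a tested pipeline: the three arguments to build_decoy_index are placeholders for your own transcriptome FASTA, genome FASTA and output directory, and both function names are made up for illustration.

```shell
# Print the sequence names of a FASTA read from stdin, one per line,
# keeping only the first space-delimited token and dropping the ">".
decoy_names() {
    grep '^>' | cut -d ' ' -f 1 | sed 's/^>//'
}

# Build a decoy-aware index using the whole genome as decoy.
# Placeholders: $1 transcriptome FASTA (gzipped), $2 genome FASTA (gzipped),
#               $3 output directory for the index.
build_decoy_index() {
    # The decoy list is simply the names of the genome sequences.
    gunzip -c "$2" | decoy_names > decoys.txt
    # Transcripts must come first, followed by the decoy sequences.
    cat "$1" "$2" > gentrome.fa.gz
    salmon index -t gentrome.fa.gz -d decoys.txt -i "$3"
}
```

Quantification then uses the new index directory with salmon quant --index as before. Note that indexing against the full genome needs considerably more memory than a transcriptome-only index.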
