Hello!
I ran an experiment to quantify transcript expression with three programs: kallisto, salmon (alignment-based and mapping-based modes) and eXpress. I get different results from kallisto and salmon: for example, the expression of the same transcript differs between the two, and so does the number of transcripts detected. Is this normal, or have I made a mistake somewhere?
The salmon index was created following the recommendations in the manual. The code looks like this. For the BAM file:
~/Soft/bcbio/anaconda/bin/hisat2 -p 5 -x /mnt/lapd/Index_hum/cdna/ensembl/release_103/hisat2_cdna/cdna_103 -U trim_g/"$i"_trimmed.fq.gz | samtools view -Sb - > bam/"$i".bam
~/Soft/bcbio/anaconda/bin/salmon quant --threads 2 -t /mnt/lapd/Index_hum/cdna/ensembl/release_103/Homo_sapiens.GRCh38.cdna.all.fa -l A -a bam/A549/"$i".bam -o exp/salmon_bam/A549/"$i"
For fastq file
~/Soft/bcbio/anaconda/bin/salmon quant --threads 2 --index /mnt/lapd/Index_hum/cdna/ensembl/release_103/salmon/ensembl_103/ --libType A -o exp/salmon_fasta/Hep3B/"$i" -1 trim_g/Hep3B/"$i"_1_val_1.fq.gz -2 trim_g/Hep3B/"$i"_2_val_2.fq.gz
Kallisto
~/Soft/bcbio/tools/bin/kallisto quant --index /mnt/lapd/Index_hum/cdna/ensembl/release_103/kallisto/tr_103 --output-dir exp/kallisto/Hep3B/"$i" trim_g/Hep3B/"$i"_1_val_1.fq.gz trim_g/Hep3B/"$i"_2_val_2.fq.gz
Hi! Since they are different tools, I would expect somewhat different results, so I am not sure how strict you are being in the comparison. Apart from this, kallisto does not infer the strandedness of the RNA-Seq library: it assumes an unstranded library unless you tell it otherwise (Salmon works it out itself with -l A, if I remember correctly). Maybe this could be a source of the discrepancy?
Do you mean --fr-stranded and --rf-stranded? I ran a correlation test and the correlation was less than 0.60.
Yes, you can use --fr-stranded or --rf-stranded depending on your library strandedness. I've never really seen a dramatic difference in results produced by stranded vs. unstranded pseudoalignment though.
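If you are not sure which flag applies, one option is to reuse what salmon already inferred. This is only a sketch: it assumes the salmon run with --libType A wrote a lib_format_counts.json file containing an "expected_format" field, and it only covers the paired-end library types (ISR, ISF, IU). The function name kallisto_strand_flag is made up for illustration.

```shell
# kallisto_strand_flag LIB_FORMAT_COUNTS_JSON
# Hypothetical helper: prints the kallisto strandedness flag matching the
# library type salmon inferred, or an empty line for an unstranded library.
kallisto_strand_flag() {
  # Pull the value of "expected_format" out of the JSON without needing jq.
  ksf_fmt=$(sed -n 's/.*"expected_format"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p' "$1" | head -n 1)
  case "$ksf_fmt" in
    ISR) echo "--rf-stranded" ;;   # read 1 comes from the reverse strand
    ISF) echo "--fr-stranded" ;;   # read 1 comes from the forward strand
    IU)  echo "" ;;                # unstranded: no extra flag needed
    *)   echo "unrecognised library type: '$ksf_fmt'" >&2; return 1 ;;
  esac
}

# Example call (path assumed from the salmon output directory used above):
#   kallisto_strand_flag exp/salmon_fasta/Hep3B/"$i"/lib_format_counts.json
```

The printed flag, if any, can then be added to the kallisto quant command before rerunning the comparison.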
They should give very similar results (but not identical), especially for bulk RNA-seq. It would be helpful for you to post some plots of the counts produced by salmon vs. those produced by kallisto. The word "different" doesn't have much meaning (e.g. a correlation of 0.95 would still technically be "different").
So, can you show me what exactly you're observing that is "different"?
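To put a number on "different" before making plots, something like the following could be used. It is a rough sketch, not a definitive method: it assumes the default output files (salmon's quant.sf with TPM in column 4, kallisto's abundance.tsv with tpm in column 5), matches transcripts by ID, and reports the Pearson correlation of log2(TPM + 1). The function name compare_tpm is made up for illustration.

```shell
# compare_tpm QUANT_SF ABUNDANCE_TSV
# Hypothetical helper: joins salmon and kallisto output on transcript ID and
# prints shared/expressed transcript counts and the Pearson correlation.
compare_tpm() {
  awk -F'\t' '
    # First file (salmon quant.sf): Name is column 1, TPM is column 4.
    NR == FNR { if (FNR > 1) salmon[$1] = $4; next }
    # Second file (kallisto abundance.tsv): target_id is column 1, tpm is column 5.
    FNR > 1 && ($1 in salmon) {
      x = log(salmon[$1] + 1) / log(2)
      y = log($5 + 1) / log(2)
      n++; sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
      if (salmon[$1] + 0 > 0) ns++
      if ($5 + 0 > 0) nk++
    }
    END {
      printf "shared_transcripts\t%d\n", n
      printf "expressed_salmon\t%d\n", ns
      printf "expressed_kallisto\t%d\n", nk
      vx = n * sxx - sx * sx
      vy = n * syy - sy * sy
      # Correlation is undefined when either tool reports a constant value.
      if (vx > 0 && vy > 0)
        printf "pearson_log2_tpm\t%.4f\n", (n * sxy - sx * sy) / sqrt(vx * vy)
      else
        print "pearson_log2_tpm\tNA"
    }
  ' "$1" "$2"
}

# Example call (paths assumed from the output directories used above):
#   compare_tpm exp/salmon_fasta/Hep3B/"$i"/quant.sf exp/kallisto/Hep3B/"$i"/abundance.tsv
```

If shared_transcripts is much smaller than the number of transcripts in either file, the IDs do not match between the two indexes, and that is worth checking before reading anything into the correlation.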