I was wondering how to know when to use STAR to map my reads to the genome and when it is better to use kallisto to pseudoalign them. From reading about the two tools, I think I understand the difference between them.
I have a data set in which we knocked out a specific gene. After sequencing I expected to see a huge difference between the WT and the KO, but this was visible only in the kallisto-quantified data. STAR followed by FeatureCounts found almost no reads mapped to this gene.
When running FeatureCounts with the multiOverlap parameter (row multiOl below), the counts change considerably:
rowname    Sample_1  Sample_2  Sample_3  Sample_13  Sample_14  Sample_15
STAR_1            2         2         2          6          0          1
Kallisto       1956      2164      2429          1          0          1
multiOl        6171      6603      4355        353        548        469
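To make the comparison easier to read, here is a minimal Python sketch that averages the counts per group for each method. The numbers are copied from the table; treating Sample_1 to Sample_3 and Sample_13 to Sample_15 as the two conditions is an assumption based on the column order, and the helper name `group_means` is made up for illustration.

```python
from statistics import mean

# Counts copied from the table above.
# Column order: Sample_1, Sample_2, Sample_3, Sample_13, Sample_14, Sample_15.
counts = {
    "STAR_1":   [2, 2, 2, 6, 0, 1],
    "Kallisto": [1956, 2164, 2429, 1, 0, 1],
    "multiOl":  [6171, 6603, 4355, 353, 548, 469],
}

def group_means(row):
    """Return (mean of the first three samples, mean of the last three samples)."""
    return mean(row[:3]), mean(row[3:])

for name, row in counts.items():
    first, last = group_means(row)
    print(f"{name:9s} samples 1-3: {first:8.1f}   samples 13-15: {last:8.1f}")
```

With default FeatureCounts the two groups are indistinguishable, while both kallisto and the multiOverlap run show a clear drop in the second group.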
When looking at the reads in the bam files, there is a clear difference in the expression of my gene of interest between the two samples (in the image below I show the difference between samples 1 and 13). The gtf file shows there are two genes in this region and, from what I can see, the left part of the reads can be mapped to the first gene (red square), but on the right-hand side these reads are clearly mapped to my gene of interest (green square below).
The point of my question is to understand why STAR doesn't show the same behavior as kallisto and, maybe even more important, whether it is possible to set up STAR so that it behaves the same as kallisto.
I appreciate your help
Thanks, dsull, for the comprehensive responses, both in the answer and in the comments. I appreciate the time.
A comment on your note #2: I am not sure which parameter you mentioned there, as I don't see it in either the kb ref or the kallisto index help, and I'm using the newest versions (0.28 and 0.50.1 respectively). In my case I downloaded the provided mouse_index_standard file from the github repository anyway. There, too, I can't find any mention of this distinction between the two. Can you please tell me which one you mean?
After reading the answer from ATpoint I tested STAR -> Salmon and got results similar to those of the kallisto run. I don't think the STAR alignment is wrong or that STAR is doing a bad job here. I have always liked using STAR and have a pipeline set up for it, which is the main reason I used it here as well. I agree the issue here is not the alignment but the quantification of the aligned reads, which the salmon results only reinforce. I am just surprised that the difference this time is so extreme.
Have a good night
Of course! Oh, the parameter is --d-list. By default, in kb ref, --d-list is set to whatever FASTA you supply to kb ref, but you can disable it by setting --d-list=None. (The option also exists in kallisto index, although --d-list defaults to None in kallisto index.) If you disable it, you won't process those "distinguishing flanking k-mers".
And yup, honestly, not too surprised at how extreme the result is. It's really a simple explanation: Multimappers are given a count of 0 in one situation but a count >0 in another. kallisto, salmon, STAR->RSEM, and STAR->salmon will all reliably fix this.
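To illustrate the "count of 0 versus count >0" point, here is a toy Python sketch under stated assumptions. The read numbers and gene names (`gene_A` for the neighbouring gene, `gene_B` for the gene of interest) are invented, and the three functions are simplified stand-ins, not the actual algorithms of FeatureCounts, kallisto, salmon, or RSEM (for instance, the EM here ignores transcript lengths).

```python
from collections import defaultdict

# Hypothetical toy data: each read is the set of genes it is compatible with.
reads = (
    [{"gene_A", "gene_B"}] * 90   # ambiguous reads, compatible with both genes
    + [{"gene_B"}] * 2            # reads compatible only with gene_B
    + [{"gene_A"}] * 8            # reads compatible only with gene_A
)

def count_unique_only(reads):
    """Ambiguous reads get a count of 0; only unambiguous reads are counted."""
    counts = defaultdict(int)
    for genes in reads:
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
    return dict(counts)

def count_all_overlaps(reads):
    """Each read is counted once for every gene it is compatible with."""
    counts = defaultdict(int)
    for genes in reads:
        for g in genes:
            counts[g] += 1
    return dict(counts)

def count_em(reads, n_iter=200):
    """Simplified EM: split each ambiguous read in proportion to the
    current abundance estimates (no length correction)."""
    genes = sorted(set().union(*reads))
    abundance = {g: 1.0 / len(genes) for g in genes}
    counts = {g: 0.0 for g in genes}
    for _ in range(n_iter):
        counts = {g: 0.0 for g in genes}
        for compatible in reads:
            total = sum(abundance[g] for g in compatible)
            for g in compatible:
                counts[g] += abundance[g] / total
        n_reads = sum(counts.values())
        abundance = {g: c / n_reads for g, c in counts.items()}
    return counts

print(count_unique_only(reads))   # gene_A: 8, gene_B: 2
print(count_all_overlaps(reads))  # gene_A: 98, gene_B: 92
print(count_em(reads))
```

In this toy case the unique-only strategy leaves gene_B with 2 reads, while the other two strategies assign it a substantial share of the ambiguous reads, which mirrors the kind of gap seen in the question.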