Could you explain the difference between STAR, KALLISTO, SALMON etc. to an experimental biologist/non-bioinformatician?
If possible, the pros and cons of each pipeline.
Edit below
I ask this because three of my colleagues use these three different tools for RNASeq, basically to answer the same type of biological questions.
@Devon has about the best answer for that here: A: Alignment and mapping
STAR is an aligner. Kallisto/salmon are mapping programs (technically pseudo-aligners, as @lieven points out). For getting a biological answer either of the pipelines should be fine. Mappers will save time compared to aligners.
Well now I get too confused :( .
Mapping is part of the alignment. So why did they create new mapping tools like Kallisto/salmon, if "For getting a biological answer either of pipelines should be fine", when STAR is already available? Why use mapping, if it is included in alignment?
Kallisto and salmon are not really mappers in the strict sense of the word (they are pseudoaligners). While STAR does mapping in the old-school sense of the word (start with a seed, find an exact match and extend), the others work more like 'BAC-fingerprinting': they create some sort of approximation of the reference and the reads (e.g. something like a k-mer profile) and then match the profiles rather than the actual sequences.
Speed and accuracy are the major drivers for kallisto/salmon, I guess; they are, for instance, much better suited for isoform quantification.
Thank you very much genomax & Lieven. This is very helpful to understand. Thank you for your time.
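To make the "match k-mer profiles rather than the actual sequences" idea from the comments above a bit more concrete, here is a toy Python sketch. It is not how kallisto or salmon are actually implemented (they use far more elaborate index structures and a statistical model on top); the sequences, the names isoA/isoB and the value k = 5 are all made up for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts that contain every k-mer of the read.

    No base-by-base alignment is computed: the transcript sets of the
    read's k-mers are simply intersected.
    """
    compatible = None
    for kmer in kmers(read, k):
        hits = index.get(kmer, set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            return set()  # some k-mer is missing from every transcript
    return compatible or set()  # empty set if the read is shorter than k


# Hypothetical toy data: two isoforms sharing a first exon.
K = 5
shared_exon = "ATGGCGTACGTTAGC"
transcripts = {
    "isoA": shared_exon + "GGATCCAAGT",
    "isoB": shared_exon + "TTCAGGCTAA",
}
index = build_index(transcripts, K)

read_shared = "GCGTACGTTA"    # lies entirely in the shared exon
read_junction = "TAGCGGATCC"  # spans the exon junction specific to isoA
read_foreign = "CCCCCCCC"     # matches nothing in the reference

for read in (read_shared, read_junction, read_foreign):
    print(read, sorted(pseudoalign(read, index, K)))
```

A read from the shared exon is compatible with both isoforms, while a junction-spanning read pins down one of them. The sketch also hints at why paralogs or unexpected transcripts can confuse this kind of approach: a read is only ever compared against what is in the reference index.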
Dear All,
Thank you for your feedback. Based on your answers I decided to use "Kallisto" and have done my analysis. Now I have a confusing outcome: out of my 5 T-DNA knockout plants, two are showing expression of the gene (even higher than the WT). These are published lines and I have already genotyped them by PCR before.
Is this because of the pseudoaligning? How can I check this? Is there any way of mapping?
P.S. I do not believe there is cross-contamination!
Edit: This comment is linked to this question; How to map RNASeq data to reference genome to check T-DNA insertions?
When things like that happen it's good to get a BAM file. My guess is that the KO is deleting a single exon and that the recycling of the resulting non-sense transcript is simply lower in some of your samples. Alternatively, if there are paralogs then perhaps they really are messing up the pseudoalignment.
A good start would be to do a traditional alignment and create a browser track, so you can visually examine the reads in a browser such as IGV. deeptools bamCoverage can conveniently create these tracks. Often by eye you can capture what is happening more intuitively than by all these (pseudo)alignment metrics.
Thank you Devon Ryan & ATpoint. Does this mean I cannot rely on the Kallisto output for my downstream data analysis? Because I cannot compare the mutant gene expression with this output. Btw, I posted the same question: How to map RNASeq data to reference genome to check T-DNA insertions? Sorry about that.
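For reference, the "traditional alignment, then a browser track" suggestion above could look roughly like the following. This is only a sketch under assumptions: paired-end gzipped FASTQ files, a STAR genome index that has already been built, and hypothetical paths and sample names throughout. The Python helper merely assembles and prints the three command lines (STAR, samtools index, bamCoverage) so they can be inspected before being run in a shell.

```python
import shlex


def alignment_commands(genome_dir, fastq_1, fastq_2, prefix, threads=4):
    """Build the commands for: align with STAR -> index BAM -> coverage track.

    All paths are placeholders supplied by the caller; nothing is executed.
    """
    # With --outSAMtype BAM SortedByCoordinate, STAR names its output
    # <prefix>Aligned.sortedByCoord.out.bam
    bam = f"{prefix}Aligned.sortedByCoord.out.bam"
    star = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # pre-built STAR index (assumed)
        "--readFilesIn", fastq_1, fastq_2,    # paired-end reads (assumed)
        "--readFilesCommand", "zcat",         # reads are gzipped (assumed)
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]
    index = ["samtools", "index", bam]
    track = [
        "bamCoverage",
        "-b", bam,
        "-o", f"{prefix}coverage.bw",         # bigWig track to load in IGV
        "--normalizeUsing", "CPM",
        "--binSize", "10",
        "-p", str(threads),
    ]
    return [star, index, track]


# Hypothetical sample; replace with real paths before use.
commands = alignment_commands(
    genome_dir="star_index",
    fastq_1="ko1_R1.fastq.gz",
    fastq_2="ko1_R2.fastq.gz",
    prefix="ko1_",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Loading the resulting BAM and bigWig for a knockout and a WT sample side by side in IGV should show where along the gene the reads actually fall.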
Pseudoaligners are generally quite reliable, you may simply have one of the cases where they're not. That remains to be seen.
But how could I interpret my data if my mutant shows higher expression than the WT by pseudoaligning? Is there any method?
How was the KO done? Deleting one exon, the entire gene? That is important to know for interpretation.
Are you able to make the data public? It'd be much easier to figure out what's going on then.
Note that knocked out genes are often still highly expressed, they're just not translated.