Hi!
I'm starting to use kallisto to do transcript-level expression quantification. I have some questions:
1) Does kallisto infer the strandedness of the input data just like salmon does (--libType A)? I guess the answer is no.
2) On the other hand, kallisto has the following two options:
--fr-stranded Strand specific reads, first read forward
--rf-stranded Strand specific reads, first read reverse
Do these options work only for PE data?
3) Regarding the fragment length estimation when using SE datasets:
-l, --fragment-length=DOUBLE Estimated average fragment length
-s, --sd=DOUBLE Estimated standard deviation of fragment length
(default: -l, -s values are estimated from paired
end data, but are required when using --single)
What does DOUBLE mean? Do we have to specify double the calculated number?
Thank you in advance
Just to add to the answer, there is an option for SE data (--single).
Sorry, I have another question: "fragment-length" is not the same as read length, is it? I mean, it can't be inferred from the input SE fastq files, can it?
Correct, fragment length refers to the length of the fragments loaded onto the sequencer. If this is your own dataset, then either you or whoever did the sequencing should know this (it can be estimated from a bioanalyzer plot). If this is a public dataset, then hopefully the value is written down somewhere.
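To make the single-end case concrete, here is a minimal sketch of how the -l and -s values could be derived and passed to kallisto quant. DOUBLE in the help text is the argument type (a double-precision floating-point number), not a multiplier, so the values are passed as plain numbers. The fragment sizes, file names, and the helper kallisto_single_end_command are made up for illustration; only the kallisto flags themselves (-i, -o, --single, -l, -s) come from the tool.

```python
import shlex
import statistics


def kallisto_single_end_command(index, out_dir, fastq, fragment_lengths):
    """Build a `kallisto quant --single` argument list (hypothetical helper).

    fragment_lengths: fragment sizes in bp (fragment length, not read length).
    """
    mean_len = statistics.mean(fragment_lengths)
    sd_len = statistics.stdev(fragment_lengths)  # sample standard deviation
    return [
        "kallisto", "quant",
        "-i", index,
        "-o", out_dir,
        "--single",
        "-l", f"{mean_len:.1f}",  # DOUBLE = floating-point value
        "-s", f"{sd_len:.1f}",
        fastq,
    ]


# Illustrative fragment sizes only; use the values known for your own library.
sizes = [180, 190, 200, 210, 220]
cmd = kallisto_single_end_command(
    "transcripts.idx", "quant_out", "reads.fastq.gz", sizes
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once kallisto is installed and the index exists.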
Hello
Sorry, I am a little confused by you saying that --rf-stranded is most likely the most appropriate option. For SE data, wouldn't you want to only process reads that align to the forward strand of the transcript?
Or have I made an error here?
It doesn't matter whether you sequence SE or PE, read #1 in a pair aligns with the opposite orientation of the originating fragment for recent (since ~2013) data. In a parlance that many prefer, read #1 should align to the opposite strand of the transcript/gene.
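A toy illustration of that orientation, assuming this kind of library and treating a made-up transcript sequence as the whole fragment: read #1 is the reverse complement of the transcript's 3' end, while read #2 matches its 5' end directly. The sequence and read length are arbitrary and only serve the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Made-up transcript in sense orientation; the fragment spans all of it.
transcript = "ATGGCCAAGTTGACCTGA"
read_len = 6

# Read #1 comes from the opposite strand, starting at the fragment's 3' end.
read1 = reverse_complement(transcript[-read_len:])
# Read #2 is in the same orientation as the transcript, from the 5' end.
read2 = transcript[:read_len]

print("read1 matches transcript directly:", read1 in transcript)
print("read1 matches as reverse complement:", reverse_complement(read1) in transcript)
print("read2 matches transcript directly:", read2 in transcript)
```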
By originating fragment, do you mean the transcriptome or genome sequences?
Either way. If you align to the transcriptome then read #1 should always be aligned as its reverse complement.
One more question.
The RSeQC package outputs "1+-,1-+,2++,2--", which basically means that read #2 'sets' the strand, since it aligns to the same strand as the transcript/gene. Thus, read #1 aligns to the opposite strand of the transcript/gene (i.e. reverse-complemented).
For this library type (apparently the most common nowadays), parameter --rf-stranded should be the one to use in 'kallisto quant' for abundance estimation using a reference transcriptome. Is that right?
The link below has confused me in this respect, and I just wanted to be sure:
https://github.com/griffithlab/rnaseq_tutorial/blob/master/manuscript/supplementary_tables/supplementary_table_5.md
Correct, TruSeq is the most common and it's --rf-stranded (if that's wrong, you'll be able to tell from the terrible quantitation metrics).
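For anyone who wants to script this check, below is a rough sketch that reads an RSeQC infer_experiment.py report and suggests a kallisto strand flag. The "1+-,1-+,2++,2--" to --rf-stranded mapping is the one discussed in this thread; the reverse mapping to --fr-stranded is assumed by symmetry. The function pick_strand_flag, the 0.8 cutoff, and the numbers in the example report are assumptions for illustration, not RSeQC or kallisto defaults.

```python
import re

# Paired-end RSeQC patterns mapped to kallisto strand flags.
PATTERN_TO_FLAG = {
    "1++,1--,2+-,2-+": "--fr-stranded",  # assumed by symmetry
    "1+-,1-+,2++,2--": "--rf-stranded",  # mapping discussed in this thread
}


def pick_strand_flag(report, min_fraction=0.8):
    """Return a kallisto strand flag, or None if no pattern dominates.

    min_fraction is an arbitrary cutoff chosen for this sketch.
    """
    fractions = {}
    for match in re.finditer(r'explained by "([^"]+)":\s*([0-9.]+)', report):
        fractions[match.group(1)] = float(match.group(2))
    for pattern, flag in PATTERN_TO_FLAG.items():
        if fractions.get(pattern, 0.0) >= min_fraction:
            return flag
    return None  # looks unstranded or ambiguous


# Made-up report in the layout infer_experiment.py prints for PE data.
example_report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0200
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0300
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9500
'''

flag = pick_strand_flag(example_report)
print(flag)
```

If neither pattern clears the cutoff, the function returns None, in which case running kallisto without a strand flag is the cautious choice.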