I am relatively new to RNA-seq analysis, and I have been using Kallisto and Sleuth for this.
My code looks like this - I run an LRT test first on the data, and then a Wald's test on those that have passed this filter. I end up with a list of loci that are differentially expressed between my two condition ('kd' versus 'control').
library("sleuth")
base_dir <- "/path/to/data"
samples <- paste0("sample",1:4)
kal_dirs <- sapply(samples, function(id) file.path(base_dir, id))
s2c <- read.table(file.path(base_dir, "kd.txt"), header = TRUE, stringsAsFactors=FALSE)
s2c <- dplyr::select(s2c, sample = sample, condition)
s2c <- dplyr::mutate(s2c, path = kal_dirs)
so <- sleuth_prep(s2c, ~condition)
so <- sleuth_fit(so)
so <- sleuth_fit(so, ~1, "reduced")
so <- sleuth_lrt(so, 'reduced', 'full')
so <- sleuth_wt(so, ‘conditionkd’)
results_wt <- sleuth_results(so, 'conditionkd')
results_lrt <- sleuth_results(so, 'reduced:full', test_type = 'lrt')
kd.lrt.sig_ids <- results_lrt$target_id[which(results_lrt$qval < 0.05)]
kd.wt.sig_ids <- results_wt$target_id[which(results_wt$qval < 0.05)]
kd_ids <- kd.wt.sig_ids[kd.wt.sig_ids %in% kd.lrt.sig_ids]
shared_results <- results_table_wt[results_table_wt$target_id %in% shared_ids,]
write.csv(shared_results, file=“/path/to/kallisto_wald_test_lrt_passed_kdd3.csv")
However, do I have to first normalise the TPM/count column on the abundance.tsv file produced Kallisto for each of the samples? Or is this incorporated in the tests? No online tutorials have done this, so how is it normalising libraries?
Hope this could be helpful: https://rawgit.com/pachterlab/sleuth/master/inst/doc/intro.html