Hi all,
I am facing a minor dilemma, and I thought perhaps you could give me some advice. I have conducted differential expression analysis of a Control vs Mutant RNAseq dataset. I conducted the analysis twice, using different pipelines:
PipelineA: Kallisto --> DESeq2
PipelineB: STAR --> featureCounts --> DESeq2
I wanted to get a sense of how different the results would be when "classifying against a transcriptome" and when "quantifying against a genome". PipelineA outputs ~2000 differentially expressed genes. PipelineB outputs ~1600, of which ~1400 are also identified as differentially expressed by PipelineA. Filtering conditions for significance (e.g. FDR < 0.05) were kept the same for both.
My question is, which results should I trust? I read the transcriptome path is usually more accurate, but perhaps it doesn't hurt to be a bit conservative?
Many thanks :)
Marcos
I suggest you read the papers describing the pseudoalignment tools, such as kallisto and salmon, plus the recent papers that benchmark these different pipelines. Currently the field seems to prefer the pseudoalignment methods; the details are in those papers.
PipelineA probably gives you more detected genes overall and more counts per gene, which may explain the higher number of differentially expressed genes.
Regardless, your overlap is very high.
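To put a number on "very high", here is a minimal Python sketch that summarises the overlap from the approximate counts in the question (~2000 for PipelineA, ~1600 for PipelineB, ~1400 shared). The function name `overlap_summary` and the dictionary keys are made up for illustration, and the inputs are the rounded figures from the post, not exact results.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Summarise the overlap between two DE gene lists, given only their sizes."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed the size of either list")
    union = n_a + n_b - n_shared
    return {
        "only_a": n_a - n_shared,        # significant in PipelineA only
        "only_b": n_b - n_shared,        # significant in PipelineB only
        "union": union,                  # significant in at least one pipeline
        "frac_of_a": n_shared / n_a,     # share of PipelineA hits confirmed by B
        "frac_of_b": n_shared / n_b,     # share of PipelineB hits confirmed by A
        "jaccard": n_shared / union,     # shared / union
    }


# Approximate counts taken from the question (assumed, rounded values).
summary = overlap_summary(n_a=2000, n_b=1600, n_shared=1400)
for key, value in summary.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

With these rounded inputs, about 87.5% of the PipelineB genes are also called by PipelineA, while about 70% of the PipelineA genes are also called by PipelineB. With the actual gene ID lists in hand, the same quantities could be computed from Python sets instead of counts.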
As an additional comment: I recommend checking out sleuth for performing differential gene expression analysis with kallisto.