I am trying to implement a pipeline for RNA-seq analysis using the Tuxedo suite of tools. I am planning to run the following steps: Cufflinks -> CuffMerge -> CuffDiff. The analysis covers two conditions with two samples each, for a total of four files.
Condition 1: Samples 1 & 2
Condition 2: Samples 3 & 4
My question is: for the CuffMerge step, should I merge all four samples, or only merge within each condition?
All four samples are from the same cell line, so I assume the best practice would be to merge all four and then use CuffDiff to calculate FPKM values at both the condition and the sample level. Is this assumption correct?
You should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the advisable tool for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using kallisto or salmon.
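For orientation, here is a rough sketch of how the HISAT2 -> StringTie steps could be laid out for a two-condition, four-sample design like yours. It only builds the command lines and runs nothing. The sample names, FASTQ file names, index prefix `genome_index`, annotation file `genes.gtf` and the paired-end layout are all assumptions for illustration, so adjust them to your data. Note that in this layout the merge step takes the assemblies from all four samples, so every sample is quantified against the same set of transcripts.

```python
# Hypothetical sample layout: two conditions, two replicates each.
SAMPLES = {
    "cond1": ["sample1", "sample2"],
    "cond2": ["sample3", "sample4"],
}


def build_commands(samples, index="genome_index", annotation="genes.gtf", threads=8):
    """Return argv lists for a HISAT2 -> StringTie run; nothing is executed.

    File names, the index prefix and the annotation path are placeholders.
    Paired-end reads named <sample>_1.fastq.gz / <sample>_2.fastq.gz are assumed.
    """
    names = [s for reps in samples.values() for s in reps]
    t = str(threads)
    cmds = []

    for s in names:
        # Spliced alignment; --dta tailors the output for transcript assemblers.
        cmds.append(["hisat2", "-p", t, "--dta", "-x", index,
                     "-1", f"{s}_1.fastq.gz", "-2", f"{s}_2.fastq.gz",
                     "-S", f"{s}.sam"])
        # Coordinate-sort the alignments into BAM, as StringTie expects.
        cmds.append(["samtools", "sort", "-@", t, "-o", f"{s}.bam", f"{s}.sam"])
        # Per-sample transcript assembly, guided by the reference annotation.
        cmds.append(["stringtie", "-p", t, "-G", annotation,
                     "-o", f"{s}.gtf", "-l", s, f"{s}.bam"])

    # One merge across ALL samples, regardless of condition.
    cmds.append(["stringtie", "--merge", "-p", t, "-G", annotation,
                 "-o", "stringtie_merged.gtf"] + [f"{s}.gtf" for s in names])

    for s in names:
        # Re-estimate abundances against the merged transcripts (-e) and
        # write Ballgown input tables (-B) into one directory per sample.
        cmds.append(["stringtie", "-e", "-B", "-p", t, "-G", "stringtie_merged.gtf",
                     "-o", f"ballgown/{s}/{s}.gtf", f"{s}.bam"])

    return cmds


if __name__ == "__main__":
    for cmd in build_commands(SAMPLES):
        print(" ".join(cmd))
```

To actually run the steps you could pass each list to `subprocess.run(cmd, check=True)` in order. The condition labels are not used until the differential expression step in Ballgown, where each sample is assigned to its group.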