One of my colleagues outsourced the data analysis, and the provider used the following method to find differentially expressed transcripts.
First, they did a de novo assembly of controlA, controlB, treatedA and treatedB separately. The assembled transcripts from all samples were iteratively clustered to produce uni-transcripts with the fewest redundant sequences (they did not mention how the clustering was done). The pre-processed reads were then mapped to the uni-transcripts, and expression and differential expression analysis were carried out with TopHat-Cufflinks.
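The post doesn't say how the uni-transcripts were produced. Here is a rough illustration of why a redundancy-removal step is needed when four assemblies are built separately. The same gene usually comes out once per assembly, with slightly different lengths, and for unstranded libraries sometimes in opposite orientations. This hypothetical sketch (all names invented) keeps the longest sequence as a cluster representative and folds in any shorter transcript that is an exact substring of it on either strand. Real clustering tools work from approximate alignments with identity and overlap thresholds, not exact containment.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def greedy_nonredundant(transcripts: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical greedy clustering: representative ID -> member IDs.

    A transcript joins the first (longer) representative that contains it
    exactly on either strand; otherwise it starts a new cluster.
    """
    clusters: dict[str, list[str]] = {}
    # Longest first, so each sequence is only tested against longer representatives.
    for tid in sorted(transcripts, key=lambda t: len(transcripts[t]), reverse=True):
        seq = transcripts[tid]
        for rep in clusters:
            rep_seq = transcripts[rep]
            if seq in rep_seq or revcomp(seq) in rep_seq:
                clusters[rep].append(tid)
                break
        else:
            clusters[tid] = [tid]
    return clusters


# Toy contigs from four separate assemblies (IDs are invented).
assemblies = {
    "ctrlA_1": "ATGCCGTTAGC",
    "trtA_7": "CCGTTAG",      # fragment of ctrlA_1
    "trtB_2": "GCTAACGG",     # ctrlA_1 fragment, reported on the other strand
    "ctrlB_4": "TTTTGGGG",    # unrelated
}
print(greedy_nonredundant(assemblies))
# {'ctrlA_1': ['ctrlA_1', 'trtB_2', 'trtA_7'], 'ctrlB_4': ['ctrlB_4']}
```

Reads from every sample are then mapped to the representatives only. This is presumably the point of the step: each gene is counted once instead of having its reads split across near-identical copies.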
They used the following clustering parameters:
- minimum identity for overlaps: 96%
- minimum overlap length: 50bp
- maximum length of unmatched overhangs: 50bp
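To make the three thresholds concrete, here is a hypothetical check of whether a single pairwise overlap would be accepted. It assumes an ungapped alignment. It also assumes an "unmatched overhang" means sequence that both transcripts have on the same side of the aligned region but that does not align; that is a guess at the provider's definition. The parameter names resemble those of overlap-based clusterers such as TGICL, but the post doesn't name the tool. Class and function names are illustrative only.

```python
from dataclasses import dataclass

# Thresholds taken from the clustering parameters listed above.
MIN_IDENTITY = 0.96
MIN_OVERLAP_BP = 50
MAX_OVERHANG_BP = 50


@dataclass
class UngappedHit:
    """Hypothetical ungapped alignment: a[a_start:a_end] aligned to b from b_start (0-based)."""
    a: str
    b: str
    a_start: int
    a_end: int
    b_start: int

    @property
    def length(self) -> int:
        return self.a_end - self.a_start

    def identity(self) -> float:
        seg_a = self.a[self.a_start:self.a_end]
        seg_b = self.b[self.b_start:self.b_start + self.length]
        return sum(x == y for x, y in zip(seg_a, seg_b)) / self.length

    def overhangs(self) -> tuple[int, int]:
        # Unaligned sequence present in BOTH transcripts on the same side of the hit.
        # If only one transcript extends past the hit, that end is a clean join (0).
        left = min(self.a_start, self.b_start)
        right = min(len(self.a) - self.a_end,
                    len(self.b) - self.b_start - self.length)
        return left, right


def overlap_accepted(hit: UngappedHit) -> bool:
    return (hit.length >= MIN_OVERLAP_BP
            and hit.identity() >= MIN_IDENTITY
            and max(hit.overhangs()) <= MAX_OVERHANG_BP)


long_tx = "ACGT" * 50                     # 200 bp
contained = long_tx[20:120]               # 100 bp, fully inside long_tx
print(overlap_accepted(UngappedHit(long_tx, contained, 20, 120, 0)))   # True

mismatched = "".join("N" if i % 20 == 0 else c for i, c in enumerate(contained))
print(overlap_accepted(UngappedHit(long_tx, mismatched, 20, 120, 0)))  # False: 95% identity
```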
A uni-transcript was considered differentially expressed if either:
- its fold change was 4 or more and its q-value was less than 0.05, or
- its expression was detected in only one condition and its q-value was less than 0.05.
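Here is a minimal sketch of the two calling rules as stated. It assumes the q-values come from the DE test (Cuffdiff, in a TopHat/Cufflinks pipeline) and that "detected" means a mean FPKM above zero in a condition. That detection cutoff, the record layout and the names are hypothetical. The fold change is also assumed to apply in both directions, so a 4-fold decrease counts as well as a 4-fold increase.

```python
import math
from dataclasses import dataclass

FOLD_CUTOFF = 4.0
Q_CUTOFF = 0.05
DETECTION_FPKM = 0.0   # hypothetical: mean FPKM above this counts as "expressed"


@dataclass
class DEResult:
    transcript_id: str
    control_fpkm: float   # mean over control replicates
    treated_fpkm: float   # mean over treated replicates
    q_value: float


def is_differentially_expressed(r: DEResult) -> bool:
    if r.q_value >= Q_CUTOFF:
        return False
    in_control = r.control_fpkm > DETECTION_FPKM
    in_treated = r.treated_fpkm > DETECTION_FPKM
    if in_control != in_treated:
        return True            # rule 2: detected in one condition only
    if not in_control:
        return False           # not detected in either condition
    # rule 1: |log2 fold change| >= log2(4) = 2, i.e. 4-fold up or down (assumed symmetric)
    return abs(math.log2(r.treated_fpkm / r.control_fpkm)) >= math.log2(FOLD_CUTOFF)


results = [
    DEResult("u1", 1.0, 8.0, 0.01),    # 8-fold up
    DEResult("u2", 10.0, 2.0, 0.01),   # 5-fold down
    DEResult("u3", 0.0, 5.0, 0.03),    # treated only
    DEResult("u4", 2.0, 4.0, 0.001),   # only 2-fold
    DEResult("u5", 1.0, 8.0, 0.20),    # q too high
]
print([r.transcript_id for r in results if is_differentially_expressed(r)])
# ['u1', 'u2', 'u3']
```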
In my experience, I usually assemble all the samples (controlA, controlB, treatedA and treatedB) de novo together into a single reference transcriptome. I then map the reads from each sample to that reference and get FPKM values for each sample. Finally, I run differential expression analysis on the mapped read counts with a dedicated package such as edgeR or DESeq.
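For reference, FPKM is fragments (reads, for single-end data) per kilobase of transcript per million mapped fragments. The sketch below assumes the library size is the sum of the counts passed in; in practice it is usually the total number of mapped reads. The function name is illustrative. Note that edgeR and DESeq model raw integer counts, so the count table is what goes into them, not the FPKM table.

```python
def fpkm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """FPKM for one sample: count * 10^9 / (transcript length in bp * library size)."""
    library_size = sum(counts.values())   # assumption: all mapped reads are in `counts`
    return {
        tid: n * 1e9 / (lengths_bp[tid] * library_size)
        for tid, n in counts.items()
    }


sample_counts = {"uni_1": 500, "uni_2": 1500}
lengths = {"uni_1": 1000, "uni_2": 3000}
print(fpkm(sample_counts, lengths))
# {'uni_1': 250000.0, 'uni_2': 250000.0}
```

Both transcripts get the same FPKM. uni_2 has three times the reads, but it is also three times longer. The factor 10^9 is 10^3 (per kilobase) times 10^6 (per million reads).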
Do you think the method used in my colleague's outsourced analysis is correct? If it is, why would they need the clustering step?
On a small side note, when you only have 2 samples in each group, I would be very wary of any findings. The statistical power of such a small sample size is simply too low to draw reliable conclusions.
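One way to see the point about two replicates: with two samples per group, there are only C(4, 2) = 6 ways to split four values into two groups of two. An exact permutation test can therefore never give a two-sided p-value below 2/6 ≈ 0.33, however large the difference. The sketch below shows this using a plain difference-in-means permutation test, which is not what edgeR or DESeq use. Those packages can still report significant genes with two replicates because they share dispersion information across genes. That result, however, leans heavily on their modelling assumptions.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group1: list[float], group2: list[float]) -> float:
    """Exact two-sided permutation test on the absolute difference in group means."""
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = abs(mean(group1) - mean(group2))
    extreme = total = 0
    # Enumerate every way of relabelling n1 of the pooled values as "group 1".
    for idx in combinations(range(len(pooled)), n1):
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if abs(mean(g1) - mean(g2)) >= observed - 1e-12:   # tolerance for float ties
            extreme += 1
    return extreme / total


control = [10.0, 12.0]
treated = [100.0, 110.0]   # roughly 10-fold higher
print(permutation_p_value(control, treated))   # 0.3333333333333333
```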