Hello,
I ran cuffdiff with 3 replicates per condition to find differentially expressed genes. Once I got the results, I realized that if one of my replicates is extremely high or extremely low (as if it were just an irregular sample), it skews the mean of the replicates and the gene is called differentially expressed. Does cuffdiff ignore the standard deviation (STD)? Is there a way to change this? Is the alternative software (edgeR, DESeq) better at handling replicates with a large STD?
For example, in the graph I was looking at the differential expression between the red and the blue samples. It is clear that without the extremely high blue replicate, I would say that there is no differential expression, or, if there is, that the red samples are higher. Yet the cuffdiff results say that the gene is differentially expressed between the two conditions and that it is higher in the blue samples, with a very large fold change (I'm aware of the low expression levels; it's just one example).
Does anyone have an idea how to overcome this problem?
While I am not sure how exactly Cuffdiff treats replicates, you can use the flag
--max-bundle-frags
to skip loci that have more fragments than the limit you give. You should be able to get a read count table from the BAM file generated earlier to decide what number to set as the cut-off threshold.
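Once you have such a count table, a quick check along these lines might help you spot the genes where a single replicate drives the result. This is only a rough sketch and not anything cuffdiff does internally: the `counts` values are made-up numbers, the function names are hypothetical, and the ratio cut-off of 5 is an arbitrary assumption you would need to tune for your data.

```python
from statistics import median

# Hypothetical counts: gene -> condition -> replicate counts (made-up numbers).
counts = {
    "geneA": {"red": [12, 15, 11], "blue": [3, 4, 95]},
    "geneB": {"red": [200, 210, 190], "blue": [405, 390, 410]},
}

def outlier_ratio(reps):
    """Largest replicate divided by the median of the replicates."""
    med = median(reps)
    if med == 0:
        # All-zero replicates are not outliers; a non-zero max over a zero median is.
        return float("inf") if max(reps) > 0 else 1.0
    return max(reps) / med

def flag_outlier_genes(table, ratio_cutoff=5.0):
    """Return genes where any condition has one replicate far above its median."""
    flagged = []
    for gene, conds in table.items():
        if any(outlier_ratio(reps) >= ratio_cutoff for reps in conds.values()):
            flagged.append(gene)
    return flagged

def locus_totals(table):
    """Total count per gene across all replicates, largest first."""
    totals = {g: sum(sum(r) for r in c.values()) for g, c in table.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    print(flag_outlier_genes(counts))
    print(locus_totals(counts))
```

The sorted totals give a rough idea of where a fragment cut-off could sit, and the flagged genes are the ones worth looking at by eye before trusting the fold change.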