using cufflinks when investigating differential expression
9.7 years ago
sangita_b ▴ 90

I have used the TopHat-Cufflinks-Cuffdiff pipeline to carry out an RNA-seq analysis (looking at differential gene expression).

The experiment was carried out in two donors, and twice for each donor, giving a total of 4 replicates for each condition. There are 5 conditions and therefore 20 samples in total that have been sequenced.

After running cuffdiff we found that almost no genes are significantly differentially expressed (according to the corrected q value that the software calculates). The only significant results (q < 0.05) that I see are for one of the conditions, and this is the case for fewer than 20 genes.

Can anyone suggest how I can interpret my data or rethink the analysis? Perhaps using packages that take biological variation into account? (My guess is that variation in my samples is affecting the p/q values. I have run cuffnorm for all samples, and the FPKM values for replicates are quite variable.) Or can I use the p values or the fold changes to identify genes that are differentially expressed between samples?

Thanks

RNA-Seq differential-expression • 3.3k views

Try the TopHat2, htseq-count, DESeq/edgeR pipeline.


Thanks Geek_y.

Can I also ask if you know of programs that concatenate/ combine fastq files?

Thanks


Just use the Linux cat command. That should be enough to join them together.

9.7 years ago
Jordan ★ 1.3k

The Tuxedo suite you have used does account for biological variability, and I don't think it's right to change methods just because you suspect there is biological variability. That variability might be real. Another problem might be that you are using only two replicates. Ideally you would have three or more to represent true biological variability.

To see whether the analysis was performed correctly, look at a few control genes. I mean, look at the FPKM values of a few genes that you know for sure should be expressed at high or low levels in your experiment.
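As a rough illustration of that control-gene check, here is a minimal Python sketch that pulls the per-sample FPKM values for a few named genes out of a tab-separated table. The column names (`gene_short_name`, and one `<sample>_FPKM` column per sample) are assumptions about how the FPKM table is laid out, and the toy values are invented; adjust both to match your own files.

```python
import csv
import io


def control_gene_fpkm(table_text, genes,
                      name_col="gene_short_name", suffix="_FPKM"):
    """Return {gene: {sample: fpkm}} for the requested genes.

    Assumes (hypothetically) a tab-separated table with a gene-name
    column and one '<sample>_FPKM' column per sample.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    wanted = set(genes)
    result = {}
    for row in reader:
        if row[name_col] in wanted:
            result[row[name_col]] = {
                col[: -len(suffix)]: float(value)
                for col, value in row.items()
                if col.endswith(suffix)
            }
    return result


# Invented toy table, only to show the expected shape of the input.
toy_table = (
    "gene_short_name\tctrl_FPKM\ttreated_FPKM\n"
    "ACTB\t850.0\t900.5\n"
    "GENE_X\t0.2\t35.0\n"
)
fpkm = control_gene_fpkm(toy_table, ["ACTB"])
```

If a gene that should be strongly expressed shows near-zero FPKM in every sample, that points to a problem upstream (annotation, alignment, sample labels) rather than to biology.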

Alternatively, you can try cummeRbund, a data visualization tool from the Cufflinks project. It's fairly simple to use and has good documentation.

9.7 years ago

Here's my 2 cents. You've got a slightly more complex design than the Tuxedo pipeline is able to handle (in my experience). If your samples are from two different donors, I'd expect one of the major sources of variability in your samples to be the 'donor effect', which you can look for using a PCA plot.
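To make the PCA suggestion concrete, here is a minimal sketch in Python with NumPy. It assumes you have an expression matrix (genes in rows, samples in columns) of FPKM values or normalized counts; the toy data, the `pca_scores` helper and the simulated donor shift are all invented for illustration and are not part of any of the tools mentioned above.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Project samples onto the top principal components.

    expr: array of shape (n_genes, n_samples), non-negative values.
    Returns (scores, explained): scores has shape
    (n_samples, n_components); explained holds the fraction of
    variance carried by each returned component.
    """
    logged = np.log2(np.asarray(expr, dtype=float) + 1.0)
    x = logged.T                                # samples as rows
    x = x - x.mean(axis=0, keepdims=True)       # center each gene
    u, s, _vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Invented toy data: 200 genes, samples 0-1 from donor A, 2-3 from donor B.
rng = np.random.default_rng(0)
baseline = rng.uniform(5, 10, size=(200, 1))        # log2 scale
donor_shift = rng.normal(0, 1.0, size=(200, 1))     # per-gene donor effect
donors = np.array([0, 0, 1, 1])
log_expr = baseline + donor_shift * donors + rng.normal(0, 0.1, size=(200, 4))
expr = 2.0 ** log_expr

scores, explained = pca_scores(expr)
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring the points by donor and by condition shows which factor drives the separation. In this toy set the first component splits the samples by donor, which is the pattern to look for in the real data.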

Provided that the main source of variation is donor, I'd do as @Geek_y suggests: TopHat, htseq-count, DESeq2. The main reason is that you'd need to account for donor in your model design and, as far as I'm aware, the model design in DESeq2 is much more flexible than in Tuxedo.
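To show why putting donor into the model matters, here is a small Python sketch. It is not DESeq2's method (DESeq2 fits a negative binomial model to counts); it swaps in an ordinary least-squares fit on log-scale values for a single gene, purely to show how a donor term soaks up variance that would otherwise hide the condition effect. The helper name and the toy numbers are invented.

```python
import numpy as np


def condition_t_stat(y, condition, donor=None):
    """t-statistic of the condition coefficient from an OLS fit.

    y: log-scale expression of one gene, shape (n_samples,).
    condition: 0/1 indicator per sample.
    donor: optional 0/1 indicator per sample; when given it is
           added to the design as a blocking covariate.
    """
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(y), np.asarray(condition, dtype=float)]
    if donor is not None:
        columns.append(np.asarray(donor, dtype=float))
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])


# Invented toy gene: small condition effect, large donor effect.
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1])
donor = np.array([0, 0, 1, 1, 0, 0, 1, 1])
noise = np.array([0.1, -0.1, 0.05, -0.05, -0.1, 0.1, -0.05, 0.05])
y = 5.0 + 1.0 * condition + 3.0 * donor + noise

t_without_donor = condition_t_stat(y, condition)
t_with_donor = condition_t_stat(y, condition, donor)
```

With the donor term the condition effect stands out clearly, while without it the donor difference inflates the residual variance and the same effect looks like noise. In DESeq2 the corresponding idea is a design formula along the lines of `~ donor + condition`.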

If you're now thinking "Can I do 'novel discovery' with this method?", the answer is sort of. You could run TopHat, Cufflinks and Cuffmerge, then use the merged GTF as your reference for htseq-count, but that's a whole other area for discussion.

9.7 years ago

Hi,

To merge several fastq files, you can use:

cat input1.fq input2.fq > output.fq
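If you would rather script the merge, here is a minimal Python sketch that does the same byte-wise concatenation as `cat` and then sanity-checks the result. The check assumes the common 4-lines-per-record FASTQ layout (no wrapped sequence lines); the function names and the toy reads are invented for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def concat_fastq(inputs, output):
    """Byte-wise concatenation, equivalent to `cat in1 in2 > out`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_records(path):
    """Count records, assuming 4 lines per FASTQ record."""
    with open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


# Toy demonstration with invented reads in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "input1.fq"
    second = Path(tmp) / "input2.fq"
    merged = Path(tmp) / "output.fq"
    first.write_text("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
    second.write_text("@r3\nGGCC\n+\nIIII\n")
    concat_fastq([first, second], merged)
    n_records = count_records(merged)
```

The record count is worth checking because a file that lacks a trailing newline will have its last line fused with the first line of the next file, whether you use `cat` or this script.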