Question

using cufflinks when investigating differential expression

3

9.7 years ago

sangita_b ▴ 90

I have used the TopHat-Cufflinks-Cuffdiff pipeline to carry out RNA-seq analysis (looking at differential gene expression).

The experiment was carried out in two donors, and twice for each donor, giving a total of 4 replicates for each condition. There are 5 conditions and therefore 20 samples in total that have been sequenced.

After running cuffdiff, we see almost no significantly differentially expressed genes (according to the corrected q-value that the software calculates). The only significant results (q < 0.05) are for one of the conditions, and even there for fewer than 20 genes.

Can anyone suggest how I can interpret my data or re-think how to analyse it? Perhaps using packages that take biological variation into account? (My guess is that variation between my samples is affecting the p/q values. I have run cuffnorm for all samples, and the FPKM values for replicates are quite variable.) Or can I use the p-values or the fold changes to identify genes that are differentially expressed between samples?

Thanks

RNA-Seq differential-expression • 3.3k views

updated 23 months ago by Ram 44k • written 9.7 years ago by sangita_b ▴ 90

0

Try the TopHat2, htseq-count, DESeq/edgeR pipeline.

9.7 years ago by GouthamAtla 12k

0

Thanks Geek_y.

Can I also ask if you know of programs that concatenate/combine FASTQ files?

Thanks

9.7 years ago by sangita_b ▴ 90

1

Just use the Linux cat command. That should be enough to concatenate them.

9.7 years ago by wanziyi89 ▴ 60

score 2 · Answer 1 · 2015-05-05

The Tuxedo suite you have used does consider biological variability, and I don't think it's right to change methods just because you think there is biological variability. That variability might be real. Another problem might be that you are using only two replicates. Ideally you would have three or more to represent true biological variability.

To see if the analysis was performed correctly, look at a few control genes. I mean, look at the FPKM values of a few genes that you know for sure should be expressed at high or low levels in your experiment.
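One way to do that control-gene check from the command line is sketched below. It assumes a tab-delimited cuffdiff-style genes.fpkm_tracking file with a gene_short_name column and one <sample>_FPKM column per condition. The toy file, the gene names and the FPKM values are all made up for illustration, and the toy file omits several columns that the real file has.

```shell
# Toy stand-in for cuffdiff's genes.fpkm_tracking (made-up values, reduced columns).
# Point "tracking" at your real file instead.
tracking=$(mktemp)
printf 'tracking_id\tgene_short_name\tcondA_FPKM\tcondA_status\tcondB_FPKM\tcondB_status\n' > "$tracking"
printf 'G1\tGAPDH\t850.2\tOK\t910.7\tOK\n' >> "$tracking"
printf 'G2\tACTB\t1203.5\tOK\t1150.0\tOK\n' >> "$tracking"
printf 'G3\tMYC\t12.4\tOK\t48.9\tOK\n' >> "$tracking"

# Placeholder control genes; use genes you expect to be high or low in your experiment.
controls="GAPDH,ACTB"

# Find the name column and every *_FPKM column from the header,
# then print only those columns for the requested genes.
report=$(awk -F'\t' -v genes="$controls" '
  BEGIN { n = split(genes, g, ","); for (i = 1; i <= n; i++) want[g[i]] = 1 }
  NR == 1 {
    for (i = 1; i <= NF; i++) {
      if ($i == "gene_short_name") name_col = i
      if ($i ~ /_FPKM$/) fpkm[++k] = i
    }
    line = "gene"
    for (j = 1; j <= k; j++) line = line "\t" $(fpkm[j])
    print line
    next
  }
  ($name_col in want) {
    line = $name_col
    for (j = 1; j <= k; j++) line = line "\t" $(fpkm[j])
    print line
  }
' "$tracking")
echo "$report"
```

Columns are looked up by header name rather than by position, so the same sketch should work however many conditions the real file contains.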

Alternatively, you can also try cummeRbund, a data visualization tool for Cufflinks output. It's fairly simple to use and has good documentation.

Ram · Answer 2 · 2015-05-06

Here's my 2 cents: you've got a slightly more complex design than the Tuxedo pipeline is able to handle (in my experience). If your samples are from two different donors, I'd expect one of the major sources of variability in your samples to be the 'donor effect', which you can look for using a PCA plot.

Provided that the main source of variation is donor, I'd do as @Geek_y suggests: TopHat, htseq-count, DESeq2. The main reason is that you'd need to account for donor in your model design and, as far as I'm aware, the model design in DESeq2 is much more flexible than in Tuxedo.

If you're now thinking "Can I do 'novel discovery' with this method?", the answer is: sort of. You could run TopHat, Cufflinks and Cuffmerge, then use the resulting GTF as your reference for htseq-count, but that's a whole other area for discussion.

Ram · Answer 3 · 2015-05-05

1

9.7 years ago

Evgeniia Golovina ★ 1.3k

Hi,

To merge several FASTQ files, you can use:

cat input1.fq input2.fq > output.fq
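If you want to confirm that nothing was lost in the merge, a quick sanity check is to compare read counts before and after. The sketch below assumes standard four-line FASTQ records (no line-wrapped sequences) and uses two tiny made-up files in a temporary directory so that it runs on its own.

```shell
# Toy inputs (made-up reads) so the sketch is self-contained; use your real files instead.
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$workdir/input1.fq"
printf '@read2\nTTGA\n+\nIIII\n' > "$workdir/input2.fq"

# Concatenate the files, as in the command above.
cat "$workdir/input1.fq" "$workdir/input2.fq" > "$workdir/output.fq"

# Each record spans four lines, so line count / 4 gives the number of reads.
in_lines=$(cat "$workdir/input1.fq" "$workdir/input2.fq" | wc -l)
out_lines=$(wc -l < "$workdir/output.fq")
echo "reads in: $((in_lines / 4)), reads out: $((out_lines / 4))"
```

The same cat approach also works directly on gzip-compressed files (cat a.fq.gz b.fq.gz > merged.fq.gz), since concatenated gzip streams form a valid gzip file. For paired-end data, concatenate the R1 files and the R2 files in the same order so that the mates stay in sync.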

updated 23 months ago by Ram 44k • written 9.7 years ago by Evgeniia Golovina ★ 1.3k