Question

No differential gene expression after tuxedo protocol

7.3 years ago

vinayjrao ▴ 250

Hi, I'm analyzing some RNA-Seq data using the old Tuxedo protocol (TopHat, Cufflinks, Cuffmerge and Cuffdiff). I checked the Cufflinks output (transcripts.gtf) and found an expression value (FPKM > 0) for all genes, but this is no longer the case after Cuffdiff: many genes have an FPKM of zero, and as a result I get no differential expression. I have 4 different samples, so I expect to see at least some differential expression.

Thanks

RNA-Seq cuffdiff • 2.6k views

ADD COMMENT • link updated 7.3 years ago by Buffo ★ 2.4k • written 7.3 years ago by vinayjrao ▴ 250

Do you really want to use the tuxedo pipeline? It hasn't been anywhere near best practice for a number of years. What species is this?

ADD REPLY • link 7.3 years ago by Devon Ryan 104k

I agree with @Devon. However, your samples seem to suffer from another issue here. Can you check the following:

The total number of reads. For an organism with a genome the size of human, you will need at least 15-20 million reads per replicate.
That you have at least 3 biological replicates of each sample, if you are using cuffdiff with default settings.
That you are aligning the reads to the correct organism.
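
The read-depth check in the list above could be sketched roughly as follows. This is a minimal illustration, not part of any pipeline mentioned here: the function names and the `MIN_READS` constant are made up for the example, and it assumes plain or gzipped FASTQ files with the usual four lines per read.

```python
import gzip

# Lower bound suggested above for a human-sized genome (an assumption to adjust).
MIN_READS = 15_000_000


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly 4 lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def check_depth(fastq_paths, min_reads=MIN_READS):
    """Return {path: (read_count, passes_threshold)} for each FASTQ file."""
    report = {}
    for path in fastq_paths:
        n_reads = count_fastq_reads(path)
        report[str(path)] = (n_reads, n_reads >= min_reads)
    return report
```

Calling `check_depth(["rep1.fastq.gz", "rep2.fastq.gz"])` (hypothetical file names) would flag any replicate that falls below the threshold.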

Best,

ADD REPLY • link 7.3 years ago by Satyajeet Khare ★ 1.6k

I'm currently trying the HISAT2 protocol, but I haven't finished it yet, so I couldn't use it. Regarding the questions you asked: the number of reads is > 20M in each case; I have 2 biological replicates; and the reference genome is hg38, downloaded from iGenomes.

ADD REPLY • link 7.3 years ago by vinayjrao ▴ 250

Okay. How is the alignment percentage? Did you try loading the BAM files into a genome browser? How do the reads look?

It's not unusual for the FPKM values in the cufflinks output file to differ from those in the cuffdiff output.
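
To see how many genes cuffdiff actually reports with zero FPKM, and with which quantification status, something like the sketch below may help. It assumes the tab-delimited layout of cuffdiff's `genes.fpkm_tracking` file, where each sample contributes a `<sample>_FPKM` and a `<sample>_status` column; please verify that against your own file header. The function name is made up for this example.

```python
import csv
from collections import Counter


def summarize_fpkm_tracking(path):
    """Per sample: gene count, genes with FPKM == 0, and status value counts.

    Assumes a tab-delimited file with '<sample>_FPKM' and, optionally,
    '<sample>_status' columns (as expected for cuffdiff's genes.fpkm_tracking).
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        samples = [c[: -len("_FPKM")] for c in reader.fieldnames if c.endswith("_FPKM")]
        summary = {s: {"genes": 0, "zero_fpkm": 0, "status": Counter()} for s in samples}
        for row in reader:
            for sample in samples:
                entry = summary[sample]
                entry["genes"] += 1
                if float(row[sample + "_FPKM"]) == 0.0:
                    entry["zero_fpkm"] += 1
                # "NA" is used here only when the status column is absent.
                entry["status"][row.get(sample + "_status", "NA")] += 1
    return summary
```

If most zero-FPKM genes carry a status other than `OK`, that points at the quantification step rather than at a real absence of expression.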

ADD REPLY • link 7.3 years ago by Satyajeet Khare ★ 1.6k

The alignment percentage is >90% in each case. However, I haven't tried loading the BAM files into a browser; I will try that now with IGV.

Thanks.

ADD REPLY • link 7.3 years ago by vinayjrao ▴ 250

More up-to-date differential expression pipelines exist (DESeq2, limma-voom, kallisto + sleuth, etc.).

When you say you have 4 different samples, do you mean 4 different conditions that you want to compare, with one sample in each, or 4 samples per condition?

ADD REPLY • link 7.3 years ago by VHahaut ★ 1.2k

Dear Radek, I have 2 cell line and 2 animal models, both with a cancer and a normal data set. I want the cancer versus normal results for the cell line and the animal model separately. I will not be considering the other comparisons.

I'm not using the DESeq2 protocol because, for some reason, bedtools coverage has consistently given me 0 reads mapping. I got a lot of suggestions for correcting it, but unfortunately nothing worked. It would be very helpful if you could share a pipeline with all the scripts.

Thanks.

ADD REPLY • link 7.3 years ago by vinayjrao ▴ 250

If you can see the reads mapping (with IGV, for example) and bedtools is not working, you could use featureCounts.
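
One frequent reason for zero counts despite visible alignments is a mismatch in chromosome naming between the BAM file and the annotation (for example `chr1` versus `1`). Whether that is the cause here is not established in this thread; the sketch below is only a quick way to rule it out. It works on plain text lines, so the BAM header would first be dumped with `samtools view -H`; the function names are made up for this example.

```python
def seqnames_from_sam_header(header_lines):
    """Collect sequence names from the SN: field of @SQ header lines."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def seqnames_from_annotation(annotation_lines):
    """Collect first-column sequence names from GTF/GFF/BED lines."""
    names = set()
    for line in annotation_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and BED track/browser lines.
        if line.strip() and not line.startswith(("#", "track", "browser")):
            names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(header_lines, annotation_lines):
    """Report which sequence names are shared and which appear on one side only."""
    bam_names = seqnames_from_sam_header(header_lines)
    ann_names = seqnames_from_annotation(annotation_lines)
    return {
        "shared": bam_names & ann_names,
        "only_in_bam": bam_names - ann_names,
        "only_in_annotation": ann_names - bam_names,
    }
```

An empty `shared` set would explain counts of zero for every feature, whichever counting tool is used.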

If I had to use DESeq2 from scratch, I would start here: Bioconductor: Differential expression. It also includes a guide to creating your count matrix.
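
As a rough illustration of the count-matrix step, the sketch below turns a featureCounts output table into a genes-by-samples matrix of the kind DESeq2 takes as input. It assumes the default featureCounts layout (a leading `#` comment line, then `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, followed by one column per BAM file) and integer counts; the function names are invented for this example.

```python
import csv

# Annotation columns that featureCounts writes between Geneid and the samples.
ANNOTATION_COLUMNS = ("Chr", "Start", "End", "Strand", "Length")


def read_featurecounts(path):
    """Return (sample_names, {gene_id: [count per sample]})."""
    with open(path, newline="") as handle:
        # Drop the '# Program:featureCounts ...' comment line(s).
        data_lines = (line for line in handle if not line.startswith("#"))
        reader = csv.reader(data_lines, delimiter="\t")
        header = next(reader)
        sample_idx = [
            i for i, name in enumerate(header)
            if i > 0 and name not in ANNOTATION_COLUMNS
        ]
        samples = [header[i] for i in sample_idx]
        counts = {row[0]: [int(row[i]) for i in sample_idx] for row in reader if row}
    return samples, counts


def write_count_matrix(samples, counts, out_path):
    """Write a tab-delimited matrix: one row per gene, one column per sample."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene_id"] + samples)
        for gene_id, values in counts.items():
            writer.writerow([gene_id] + values)
```

The resulting file can then be read into R and passed to DESeq2 together with a sample table describing the cancer and normal groups.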

ADD REPLY • link 7.3 years ago by VHahaut ★ 1.2k

Maybe you want to read this: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. TopHat is actually obsolete software, and I think kallisto is useful only for very well-curated genomes (and annotations).

ADD REPLY • link 7.3 years ago by Buffo ★ 2.4k