7.3 years ago
vinayjrao
Hi, I'm analyzing some RNA-Seq data using the old tuxedo protocol (tophat, cufflinks, cuffmerge and cuffdiff). I checked the cufflinks output (transcripts.gtf) and found an expression value (FPKM > 0) for all genes, but this is no longer the case after cuffdiff: many genes have an FPKM of zero, so I get no differential expression. I have 4 different samples, so I expect to see at least some differential expression.
Thanks
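To make the check described in the question reproducible, here is a minimal Python sketch that reads transcript records from a cufflinks transcripts.gtf and reports the highest FPKM seen per gene. It assumes the usual cufflinks attribute layout (gene_id "..."; FPKM "...";) in the ninth GTF column; the function names are hypothetical and not part of any tool.

```python
import re

# Assumption: cufflinks stores these as quoted GTF attributes in column 9.
FPKM_RE = re.compile(r'FPKM "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def gene_max_fpkm(gtf_lines):
    """Return {gene_id: highest transcript FPKM} from transcripts.gtf lines."""
    best = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only "transcript" records are used; exon records are skipped.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        gene = GENE_RE.search(fields[8])
        fpkm = FPKM_RE.search(fields[8])
        if gene is None or fpkm is None:
            continue
        gene_id = gene.group(1)
        best[gene_id] = max(float(fpkm.group(1)), best.get(gene_id, 0.0))
    return best


def count_zero(fpkm_by_gene):
    """Count genes whose highest transcript FPKM is exactly zero."""
    return sum(1 for value in fpkm_by_gene.values() if value == 0.0)
```

Passing an open file handle for transcripts.gtf to gene_max_fpkm and then calling count_zero on the result gives the number of genes with no expressed transcript, which can be compared with the same count from the cuffdiff output.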
Do you really want to use the tuxedo pipeline? It hasn't been anywhere near best practice for a number of years. What species is this?
I agree with @Devon. However, your samples seem to suffer from another issue here. Can you check the following: cuffdiff with default settings.
Best,
I'm currently trying the hisat2 protocol, but haven't finished it yet, so I couldn't use it. Regarding the questions you asked: the number of reads is > 20M in each case; I have 2 biological replicates; the reference genome is hg38, downloaded from iGenomes.
Okay. How's the alignment percentage? Did you try loading the BAM files into a genome browser? How do the reads look?
It's not unusual for the FPKM values in the cufflinks output file to differ from those in the cuffdiff output.
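To see how many genes cuffdiff actually reports as zero, a short sketch like the following could tally zero-FPKM rows per sample in a tracking table such as genes.fpkm_tracking. It assumes a tab-delimited file with a header row in which each per-sample FPKM column name ends in _FPKM; the function name is hypothetical.

```python
import csv
import io


def zero_fpkm_counts(tracking_text):
    """Return (number of rows, {FPKM column: number of rows with FPKM == 0})."""
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    # Assumption: per-sample expression columns are named <label>_FPKM.
    fpkm_cols = [c for c in (reader.fieldnames or []) if c.endswith("_FPKM")]
    zeros = {c: 0 for c in fpkm_cols}
    total = 0
    for row in reader:
        total += 1
        for col in fpkm_cols:
            if float(row[col]) == 0.0:
                zeros[col] += 1
    return total, zeros
```

If nearly every row is zero in one sample only, that points at a problem with that sample's input rather than at a genuine lack of expression.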
The alignment percentage is >90% in each case. However, I haven't tried loading the BAM files into a browser yet; I will try that now with IGV.
Thanks.
More up-to-date differential expression pipelines exist (DESeq2, limma-voom, kallisto+sleuth, etc.).
When you say you have 4 different samples, do you mean 4 different conditions that you want to compare, with one sample in each, or 4 samples per condition?
Dear Radek, I have 2 cell line and 2 animal model samples, a cancer and a normal data set for each. I want the cancer-versus-normal results for the cell line and the animal model separately. I will not be considering the other comparisons.
I'm not using the DESeq2 protocol because, for some reason, bedtools coverage has consistently given me 0 reads mapping. I got a lot of suggestions to correct it, but unfortunately nothing worked. It would be very helpful if you could share a pipeline with all the scripts.
Thanks.
If you can see the mapped reads (with IGV, for example) and bedtools is not working, you could use featureCounts.
If I had to use DESeq2 from scratch, I would start here: Bioconductor: Differential expression. It also includes a guide to creating your count matrix.
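As a rough illustration of the count matrix mentioned above, the sketch below reads a featureCounts output table and sums the counts per sample. It assumes the standard featureCounts layout: a leading comment line starting with #, then a tab-delimited header with six annotation columns (Geneid, Chr, Start, End, Strand, Length) followed by one column per BAM file. The function names are hypothetical.

```python
import csv

ANNOTATION_COLUMNS = 6  # Geneid, Chr, Start, End, Strand, Length


def read_feature_counts(text):
    """Return (sample names, {gene_id: [count per sample]})."""
    # Drop blank lines and the leading "# Program:featureCounts ..." comment.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[ANNOTATION_COLUMNS:]
    counts = {}
    for row in reader:
        counts[row[0]] = [int(x) for x in row[ANNOTATION_COLUMNS:]]
    return samples, counts


def library_sizes(samples, counts):
    """Sum assigned reads per sample; a total of 0 flags a broken input."""
    totals = [0] * len(samples)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(samples, totals))
```

A sample whose total is 0 would reproduce the "0 reads mapping" symptom described earlier. One common cause to rule out, offered here as an assumption rather than a diagnosis, is a mismatch between chromosome names in the BAM files and in the annotation.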
Maybe you want to read this: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Tophat is actually obsolete software, and I think kallisto is useful only for very well-curated genomes (and annotations).