7.8 years ago
LuisNagano
Hello everyone, is there a way to run the Cufflinks/Cuffdiff pipeline so that the output includes, besides gene names, the Ensembl ID (or the RefSeq ID or Entrez ID)?
I ran TopHat with the Ensembl Bowtie2 index "GRCm38_MusMusculus" and used "gencode.vM12.annotation.gtf" as the reference annotation for Cufflinks and Cuffmerge, but my final Cuffdiff output (gene_exp.diff) only contains gene names and XLOC IDs, like this:
Thank you for your help!
I also used the Ensembl and UCSC .gtf files to run Cufflinks and Cuffmerge, but that did not work either: I still only have XLOC IDs in my Cuffdiff output. Ensembl IDs and nearest_ref IDs appear in my Cufflinks and Cuffmerge output, but they disappear from the final Cuffdiff output, which is the one I need.
What command line do you use to run cufflinks, cuffmerge and cuffdiff?
My cufflinks output: transcripts.gtf
My cuffmerge output: merged.gtf
My cuffdiff output: gene_exp.diff
My cuffdiff output: genes.fpkm_tracking
Cuffmerge assigned new IDs to your genes because the transcripts are merged and modified in its output. Because the genes' coordinates have changed, they are no longer equivalent to the Ensembl genes, so Ensembl IDs should not be used in those cases.
Usually, when I use cuffmerge, I simply want to find novel genes. In that case, I use bedtools to find genes that do not overlap with the Ensembl gtf file, so I have never needed to match their IDs.
In your case, maybe you can simply append the novel genes (obtained with the -v argument of bedtools intersect) to the Ensembl gtf, then use that file for cuffdiff?
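To make the idea of "genes that do not overlap the reference" concrete, here is a minimal Python sketch of the same kind of filter. It is not bedtools and not a replacement for it: the function name `non_overlapping`, the tuple layout, and all coordinates and IDs are made up for illustration, and strand is ignored.

```python
from collections import defaultdict


def non_overlapping(candidates, reference):
    """Return candidates that overlap no reference interval on the same chromosome.

    Intervals are (chrom, start, end, name) tuples with 1-based inclusive
    coordinates, as in GTF. Strand is ignored in this sketch.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, _name in reference:
        by_chrom[chrom].append((start, end))

    novel = []
    for chrom, start, end, name in candidates:
        # Two closed intervals overlap when each one starts before the other ends.
        hits = any(start <= r_end and r_start <= end
                   for r_start, r_end in by_chrom[chrom])
        if not hits:
            novel.append((chrom, start, end, name))
    return novel


# Hypothetical toy data: two reference genes and three merged loci.
reference = [("chr1", 100, 500, "REF_GENE_A"), ("chr2", 1000, 2000, "REF_GENE_B")]
merged = [
    ("chr1", 450, 900, "XLOC_000001"),    # overlaps REF_GENE_A
    ("chr1", 5000, 6000, "XLOC_000002"),  # no overlap
    ("chr3", 10, 50, "XLOC_000003"),      # chromosome absent from the reference
]

novel = non_overlapping(merged, reference)
print([name for *_, name in novel])  # ['XLOC_000002', 'XLOC_000003']
```

Only the loci kept by such a filter would be appended to the reference gtf, so the known genes keep their original IDs.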
I had a similar problem of not getting appropriate gene IDs in my cuffdiff output; check the following post:
How to retrieve gene ID, gene description, after cuffdiff analysis
Maybe the suggestions given there will be of help to you.
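Since the reference IDs are reported above as still present in merged.gtf, one possible workaround is to build an XLOC-to-reference lookup table from that file. The sketch below is only an illustration: it assumes the ninth GTF column carries `gene_id` and `nearest_ref` attributes in the usual `key "value";` form, the helper names are hypothetical, and the example lines and IDs are invented. Note that `nearest_ref` is a transcript-level reference ID, and that the caveat about merged loci not being equivalent to Ensembl genes still applies.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def xloc_to_reference(gtf_lines, ref_key="nearest_ref"):
    """Map each gene_id (XLOC) to the sorted reference IDs seen on its records."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = parse_attributes(cols[8])
        if "gene_id" in attrs and ref_key in attrs:
            mapping[attrs["gene_id"]].add(attrs[ref_key])
    return {xloc: sorted(refs) for xloc, refs in mapping.items()}


# Invented example records; REF_T1..REF_T3 are placeholder reference IDs.
example = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; nearest_ref "REF_T1"; class_code "=";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000002"; nearest_ref "REF_T2"; class_code "j";',
    'chr2\tCufflinks\texon\t10\t90\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000003"; nearest_ref "REF_T3"; class_code "=";',
    'chr3\tCufflinks\texon\t10\t90\t.\t+\t.\tgene_id "XLOC_000003"; transcript_id "TCONS_00000004"; class_code "u";',
]

mapping = xloc_to_reference(example)
for xloc, refs in sorted(mapping.items()):
    print(xloc, ",".join(refs))
```

On a real file, the lines would come from `open("merged.gtf")`, and the resulting table could then be joined to the XLOC IDs in gene_exp.diff. Loci without a `nearest_ref` attribute (XLOC_000003 in the toy data) are left out of the table.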