Hi all,
I am now using CummeRbund in R (version 3.2.4) to visualize my Cuffdiff (version 2.2.1) outputs.
I was hoping to get short gene names, such as MMP9, for my genes. However, I only get NCBI Reference Sequence (RefSeq) accessions, such as NM_001002930, NM_001002932, NM_001002938, etc.
Here is my R script:
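library(cummeRbund)
# Build (or rebuild) the CummeRbund database from the Cuffdiff output directory
cuff_data <- readCufflinks(dir = refCuffdiff, rebuild = TRUE, gtfFile = gtfFilePath, genome = genomePath)
# IDs of genes that are significant at alpha = 0.01
diffGeneIDs <- getSig(cuff_data, level = "genes", alpha = 0.01)
diffGenes <- getGenes(cuff_data, diffGeneIDs)
# List tracking IDs together with their gene short names
featureNames(diffGenes)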
Here is part of my output:
840 XLOC_024368 <NA>
841 XLOC_024378 <NA>
842 XLOC_024418 <NA>
843 XLOC_024432 <NA>
844 XLOC_024434 <NA>
845 XLOC_024442 <NA>
846 XLOC_024444 NM_001003241
847 XLOC_024451 NM_001003219
848 XLOC_024474 NM_001197143
849 XLOC_024482 <NA>
850 XLOC_024503 NR_128749
(1) I was wondering why most of them are <NA>. Is it caused by a lack of annotation in the dog genome GTF (CanFam3.1.gtf), or does it have something to do with the addFeatures function?
(2) For the genes that do have an ID, I would like to get short gene names instead of the RefSeq accessions. How can I achieve this?
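One quick way to test the annotation hypothesis in question (1) is to count how many GTF records carry a gene_name attribute, which, as far as I understand, is where the Cufflinks tools take short gene names from. The sketch below runs on two hypothetical records written in the style of a UCSC Table Browser GTF export (gene_id and transcript_id only); NM_TEST1 and NM_TEST2 are placeholders, and you would point gtf at your real CanFam3.1.gtf instead.

```shell
#!/bin/sh
# Sketch: how many GTF records carry a gene_name attribute?
# The two records below are HYPOTHETICAL toy data in the style of a
# UCSC Table Browser GTF export (gene_id and transcript_id only).
# Replace the temp file with the path to your real CanFam3.1.gtf.
gtf=$(mktemp)
printf 'chr1\tcanFam3_refGene\texon\t100\t200\t0.000000\t+\t.\tgene_id "NM_TEST1"; transcript_id "NM_TEST1";\n' > "$gtf"
printf 'chr1\tcanFam3_refGene\texon\t300\t400\t0.000000\t+\t.\tgene_id "NM_TEST2"; transcript_id "NM_TEST2";\n' >> "$gtf"

# Count all records, then the records that contain gene_name "..."
total=$(awk 'END { print NR }' "$gtf")
with_name=$(awk '/gene_name "/ { n++ } END { print n + 0 }' "$gtf")
echo "records: $total, with gene_name: $with_name"
rm -f "$gtf"
```

If with_name is 0, or far below total, on the real file, Cuffdiff would have no symbols to report, which would be consistent with the <NA> values seen in CummeRbund.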
I have checked this post on SEQanswers, but it didn't solve my problem.
Thanks a lot!!!
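For question (2), one possible workaround is to translate accessions into symbols after the fact, using a two-column, tab-separated lookup table (accession, then symbol). As an assumption, such a table could be built from the name and name2 columns of the UCSC refGene table for canFam3. Everything below is hypothetical toy data: NM_TEST1, GENE_A and the other values are placeholders rather than real mappings, and only the first four gene_exp.diff columns are shown.

```shell
#!/bin/sh
# Sketch: replace accessions in column 3 ("gene") of a Cuffdiff
# gene_exp.diff file with symbols from a lookup table.
# Both files are HYPOTHETICAL toy data; swap in your own paths.
map=$(mktemp)
exp_diff=$(mktemp)
printf 'NM_TEST1\tGENE_A\nNM_TEST2\tGENE_B\n' > "$map"
printf 'test_id\tgene_id\tgene\tlocus\n' > "$exp_diff"
printf 'XLOC_000001\tXLOC_000001\tNM_TEST1\tchr1:100-200\n' >> "$exp_diff"
printf 'XLOC_000002\tXLOC_000002\t-\tchr1:300-400\n' >> "$exp_diff"
printf 'XLOC_000003\tXLOC_000003\tNM_TEST9\tchr2:500-600\n' >> "$exp_diff"

# First file (NR == FNR): load accession -> symbol pairs.
# Second file: rewrite column 3 when a symbol is known (header skipped),
# otherwise print the row unchanged.
renamed=$(awk 'BEGIN { FS = OFS = "\t" }
  NR == FNR { sym[$1] = $2; next }
  FNR > 1 && ($3 in sym) { $3 = sym[$3] }
  { print }' "$map" "$exp_diff")
printf '%s\n' "$renamed"
rm -f "$map" "$exp_diff"
```

Accessions missing from the table and "-" placeholders pass through unchanged. If the gene column of a row holds several comma-separated names, this sketch leaves the whole field as-is; handling that case would require splitting on commas first.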
The problem comes from the annotation file you provided to Cuffdiff. Check whether it contains gene names. For a quick test, run
head gene_exp.diff
inside the folder where you stored the Cuffdiff results. I reckon there are no gene names in there.
Yes, you are right. I have been working with the dog genome, and I downloaded the GTF file from UCSC. The
head gene_exp.diff
result looks like this:
Does that mean most of the genes have not been annotated?
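To put a number on "most of the genes", a sketch like the one below tallies the rows of gene_exp.diff whose gene column holds a name versus "-", which is assumed here to be the placeholder Cuffdiff writes when a locus has no gene name. The three data rows are hypothetical; set exp_diff to your own file path instead of the temp file.

```shell
#!/bin/sh
# Sketch: count named vs. unnamed loci in a Cuffdiff gene_exp.diff file.
# The rows below are HYPOTHETICAL toy data (first four columns only).
exp_diff=$(mktemp)
printf 'test_id\tgene_id\tgene\tlocus\n' > "$exp_diff"
printf 'XLOC_000001\tXLOC_000001\tNM_TEST1\tchr1:100-200\n' >> "$exp_diff"
printf 'XLOC_000002\tXLOC_000002\t-\tchr1:300-400\n' >> "$exp_diff"
printf 'XLOC_000003\tXLOC_000003\t-\tchr2:500-600\n' >> "$exp_diff"

counts=$(awk 'BEGIN { FS = "\t" }
  FNR == 1 { next }              # skip the header row
  $3 == "-" { missing++; next }  # assumed "no gene name" placeholder
  { named++ }
  END { printf "named=%d missing=%d\n", named, missing }' "$exp_diff")
echo "$counts"
rm -f "$exp_diff"
```

On this toy input it prints named=1 missing=2; on a real file, a large missing count would suggest that most loci simply had no name attached in the GTF.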