Entering edit mode
7.2 years ago
dazhudou1122
▴
140
Dear BioStars Community,
I am a beginner of pathways analysis, and I ran into some issue using gage and pathview.
I mapped the RNA-seq (mouse cells) using STAR and counted the feature using featureCounts:
featureCounts -T 16 -t exon -g gene_id -a /qbrc/biodata/igenome/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf *.sam
So the row names are gene_ids. I then ran DESeq2, gage and pathview, but I encountered the following error:
Info: Downloading xml files for mmuNA, 1/1 pathways..
Warning: Download of mmuNA xml file failed!
This pathway may not exist!
Info: Downloading png files for mmuNA, 1/1 pathways..
Warning: Download of mmuNA png file failed!
This pathway may not exist!
Warning: Failed to download KEGG xml/png files, mmuNA skipped!
Warning messages:
1: In download.file(xml.url, xml.target, quiet = T) :
URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
2: In download.file(xml.url, xml.target, quiet = T) :
URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
3: In download.file(xml.url, xml.target, quiet = T) :
URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
I thought maybe my gene_id was the issue, so I tried conversion:
id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")
But that did not work well:
'select()' returned 1:1 mapping between keys and columns
[1] "Note: 24046 of 24062 unique input IDs unmapped."
Can anyone help me solve the issue? Any input will be appreciated!
Best,
Wenhan
The codes I ran:
## Run DESeq normalization
countdata <- read.table("B6_vs_PPARA_count.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
head(countdata)
(condition <- factor(c(rep("ctl", 3), rep("exp", 3))))
library(DESeq2)
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dds <- DESeq(dds)
##from GAGE
deseq2.res <- results(dds)
deseq2.fc=deseq2.res$log2FoldChange
names(deseq2.fc)=rownames(deseq2.res)
exp.fc=deseq2.fc
out.suffix="deseq2"
require(gage)
datakegg.gs)
#get the annotation files for mouse
kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]
#convert gene symbol to entrez ID
gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')
names(exp.fc)<- gene.symbol.eg[,2]
fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
sel <- fc.kegg.p$greater[, "q.val"] < 0.2 & !is.na(fc.kegg.p$greater[, "q.val"])
path.ids <- rownames(fc.kegg.p$greater)[sel]
sel.l <- fc.kegg.p$less[, "q.val"] < 0.2 & !is.na(fc.kegg.p$less[,"q.val"])
path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
require(pathview)
#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(gene.data = exp.fc, pathway.id = pid,species = "mmu", out.suffix=out.suffix))
Have you read carefully the messages ? Do you know what mmuNA is ? Then if you think the gene IDs are the issue, why don't you show us what they are ? Could it be they are not what you think they are or what the software expects ?
I tried but I cant seem to find what mmuNA is....
Here are some examples of the GeneID I have: Geneid Xkr4 Rp1 Sox17 Mrpl15 Lypla1 Tcea1 Rgs20 Atp6v1h Oprk1 Npbwr1 Rb1cc1 Fam150a
mmuNA looks suspiciously like the concatenation of mmu with NA, the symbol used for undefined/missing values in R. Investigating this might put you on the right track.
Please use the
Code
button to make your post more readable.Thank you! I have modified accordingly:)
could you paste
head (rownames(dds),10)
here?Here it is:
[1] "Xkr4" "Rp1" "Sox17" "Mrpl15" "Lypla1" "Tcea1" "Rgs20" "Atp6v1h" "Oprk1" "Npbwr1"
Thank you:)
can you try this and let us know:
Let us know the ouput differences between:
and
Dear cpad0112:
Thank you so much for helping!
Here are the results:
However, I still got the same error....
It is weird that path.ids, path.ids.l and path.ids2 are all empty. This data set has more than 100 gens that are very significantly differently expressed...
after running this?
[1] "Note: 8779 of 24062 unique input IDs unmapped."
earlier, ids were not getting mapped and with toupper, ~8800 ids got mapped. I guess you need to run the script with subset of your data rather than entire data and fix the code. I think you are following tutorial here: https://www.r-bloggers.com/tutorial-rna-seq-differential-expression-pathway-analysis-with-sailfish-deseq2-gage-and-pathview/. Try to emulate the same without much code changes to mouse. Example code works for humans and with some example data. `
Why is your organism set to
mmu
in one place butMm
in another?I tried, I used mmu, but no database was downloaded. And if you use mmu as species, then the program does not recognize...