Question

Pathway analysis gene name issue

0

Entering edit mode

7.2 years ago

dazhudou1122 ▴ 140

Dear BioStars Community,

I am a beginner of pathways analysis, and I ran into some issue using gage and pathview.

I mapped the RNA-seq (mouse cells) using STAR and counted the feature using featureCounts:

featureCounts -T 16 -t exon -g gene_id -a /qbrc/biodata/igenome/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf *.sam

So the row names are gene_ids. I then ran DESeq2, gage and pathview, but I encountered the following error:

Info: Downloading xml files for mmuNA, 1/1 pathways..

Warning: Download of mmuNA xml file failed!
This pathway may not exist!
Info: Downloading png files for mmuNA, 1/1 pathways..

Warning: Download of mmuNA png file failed!
This pathway may not exist!
Warning: Failed to download KEGG xml/png files, mmuNA skipped!
Warning messages:
1: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
2: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
3: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'

I thought maybe my gene_id was the issue, so I tried conversion:

 id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")

But that did not work well:

'select()' returned 1:1 mapping between keys and columns
[1] "Note: 24046 of 24062 unique input IDs unmapped."

Can anyone help me solve the issue? Any input will be appreciated!

Best,

Wenhan

The codes I ran:

## Run DESeq normalization
countdata <- read.table("B6_vs_PPARA_count.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
head(countdata)
(condition <- factor(c(rep("ctl", 3), rep("exp", 3))))
library(DESeq2)
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dds <- DESeq(dds)

##from GAGE

deseq2.res <- results(dds)
deseq2.fc=deseq2.res$log2FoldChange
names(deseq2.fc)=rownames(deseq2.res)
exp.fc=deseq2.fc
out.suffix="deseq2"

require(gage)
datakegg.gs)

#get the annotation files for mouse

kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]

#convert gene symbol to entrez ID

gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')

names(exp.fc)<- gene.symbol.eg[,2]

fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
sel <- fc.kegg.p$greater[, "q.val"] < 0.2 & !is.na(fc.kegg.p$greater[, "q.val"])
path.ids <- rownames(fc.kegg.p$greater)[sel]
sel.l <- fc.kegg.p$less[, "q.val"] < 0.2 & !is.na(fc.kegg.p$less[,"q.val"])
path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
require(pathview)
#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(gene.data = exp.fc, pathway.id = pid,species = "mmu", out.suffix=out.suffix))

RNA-Seq pathway analysis gene name issue mouse • 5.4k views

ADD COMMENT • link 7.2 years ago by dazhudou1122 ▴ 140

2

Entering edit mode

Have you read carefully the messages ? Do you know what mmuNA is ? Then if you think the gene IDs are the issue, why don't you show us what they are ? Could it be they are not what you think they are or what the software expects ?

ADD REPLY • link 7.2 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

I tried but I cant seem to find what mmuNA is....

Here are some examples of the GeneID I have: Geneid Xkr4 Rp1 Sox17 Mrpl15 Lypla1 Tcea1 Rgs20 Atp6v1h Oprk1 Npbwr1 Rb1cc1 Fam150a

ADD REPLY • link 7.2 years ago by dazhudou1122 ▴ 140

1

Entering edit mode

mmuNA looks suspiciously like the concatenation of mmu with NA, the symbol used for undefined/missing values in R. Investigating this might put you on the right track.

ADD REPLY • link 7.2 years ago by Jean-Karim Heriche 27k

0

Entering edit mode

Please use the Code button to make your post more readable.

101010 Button

ADD REPLY • link 7.2 years ago by h.mon 35k

0

Entering edit mode

Thank you! I have modified accordingly:)

ADD REPLY • link 7.2 years ago by dazhudou1122 ▴ 140

0

Entering edit mode

could you paste head (rownames(dds),10) here?

ADD REPLY • link 7.2 years ago by cpad0112 21k

0

Entering edit mode

Here it is:

[1] "Xkr4" "Rp1" "Sox17" "Mrpl15" "Lypla1" "Tcea1" "Rgs20" "Atp6v1h" "Oprk1" "Npbwr1"

Thank you:)

ADD REPLY • link 7.2 years ago by dazhudou1122 ▴ 140

0

Entering edit mode

can you try this and let us know:

id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")

Let us know the ouput differences between:

id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")

and

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")

ADD REPLY • link 7.2 years ago by cpad0112 21k

0

Entering edit mode

Dear cpad0112:

Thank you so much for helping!

Here are the results:

id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")
[1] "Note: 8779 of 24062 unique input IDs unmapped."
 [ reached getOption("max.print") -- omitted 23562 rows ]

id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")
[1] "Note: 24046 of 24062 unique input IDs unmapped."

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")
[1] "Note: 8779 of 24062 unique input IDs unmapped."

However, I still got the same error....

ADD REPLY • link 7.2 years ago by dazhudou1122 ▴ 140

0

Entering edit mode

It is weird that path.ids, path.ids.l and path.ids2 are all empty. This data set has more than 100 gens that are very significantly differently expressed...

ADD REPLY • link 7.2 years ago by dazhudou1122 ▴ 140

1

Entering edit mode

after running this?

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")

[1] "Note: 8779 of 24062 unique input IDs unmapped."

earlier, ids were not getting mapped and with toupper, ~8800 ids got mapped. I guess you need to run the script with subset of your data rather than entire data and fix the code. I think you are following tutorial here: https://www.r-bloggers.com/tutorial-rna-seq-differential-expression-pathway-analysis-with-sailfish-deseq2-gage-and-pathview/. Try to emulate the same without much code changes to mouse. Example code works for humans and with some example data. `

ADD REPLY • link 7.2 years ago by cpad0112 21k

0

Entering edit mode

Why is your organism set to mmu in one place but Mm in another?

ADD REPLY • link 7.2 years ago by igor 13k

0

Entering edit mode

I tried, I used mmu, but no database was downloaded. And if you use mmu as species, then the program does not recognize...

ADD REPLY • link 7.2 years ago by dazhudou1122 ▴ 140