Pathway analysis gene name issue
0
0
Entering edit mode
7.2 years ago
dazhudou1122 ▴ 140

Dear BioStars Community,

I am a beginner of pathways analysis, and I ran into some issue using gage and pathview.

I mapped the RNA-seq (mouse cells) using STAR and counted the feature using featureCounts:

featureCounts -T 16 -t exon -g gene_id -a /qbrc/biodata/igenome/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf *.sam

So the row names are gene_ids. I then ran DESeq2, gage and pathview, but I encountered the following error:

Info: Downloading xml files for mmuNA, 1/1 pathways..

Warning: Download of mmuNA xml file failed!
This pathway may not exist!
Info: Downloading png files for mmuNA, 1/1 pathways..

Warning: Download of mmuNA png file failed!
This pathway may not exist!
Warning: Failed to download KEGG xml/png files, mmuNA skipped!
Warning messages:
1: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
2: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'
3: In download.file(xml.url, xml.target, quiet = T) :
  URL 'http://rest.kegg.jp/get/mmuNA/kgml': status was '400 Bad Request'

I thought maybe my gene_id was the issue, so I tried conversion:

 id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")

But that did not work well:

'select()' returned 1:1 mapping between keys and columns
[1] "Note: 24046 of 24062 unique input IDs unmapped."

Can anyone help me solve the issue? Any input will be appreciated!

Best,

Wenhan

The codes I ran:

## Run DESeq normalization
countdata <- read.table("B6_vs_PPARA_count.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
head(countdata)
(condition <- factor(c(rep("ctl", 3), rep("exp", 3))))
library(DESeq2)
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dds <- DESeq(dds)

##from GAGE

deseq2.res <- results(dds)
deseq2.fc=deseq2.res$log2FoldChange
names(deseq2.fc)=rownames(deseq2.res)
exp.fc=deseq2.fc
out.suffix="deseq2"

require(gage)
datakegg.gs)

#get the annotation files for mouse

kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]

#convert gene symbol to entrez ID

gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')

names(exp.fc)<- gene.symbol.eg[,2]

fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
sel <- fc.kegg.p$greater[, "q.val"] < 0.2 & !is.na(fc.kegg.p$greater[, "q.val"])
path.ids <- rownames(fc.kegg.p$greater)[sel]
sel.l <- fc.kegg.p$less[, "q.val"] < 0.2 & !is.na(fc.kegg.p$less[,"q.val"])
path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
require(pathview)
#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(gene.data = exp.fc, pathway.id = pid,species = "mmu", out.suffix=out.suffix))
RNA-Seq pathway analysis gene name issue mouse • 5.4k views
ADD COMMENT
2
Entering edit mode

Have you read carefully the messages ? Do you know what mmuNA is ? Then if you think the gene IDs are the issue, why don't you show us what they are ? Could it be they are not what you think they are or what the software expects ?

ADD REPLY
0
Entering edit mode

I tried but I cant seem to find what mmuNA is....

Here are some examples of the GeneID I have: Geneid Xkr4 Rp1 Sox17 Mrpl15 Lypla1 Tcea1 Rgs20 Atp6v1h Oprk1 Npbwr1 Rb1cc1 Fam150a

ADD REPLY
1
Entering edit mode

mmuNA looks suspiciously like the concatenation of mmu with NA, the symbol used for undefined/missing values in R. Investigating this might put you on the right track.

ADD REPLY
0
Entering edit mode

Please use the Code button to make your post more readable.

101010 Button

ADD REPLY
0
Entering edit mode

Thank you! I have modified accordingly:)

ADD REPLY
0
Entering edit mode

could you paste head (rownames(dds),10) here?

ADD REPLY
0
Entering edit mode

Here it is:

[1] "Xkr4" "Rp1" "Sox17" "Mrpl15" "Lypla1" "Tcea1" "Rgs20" "Atp6v1h" "Oprk1" "Npbwr1"

Thank you:)

ADD REPLY
0
Entering edit mode

can you try this and let us know:

id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")

Let us know the ouput differences between:

id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")

and

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")
ADD REPLY
0
Entering edit mode

Dear cpad0112:

Thank you so much for helping!

Here are the results:

id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")
[1] "Note: 8779 of 24062 unique input IDs unmapped."
 [ reached getOption("max.print") -- omitted 23562 rows ]

id.map.refseq <- id2eg(ids = rownames(dds), category ="SYMBOL", org = "mmu")
[1] "Note: 24046 of 24062 unique input IDs unmapped."

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")
[1] "Note: 8779 of 24062 unique input IDs unmapped."

However, I still got the same error....

ADD REPLY
0
Entering edit mode

It is weird that path.ids, path.ids.l and path.ids2 are all empty. This data set has more than 100 gens that are very significantly differently expressed...

ADD REPLY
1
Entering edit mode

after running this?

id.map.refseq <- id2eg(ids = toupper(rownames(dds)), category ="SYMBOL", org="mmu")

[1] "Note: 8779 of 24062 unique input IDs unmapped."

earlier, ids were not getting mapped and with toupper, ~8800 ids got mapped. I guess you need to run the script with subset of your data rather than entire data and fix the code. I think you are following tutorial here: https://www.r-bloggers.com/tutorial-rna-seq-differential-expression-pathway-analysis-with-sailfish-deseq2-gage-and-pathview/. Try to emulate the same without much code changes to mouse. Example code works for humans and with some example data. `

ADD REPLY
0
Entering edit mode

Why is your organism set to mmu in one place but Mm in another?

ADD REPLY
0
Entering edit mode

I tried, I used mmu, but no database was downloaded. And if you use mmu as species, then the program does not recognize...

ADD REPLY

Login before adding your answer.

Traffic: 1898 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6