Question

Extract Expression Data For Candidate Gene List In Microarray

7

Entering edit mode

14.0 years ago

Kerem ▴ 80

Hi all,

I'm struggling my way through beginner's Bioconductor (and R in general) and have a specific problem I hope somebody can help with. I can load CEL files, do RMA processing on them and check QC etc, eventually I end up with a data frame with these data.

I have a few lists of about 200 candidate genes each that are of specific interest to us and I would like to run a script to print the expression data for just those genes to a tab-delim txt file.

Can anybody point me in the right direction to do this. I do have the Affymetrix Probe Set IDs for these, so can use either that or gene symbol.

Apologies if this is a very basic question, I'm very new to this.

Thanks in advance, Kerem.

microarray bioconductor r • 13k views

ADD COMMENT • link updated 11.5 years ago by Biostar 20 • written 14.0 years ago by Kerem ▴ 80

0

Entering edit mode

In addition to the answers you might find this document useful: http://cran.r-project.org/doc/manuals/R-data.pdf

ADD REPLY • link updated 5.4 years ago by Ram 44k • written 14.0 years ago by Michael 55k

Ram · Answer 1 · 2011-02-22

7

Entering edit mode

14.0 years ago

Christof Winter ★ 1.1k

After RMA processing, you should end up with an ExpressionSet object. Let's assume it's called eset, and your probe set IDs are in a character vector called yourProbeSetIDs. Then:

exprs(eset)    # gives a dataframe with expression levels, probe set IDs are the row names
exprs(eset)[yourProbeSetIDs, ]  # gives you of subset dataframe with your probe sets of interest

To write a tab-delimited file:

write.table(yourDataFrame, file="filename.txt", sep="\t", quote=F, row.names=T)

Christof

ADD COMMENT • link updated 5.4 years ago by Ram 44k • written 14.0 years ago by Christof Winter ★ 1.1k

1

Entering edit mode

It does work. Have a look at indexing with character vectors: http://cran.r-project.org/doc/manuals/R-intro.html#Index-vectors

ADD REPLY • link updated 5.4 years ago by Ram 44k • written 14.0 years ago by Christof Winter ★ 1.1k

1

Entering edit mode

library(oligo)
geneCELs <- list.celfiles(path, full.names=TRUE)
affyGeneFS <- read.celfiles(geneCELs)
genePS <- rma(affyGeneFS, target="probeset")
featureData(genePS) <- getNetAffx(genePS, "probeset")

so I can do exprs(genePS) and get the probesets, however I want the original probes, before they are "summarised" to the probe-set level.

So something like:

exprs(affyGeneFS)

ADD REPLY • link updated 5.4 years ago by Ram 44k • written 13.2 years ago by Jimbo ▴ 120

0

Entering edit mode

AFAIK, exprs(eset)[yourProbeSetIDs, ] does not work. It either needs to be probeIndices or a boolean array of length(exprs) between the square brackets.

ADD REPLY • link updated 5.4 years ago by Ram 44k • written 14.0 years ago by Michael Schubert ★ 7.1k

0

Entering edit mode

Dear both, Firstly, massive thanks for your help, really appreciated. Eventually got exprs(eset)[yourProbeSetIDs, ] to work. My fault - needed to make the char vector correctly and include the trailing ", " or else it fails. Very useful for me.

However, I've tested with 2 probeIDs and the output file has a couple of problems. First, the header row is missing column title for probeID. Maybe hints I've imported the data wrongly? Second, I want to include gene symbols for the rows. I've got and installed the appropriate .db for my mouse gene st chips. Any pointers?

Thanks again, K.

ADD REPLY • link 14.0 years ago by Kerem ▴ 80

0

Entering edit mode

Didnt know that point 4 existed, thanks :) @Kerem: you may either reset the headers with colnames()= or take the subset from you dataframe instead of the expression data which would (I assume) preserve the headers. For the 2nd, see the annaffy documentation.

ADD REPLY • link 14.0 years ago by Michael Schubert ★ 7.1k

0

Entering edit mode

For mouse, also have a look at the org.Mm.eg.db package.

ADD REPLY • link 14.0 years ago by Christof Winter ★ 1.1k

0

Entering edit mode

Anyone know how to get the probe set ids for the expression matrix row names BEFORE summarisation (so like 14029281_1, 14029281_2 etc.). When I load the cel files into R, the original unnormalised expression matrix has row names from 1:number of probes.

ADD REPLY • link 13.2 years ago by Jimbo ▴ 120

0

Entering edit mode

Can you post your code how you load and process the CEL files?

ADD REPLY • link 13.2 years ago by Christof Winter ★ 1.1k

0

Entering edit mode

Sorry, that's rather messy, shall I repost this as a new question?

ADD REPLY • link 13.2 years ago by Jimbo ▴ 120

0

Entering edit mode

Reposted here: Getting Probe Ids For Expression Matrix Before Summarisation (In R If Possible)

ADD REPLY • link updated 5.4 years ago by Ram 44k • written 13.2 years ago by Jimbo ▴ 120

Ram · Answer 2 · 2011-02-22

5

Entering edit mode

14.0 years ago

Michael Schubert ★ 7.1k

Here are some pointers:

I assume you know how to get the expression values from your dataframe by using exprs(dataframe)
From the expression values, you can get a vector of gene identifiers for the whole array by using rownames() or colnames() depending on the orientation of your dataframe

Then you need to check if each ID matches one of your list of interest and finally extract the appropriate dataset. When you are new to programming in general you may want to do this with loops, otherwise there are shortcuts. You can try to following commands and make them fit for your situation:

all = c(1:5)
wanted = c(1,4,5)
x = all %in% wanted
all[x]

Hope that helps :)

ADD COMMENT • link updated 5.4 years ago by Ram 44k • written 14.0 years ago by Michael Schubert ★ 7.1k

1

Entering edit mode

Thanks Michael, enjoying the programming and this will come in useful. After I read your reply I found https://wiki.brandeis.edu/twiki/bin/view/Bio/BioConductor which has a subsection about 2/3 of the way down the page titled "Filtering Genes" which seems to use similar principles to do this. Also, I've asked a different question in reply to the answer below about including gene symbols in output. Wonder if you have any pointers on this for me? Thanks, Kerem.

ADD REPLY • link 14.0 years ago by Kerem ▴ 80