How Can The Same Gene Be Both Significantly Up- And Down-Regulated According To The GXA?
12.3 years ago
Neilfws 49k

I've been playing around with the EBI Gene Expression Atlas (GXA), which has an API. For example, I can retrieve data about the human gene SRI in JSON format using this URI:

http://www.ebi.ac.uk:80/gxa/api/vx?geneIs=ENSG00000075142&format=json
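The URI carries two query parameters: geneIs (an Ensembl gene ID) and format. A small helper for building the same kind of query for other genes might look like this. The function name gxa_url is hypothetical, and the sketch assumes the endpoint accepts any Ensembl gene ID in the same way and still behaves as it did when this was written.

```r
# Hypothetical helper: build a GXA API query URI for one Ensembl gene ID,
# reusing the base URL and query parameters from the example URI.
gxa_url <- function(ensembl_id, format = "json") {
  paste0("http://www.ebi.ac.uk:80/gxa/api/vx?",
         "geneIs=", utils::URLencode(ensembl_id, reserved = TRUE),
         "&format=", format)
}

gxa_url("ENSG00000075142")
```
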

I wrote some R code to fetch/parse the JSON into a data frame:

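library(RCurl)
library(rjson)
library(plyr)

# Convert the parsed GXA JSON (a nested list) into a data frame with one row
# per (factor value, experiment) pair for the first gene in the results.
j2df <- function(l) {
  # Return NA rather than NULL for a missing field, so sapply() always
  # produces an atomic vector (e.g. when a p-value is absent)
  field <- function(y, name) if (is.null(y[[name]])) NA else y[[name]]
  e <- lapply(l$results[[1]]$expressions, function(x) {
    list(ef     = x$ef,
         efv    = x$efv,
         accn   = sapply(x$experiments, field, name = "experimentAccession"),
         updn   = sapply(x$experiments, field, name = "updn"),
         pvalue = sapply(x$experiments, field, name = "pvalue"))
  })
  # ldply() row-binds the per-factor-value data frames into one
  ldply(e, as.data.frame)
}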

# fetch the JSON
j <- fromJSON(getURL("http://www.ebi.ac.uk:80/gxa/api/vx?geneIs=ENSG00000075142&format=json"))
# convert to data frame
sri <- j2df(j)

When I examine the first few rows, I see:

head(sri)
         ef   efv      accn updn pvalue
1 cell_line   1A2 E-MTAB-37 DOWN  0.000
2 cell_line 22Rv1 E-MTAB-37   UP  0.003
3 cell_line 22Rv1 E-MTAB-37 DOWN  0.019
4 cell_line  5637 E-MTAB-37   UP  0.000
5 cell_line  647V E-MTAB-37   UP  0.009
6 cell_line  769P E-MTAB-37   UP  0.000

According to rows 2 and 3, the same gene (SRI) in the same experiment (E-MTAB-37) is both up-regulated (p = 0.003) and down-regulated (p = 0.019) in cell line 22Rv1, as compared with mean expression from all cell lines. At least, that is my understanding of UP and DOWN as defined in the GXA documentation.
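To see how widespread this is, one could list every (experiment, factor value) pair that has both an UP and a DOWN call. A minimal sketch, assuming a data frame shaped like the output of j2df() (columns ef, efv, accn, updn, pvalue); find_conflicts is a hypothetical helper, and the example data are just the six rows shown above, typed in by hand.

```r
# Flag (experiment, factor value) pairs that have both an UP and a DOWN call.
find_conflicts <- function(df) {
  dirs <- tapply(as.character(df$updn), paste(df$accn, df$efv, sep = " / "),
                 function(v) all(c("UP", "DOWN") %in% v))
  names(dirs)[which(dirs)]
}

# The first six rows of sri shown above
sri_head <- data.frame(
  ef     = "cell_line",
  efv    = c("1A2", "22Rv1", "22Rv1", "5637", "647V", "769P"),
  accn   = "E-MTAB-37",
  updn   = c("DOWN", "UP", "DOWN", "UP", "UP", "UP"),
  pvalue = c(0.000, 0.003, 0.019, 0.000, 0.009, 0.000)
)

find_conflicts(sri_head)
# [1] "E-MTAB-37 / 22Rv1"
```

Calling find_conflicts(sri) on the full data frame would list every such pair for the gene.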

Am I missing something obvious? Or are the data returned by the GXA API simply nonsense?

Tags: microarray, database
12.3 years ago
Neilfws 49k

Let me answer my own question.

We can view the gene and experiment at this link. If we then select cell line 22Rv1 under conditions and refresh, we see that there are two probe sets (or "design elements") for the gene on this array. The measurement for 208920_at is UP and that for 208921_s_at is DOWN. So the two rows are not contradictory calls for a single measurement: each comes from a different probe set.
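A small sketch of why the contradiction disappears once the design element is part of the key. The probe-level table is hypothetical: it is typed in by hand from the two probe sets named above, and the de column is added here for illustration, not taken from the data frame built by j2df(). calls_per_key is likewise a hypothetical helper.

```r
# Hand-made probe-level table for SRI in E-MTAB-37, cell line 22Rv1
probes <- data.frame(
  accn = "E-MTAB-37",
  efv  = "22Rv1",
  de   = c("208920_at", "208921_s_at"),
  updn = c("UP", "DOWN")
)

# Count distinct UP/DOWN calls for each combination of the key columns
calls_per_key <- function(df, keys) {
  as.vector(tapply(df$updn, interaction(df[keys], drop = TRUE),
                   function(v) length(unique(v))))
}

calls_per_key(probes, c("accn", "efv"))        # 2 distinct calls: looks contradictory
calls_per_key(probes, c("accn", "efv", "de"))  # 1 1: one call per probe set
```
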


Using a custom CDF, such as those from Brainarray (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/genomiccuratedCDF.asp), in which all probes targeting the same gene are combined, should prevent this problem. Of course, the different probes could also target different transcripts of the same gene, which would give a biological explanation for what you found.
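A custom CDF regroups individual probes before the expression values are summarized. If you only have the probe-set-level UP/DOWN calls from the API, a much cruder post hoc alternative is to collapse them into one label per condition. The sketch below is that swapped-in technique, not what a custom CDF does: collapse_calls is a hypothetical helper, and it reports "MIXED" rather than hiding disagreement, since (as noted above) the probe sets may target different transcripts. The example uses the first four rows of sri shown in the question.

```r
# Collapse probe-set-level UP/DOWN calls into one label per condition:
# "UP" or "DOWN" if all probe sets agree, "MIXED" otherwise.
collapse_calls <- function(updn, group) {
  res <- tapply(as.character(updn), group, function(v) {
    u <- unique(v)
    if (length(u) == 1) u else "MIXED"
  })
  # Return a plain named character vector instead of a 1-d array
  setNames(as.vector(res), names(res))
}

collapse_calls(c("DOWN", "UP", "DOWN", "UP"),
               c("1A2", "22Rv1", "22Rv1", "5637"))
# 1A2: DOWN, 22Rv1: MIXED, 5637: UP
```
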


Randomly clicking around and selecting various cell lines, one can find other similar examples where the designations of the probe sets don't match: D341Med, Detroit562, H4 and HPAFII. Yet both calls can be highly significant (D341Med has p-values on the order of 1E-7 and 1E-10 for the two opposing calls). In many other cases one of the p-values is ridiculously low (1E-10) whereas the other is undefined.

In a way, this demonstrates the utility (or lack thereof) of p-values.
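Both patterns above (opposing calls, and a call paired with an undefined p-value) could be screened for across a whole data frame rather than by clicking around. A rough sketch: summarise_pairs is a hypothetical helper that assumes columns accn, efv, updn and pvalue, with undefined p-values stored as NA, and the toy values below are made up for illustration.

```r
# Summarise each (experiment, condition) pair: how many probe-level calls it
# has, whether they disagree in direction, and whether any p-value is missing.
summarise_pairs <- function(df) {
  groups <- split(df, paste(df$accn, df$efv, sep = " / "))
  data.frame(
    pair      = names(groups),
    n_calls   = vapply(groups, nrow, integer(1)),
    conflict  = vapply(groups, function(d) all(c("UP", "DOWN") %in% d$updn),
                       logical(1)),
    missing_p = vapply(groups, function(d) anyNA(d$pvalue), logical(1)),
    row.names = NULL
  )
}

# Hypothetical example: condition A has opposing calls,
# condition B has one call with an undefined p-value
toy <- data.frame(
  accn   = "E-MTAB-37",
  efv    = c("A", "A", "B", "B"),
  updn   = c("UP", "DOWN", "UP", "UP"),
  pvalue = c(1e-7, 1e-10, 1e-10, NA)
)
summarise_pairs(toy)
```
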
