Question

I'm having trouble analyzing microarray data from Illumina HumanHT-12

7.1 years ago

Leite ★ 1.3k

Hello everyone,

I'm having trouble analyzing Illumina HumanHT-12 microarray data from public datasets such as E-MTAB-5273 and GSE54514.

E-MTAB-5273 has two files, E-MTAB-5273.raw.1.zip and E-MTAB-5273.processed.1.zip, both containing .txt files.

GSE54514 also has two files, GSE54514_RAW.tar and GSE54514_non-normalized.txt.gz, containing a .bgx file and a .txt file respectively.

First question: Does the E-MTAB-5273.raw file contain the non-normalized data, and the processed file the normalized data?

Second question: What is the best way to analyze non-normalized Illumina HumanHT-12 data?

Best regards,

Leite

R Illumina HumanHT-12 • 2.7k views

ADD COMMENT • link 7.1 years ago by Leite ★ 1.3k

Dear colleagues,

I found some answers:

First question: Does the E-MTAB-5273.raw file contain the non-normalized data, and the processed file the normalized data? Yes. The .raw file in E-MTAB-5273 contains the non-normalized data, and the processed file is the normalized data that was uploaded to the database. The .bgx file is a manifest file ("Describe the contents of each microarray, including the probe names and sequences among many other things").

Second question: What is the best way to analyze non-normalized Illumina HumanHT-12 data?

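# Load limma, which provides read.ilmn() and neqc()
library(limma)

# Read in the expression profiles together with the detection p-values
x <- read.ilmn("Burnham_sepsis_discovery_raw_237.txt", probeid="PROBE_ID", other.columns="detection")

# Background correction, quantile normalization and log2 transformation
y <- neqc(x)
dim(y)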

My question now is: how do I tell R which samples are controls and which are patients, so that I can build the design matrix and find the DEGs?

ADD REPLY • link 7.1 years ago by Leite ★ 1.3k

You now just have to perform differential expression analysis on the normalised log2 intensities contained in the y object. To determine which samples are patients and which are controls, consult the metadata. For example, the information on patients and controls for GSE54514 can be found here: https://www.ncbi.nlm.nih.gov/geo/geo2r/?acc=GSE54514

To implement this in R, you just have to create the model matrix and ensure that the model matrix rows correspond to your Expression Set columns.
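As a minimal sketch of that alignment step, the base R code below reorders a metadata table to follow the expression columns and then builds the model matrix. The sample IDs, the `meta` table and its `sample` and `status` columns are all hypothetical placeholders; in a real analysis the IDs would come from `colnames(y$E)` and the status from the study's metadata.

```r
# Hypothetical sample IDs standing in for colnames(y$E)
sample_ids <- c("S1", "S2", "S3", "S4", "S5", "S6")

# Hypothetical metadata table, deliberately listed in a different order
meta <- data.frame(
  sample = c("S4", "S1", "S6", "S2", "S5", "S3"),
  status = c("patient", "control", "patient", "control", "patient", "control"),
  stringsAsFactors = FALSE
)

# Reorder the metadata rows so they follow the expression columns
meta <- meta[match(sample_ids, meta$sample), ]
stopifnot(identical(meta$sample, sample_ids))

# Put "control" first so it becomes the reference level
group <- factor(meta$status, levels = c("control", "patient"))

# One row per sample, in the same order as the expression columns
design <- model.matrix(~ group)
rownames(design) <- sample_ids
design
```

With this coding, the second column of `design` (`grouppatient`) represents the patient versus control difference.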

As both studies used the same microarray type, you can merge the raw data files together and then normalise them together. You may notice a batch effect, but you can adjust for this in the limma design model.
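One way to sketch the batch adjustment is to add a study factor to the design formula, so that the group effect is estimated after accounting for the study of origin. The layout below (four samples per study, alternating control and patient) is invented for illustration only, and the commented limma calls assume a normalised `y` object whose columns are in the same order as the factors.

```r
# Hypothetical layout: four samples from each study
batch <- factor(rep(c("E-MTAB-5273", "GSE54514"), each = 4))

# Hypothetical status labels, with "control" as the reference level
group <- factor(rep(c("control", "patient"), times = 4),
                levels = c("control", "patient"))

# Additive model: a batch term plus the group term of interest
design <- model.matrix(~ batch + group)
colnames(design)

# With limma loaded and a matching normalised object y, the fit would be:
# fit <- lmFit(y, design)
# fit <- eBayes(fit)
# topTable(fit, coef = "grouppatient")
```

This only works if each study contains both patients and controls; if batch and group were completely confounded, the two effects could not be separated.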

ADD REPLY • link 7.1 years ago by Kevin Blighe 89k

Dear Kevin,

I found an answer in this post by Gordon Smyth, https://support.bioconductor.org/p/92834/, but I still don't understand how he specified which samples are "controls" and which are "patients".

> library(limma)
> x <- read.ilmn("GSE74629_non-normalized.txt",expr="SAMPLE ",probeid="ID_REF")
Reading file GSE74629_non-normalized.txt ... ...
> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the
detection p-values.
> Group <- rep(c("PDAC","Healthy"),c(36,14))
> Group <- factor(Group)
> design <- model.matrix(~Group)
> keep <- rowSums(y$E>5) >= 14
> y2 <- y[keep,]
> fit <- lmFit(y2,design)
> fit <- eBayes(fit,trend=TRUE,robust=TRUE)
> topTable(fit,coef=2)

Best, Leite

ADD REPLY • link 7.0 years ago by Leite ★ 1.3k
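For readers with the same question, the group assignment in the transcript above can be reproduced in base R without any data. The `rep()` call labels samples purely by position, so it appears to assume that the first 36 columns of the expression file are PDAC samples and the last 14 are healthy controls; that ordering is an assumption to verify against the series metadata, not something the code checks.

```r
# Labels by position: 36 x "PDAC" followed by 14 x "Healthy"
Group <- rep(c("PDAC", "Healthy"), c(36, 14))
Group <- factor(Group)

# How many samples carry each label
table(Group)

# Factor levels are sorted alphabetically, so "Healthy" is the reference
levels(Group)

# Second column is the PDAC versus Healthy contrast used by coef=2
design <- model.matrix(~Group)
colnames(design)
```

If the columns are not already grouped like this, the labels have to be built from the metadata and matched to the column names instead.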