The GSE119600 Illumina Beadchip data can be read and processed in a straightforward manner using limma package functions.
Read expression data and detection p-values:
> library(limma)
> x <- read.ilmn("GSE119600_non-normalized.txt.gz",probeid="ID_REF")
Reading file GSE119600_non-normalized.txt.gz ... ...
Background correct and quantile normalize using detection p-values.
Note that limma does not require the control probe expression values because it is able to infer the mean and variance of the control probes from the detection p-values (functionality that was added to the neqc function in October 2010).
> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the detection p-values.
Parse the sample annotation out of the series matrix file.
> SampleInfo <- sampleInfoFromGEO("GSE119600_series_matrix.txt.gz")$SampleInfo
> Group <- SampleInfo[,"source_name_ch1"]
> table(Group)
Group
                       Control, adult             Crohn’s disease, adult
                                   47                                 48
               Crohn’s disease, child Primary biliary cholangitis, adult
                                   47                                 90
Primary sclerosing cholangitis, adult          Ulcerative colitis, adult
                                   45                                 45
            Ulcerative colitis, child
                                   48
The data is now ready for linear modeling.
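As a rough illustration of what that next step could look like, here is a minimal sketch of building a design matrix from group labels like the ones tabulated above. The short Group vector is a made-up stand-in for the real one, and the limma calls at the end are left as comments because they need the normalized object y; the particular contrast is only an assumed example, not a recommendation.

```r
# Hypothetical stand-in for the Group vector parsed from the series matrix file
Group <- c("Control, adult", "Control, adult",
           "Crohn's disease, adult", "Crohn's disease, adult",
           "Ulcerative colitis, adult", "Ulcerative colitis, adult")

# The labels contain spaces, commas and apostrophes, so convert them to
# syntactically valid names before using them as design matrix columns
GroupF <- factor(make.names(Group))

# One column per group (no intercept), so contrasts can be written directly
design <- model.matrix(~ 0 + GroupF)
colnames(design) <- levels(GroupF)
print(colnames(design))

# With limma loaded and y from neqc(), a fit might then look like this
# (the contrast below is an assumed example):
#   fit  <- lmFit(y, design)
#   cont <- makeContrasts(Crohn.s.disease..adult - Control..adult, levels = design)
#   fit2 <- eBayes(contrasts.fit(fit, cont))
#   topTable(fit2)
```

Cleaning the labels with make.names() first matters mainly because makeContrasts() expects the design matrix column names to be valid R names.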
Hi Zelda, I'm having the same problem with raw Illumina BeadChip data from GEO. How did you solve your problem, and how did you proceed?
Please elaborate on the problem. Please show what you have already tried, and share any warning and / or error messages that have appeared.
I'm quite new to the bioinformatics world and have started working with GEO dataset GSE42023. Our goal is to extract differentially expressed genes. My question is:
GSE42023 provides me with 2 types of files: RAW.tar and non-normalized.txt.gz. I know that I need to normalize this data to proceed with the differential expression analysis, but I am really confused by this data.
Specifically, in non-normalized.txt I have the genes (rows), and the samples and detection p-values (columns). In this case, how should I proceed?
Thanks!
[Image: example of non_normalized.txt]
Unfortunately, you have the same problem as many people.
For some reason, for the Illumina microarray studies, GEO requires that authors upload data in this non-standard format. I am not sure that you can use the standard Bioconductor package, lumi, for this. Instead, you may have to process this manually, and I provide an advanced workflow here: A: illumina Arrays Illumina HumanHT-12 V3.0 expression beadchip reading data
Thanks for your response, Kevin! I am following your workflow, but I have a question: how do I normalize this data? Right now I have a matrix with ID_REF and the detection p-values for all samples. Please forgive me for asking so many questions.
Hey, you need to remove the detection p-value columns. I explain this in the other post (but I do not provide the code):
"You should then extract out the Detection PVal columns and save them for later, and also set the rownames of the object to be equal to ID_REF. The final x object should be just the expression levels, and it should be a data-matrix."
The normalisation is then performed with the neqc() function, also mentioned in the other post. Sorry, this is a very non-standard workflow.
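The extraction step quoted above can be sketched roughly as follows in base R. This is only an illustration on a toy table: the probe IDs, sample names and values are invented, and it assumes that the detection p-value columns can be recognised by the word "Detection" in their names and that each sample column is immediately followed by its own p-value column. Real GEO files may be laid out differently.

```r
# Toy stand-in for the GEO non-normalized table (real column names may differ)
raw <- data.frame(
  ID_REF           = c("ILMN_1", "ILMN_2", "ILMN_3"),
  Sample1          = c(120.5, 98.2, 2500.1),
  Detection.Pval   = c(0.30, 0.65, 0.00),
  Sample2          = c(110.9, 101.7, 2310.4),
  Detection.Pval.1 = c(0.25, 0.70, 0.00),
  stringsAsFactors = FALSE
)

# Assumption: detection p-value columns can be recognised by name
is_pval <- grepl("Detection", colnames(raw))
is_id   <- colnames(raw) == "ID_REF"

# Expression levels only, as a numeric matrix with ID_REF as rownames
x <- as.matrix(raw[, !is_pval & !is_id])
rownames(x) <- raw$ID_REF

# Detection p-values, saved separately for later filtering
detectionpvalues <- as.matrix(raw[, is_pval])
rownames(detectionpvalues) <- raw$ID_REF

# Name each p-value column after the sample it belongs to
# (assumes every sample column is immediately followed by its p-value column)
colnames(detectionpvalues) <- colnames(x)

print(x)
```

Giving both matrices the same row and column names makes it easy to check later that they are still aligned.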
So, do I normalize this data (ID_REF and the gene expression values), or this data (ID_REF and just the detection p-values)?
You need to set ID_REF as rownames, and then remove that first column (ID_REF) from the data (in both cases). You later use the detection p-values in the following section in my other post: 'filter out control probes, those with no symbol, and those that failed' (these detection p-values will be contained in the object detectionpvalues). The columns of both objects also should be aligned perfectly.
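To make the alignment check and the later "those that failed" filter a little more concrete, here is a minimal sketch on toy matrices. The values are invented, and the rule used (keep a probe if its detection p-value is below 0.05 in at least one sample) is just an assumed example; it also assumes that small values mean "detected", which is worth confirming for your particular file.

```r
# Toy objects standing in for the expression matrix and detection p-values
x <- matrix(c(7.1, 6.9, 12.3, 7.0, 7.2, 12.1), nrow = 3,
            dimnames = list(c("ILMN_1", "ILMN_2", "ILMN_3"),
                            c("Sample1", "Sample2")))
detectionpvalues <- matrix(c(0.30, 0.65, 0.00, 0.25, 0.01, 0.00), nrow = 3,
                           dimnames = dimnames(x))

# The two objects must describe the same probes and samples in the same order
stopifnot(identical(rownames(x), rownames(detectionpvalues)),
          identical(colnames(x), colnames(detectionpvalues)))

# Assumed rule: a probe passes if it is detected (p < p_cutoff) in at least
# min_samples samples; both cut-offs are illustrative choices
p_cutoff    <- 0.05
min_samples <- 1
detected <- rowSums(detectionpvalues < p_cutoff) >= min_samples

# Keep only the probes that passed, preserving the matrix shape
x_filtered <- x[detected, , drop = FALSE]
print(rownames(x_filtered))
```

Stricter or looser cut-offs only require changing p_cutoff and min_samples.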