Question

Microarray data analysis - pipeline (problem with custom CDF)

2

Entering edit mode

7.3 years ago

lessismore ★ 1.4k

Hello everybody,

i am analyzing for microarray data in Arabidopsis. I took advantage from the code here supplied (sharing some naive codes for microarray normalization in R with whom are too new in R alike me) but im ending up with a different number of analysed genes in my final expression file.

Then i got that the problem is that it has been used a custom CDF file. Could someone tell me how could i include the custom CDF file (supplied in ArrayExpress) in my analysis? here is the dataset i am looking at: https://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-5632/files/

here is the code i used:

library(affy)

# To read all CEL files in the working directory:
Data <- ReadAffy()

eset <- rma(Data)
norm.data <- exprs(eset)

# The norm.data R object contains the normalized expression for every probeset in the ATH1 microarrays used in this example. In order to convert the probeset IDs to Arabidopsis gene identifiers, the file ftp://ftp.arabidopsis.org/home/tair/Microarrays/Affymetrix/affy_ATH1_array_elements-2010-12-20.txt download from the TAIR database and place in the folder with the microarray data. In order to avoid ambiguous probeset associations (i.e. probesets that have multiple matches to genes), we only used probes that match only one gene in the Arabidopsis genome.
affy_names <- read.delim("affy_ATH1_array_elements-2010-12-20.txt",header=T)

# Select the columns that contain the probeset ID and corresponding AGI number. Please note that the positions used to index the matrix depend on the input format of the array elements file. You can change these numbers to index the corresponding columns if you are using a different format:
probe_agi <- as.matrix(affy_names[,c(1,5)])

# To associate the probeset with the corresponding AGI locus:
normalized.names <-merge(probe_agi,norm.data,by.x=1,by.y=0)[,-1]

# To remove probesets that do not match the Arabidopsis genome:
normalized.arabidopsis <- normalized.names[grep("AT",normalized.names[,1]),]

# To remove ambiguous probes:
normalized.arabidopsis.unambiguous <- normalized.arabidopsis[grep(pattern="",normalized.arabidopsis[,1], invert=T),]

# In some cases, multiple probes match the same gene, due to updates in the annotation of the genome. To remove duplicated genes in the matrix:
normalized.agi.final <- normalized.arabidopsis.unambiguous[!duplicated(normalized.arabidopsis.unambiguous[,1]),]

# To assign the AGI number as row name:
rownames(normalized.agi.final) <- normalized.agi.final[,1]
normalized.agi.final <- normalized.agi.final[,-1]

#The resulting gene expression dataset contains unique row identifies (i.e. AGI locus), and different expression values obtained from different experiments on each column

# To export this data matrix from R to a tab-delimited file use the following command. The file will be written to the folder that you set up as your working directory in R using the setwd() command in line 1 above:
write.table (normalized.agi.final,"RMA.txt", sep="\t",col.names=NA,quote=F)

microarray RMA affymetrix cdf Arrayexpress • 3.3k views

ADD COMMENT • link updated 7.3 years ago by mannoulag1 ▴ 130 • written 7.3 years ago by lessismore ★ 1.4k

0

Entering edit mode

Hi, I used your code to analyse my dataset then I want to see the final matrix but I have an error

View(normalized.agi.final)
Error in View(normalized.agi.final) : argument 'x' incorrect

ADD REPLY • link 7.3 years ago by mannoulag1 ▴ 130

0

Entering edit mode

try to export and reimport it.

ADD REPLY • link 7.3 years ago by lessismore ★ 1.4k

0

Entering edit mode

yes I do it but I have 0 rows. to show the data, I do:

> normalized.agi.final

    [1] locus            GSM131576.CEL.gz GSM131577.CEL.gz GSM131578.CEL.gz GSM131579.CEL.gz GSM131580.CEL.gz
     [7] GSM131581.CEL.gz GSM131582.CEL.gz GSM131583.CEL.gz GSM131584.CEL.gz GSM131585.CEL.gz GSM131586.CEL.gz
    [13] GSM131587.CEL.gz GSM131588.CEL.gz GSM131589.CEL.gz GSM131590.CEL.gz GSM131591.CEL.gz GSM131592.CEL.gz
    [19] GSM131593.CEL.gz GSM131594.CEL.gz GSM131595.CEL.gz GSM131596.CEL.gz GSM131597.CEL.gz GSM131598.CEL.gz
    [25] GSM131599.CEL.gz GSM131600.CEL.gz GSM131601.CEL.gz GSM131602.CEL.gz GSM131603.CEL.gz GSM131604.CEL.gz
    [31] GSM131605.CEL.gz GSM131606.CEL.gz GSM131607.CEL.gz GSM131608.CEL.gz GSM131609.CEL.gz GSM131610.CEL.gz
    [37] GSM131611.CEL.gz GSM131612.CEL.gz GSM131613.CEL.gz GSM131614.CEL.gz GSM131615.CEL.gz GSM131616.CEL.gz
    [43] GSM131617.CEL.gz GSM131618.CEL.gz GSM131619.CEL.gz GSM131620.CEL.gz GSM131621.CEL.gz GSM131622.CEL.gz
    [49] GSM131623.CEL.gz GSM131624.CEL.gz GSM131625.CEL.gz GSM131626.CEL.gz GSM131627.CEL.gz GSM131628.CEL.gz
    [55] GSM131629.CEL.gz GSM131630.CEL.gz GSM131631.CEL.gz GSM131632.CEL.gz GSM131633.CEL.gz GSM131634.CEL.gz
    [61] GSM131635.CEL.gz GSM131636.CEL.gz GSM131637.CEL.gz GSM131638.CEL.gz GSM131639.CEL.gz GSM131640.CEL.gz
    [67] GSM131641.CEL.gz
    <0 lignes> (ou 'row.names' de longueur nulle)

ADD REPLY • link 7.3 years ago by mannoulag1 ▴ 130

0

Entering edit mode

That one was the code i initially used because i had not a custom cdf, then i found it. So my pipeline was:

# load the cdf package in your R environment 
# make sure you set your working directory where you have all CEL files

library (Affy)

# In my case i have the custom CDF so..
Data <- ReadAffy(cdfname = "cdfname")

# Normalization
eset <- rma(Data)
norm.data <- exprs(eset)

# Export Data
write.table (norm.data, "norm.data.txt", sep="\t",col.names=NA,quote=F)

ADD REPLY • link 7.3 years ago by lessismore ★ 1.4k

score 2 · Answer 1 · 2017-08-25

2

Entering edit mode

7.3 years ago

lessismore ★ 1.4k

Solved myself.

Data <- ReadAffy(cdfname = "yourcdffilename")

Important: your cdf object needs to be present in your R environment. Normally there is a package to install to do it.

ADD COMMENT • link 7.3 years ago by lessismore ★ 1.4k