Question

Where to find targets.txt for a GEO dataset?


4.5 years ago

roybatty269 • 0

Hello everyone.

I'm doing my first microarray data analysis (MDA) using GEO dataset GSE5583 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5583).

Excuse my ignorance. I've found that most tutorials on MDA use a file called targets.txt to create an AnnotatedDataFrame. This file looks something like this:

Sample Ids SampleIDs Group Apocrine.grade AR.repeat.length

However, I couldn't find where to download this targets.txt. My question is:

Is there somewhere to download it, and if so, how? Or do I have to create it myself?

To give an idea of where I am right now: first I downloaded the ExpressionSet using getGEO(), but I found it hard to perform the analysis with this data alone, as some functions, such as rma, are not designed for the ExpressionSet data type (I suppose the data are already preprocessed anyway). Then I downloaded the CEL files, created an AffyBatch object, and called it rawData. So I guess rawData is like the assay data of the ExpressionSet, and I'm lacking the phenotype data, which is essentially what targets.txt provides. Could I use the phenotype data from the ExpressionSet to create my own targets.txt, or could I use it directly?

I'll keep trying to figure it out, but any insight would be really appreciated. Thanks!

R • 1.7k views


Thanks, Kevin!

However, I used affy::rma to normalize; oligo wasn't working.

Was there a particular reason for using the oligo package?

Reply 4.5 years ago by roybatty269


Strange. What was the error from oligo?

Affymetrix eventually modified the chip designs and introduced 'ST' arrays, which had a fundamentally different architecture. This meant that the original affy package could not handle the newer arrays. Benilton Carvalho then moved on to developing oligo, which works for all Affymetrix arrays.

If affy worked for you, then no problem.

Reply 4.5 years ago by Kevin Blighe

score 1 · Answer 1 · 2020-11-24

Hi, you need to create the targets file yourself, either as a tab-delimited text file or directly as a data frame within R.

The metadata associated with each GEO record will usually have all information that you need. However, to give you an idea, your targets file for an Affymetrix study would look like:

FileName                                    SampleID Group
SampleFiles/1_CS0911a_(HuGene-2_0-st).CEL   CS0911a   KN92
SampleFiles/10_CS0812d_(HuGene-2_0-st).CEL  CS0812d   KN92_WNT3A
SampleFiles/11_CS0812e_(HuGene-2_0-st).CEL  CS0812e   KN93_WNT3A
SampleFiles/12_CS0812f_(HuGene-2_0-st).CEL  CS0812f   KN93_WNT3A
SampleFiles/13_CS0801a_(HuGene-2_0-st).CEL  CS0801a   KN92
SampleFiles/14_CS0801b_(HuGene-2_0-st).CEL  CS0801b   KN92_WNT3A
SampleFiles/15_CS0801c_(HuGene-2_0-st).CEL  CS0801c   KN93_WNT3A
SampleFiles/16_CS1003a_(HuGene-2_0-st).CEL  CS1003a   KN92
SampleFiles/17_CS1003b_(HuGene-2_0-st).CEL  CS1003b   KN92
SampleFiles/18_CS1003c_(HuGene-2_0-st).CEL  CS1003c   KN92_WNT3A
SampleFiles/19_CS1003d_(HuGene-2_0-st).CEL  CS1003d   KN93_WNT3A
SampleFiles/2_CS0911b_(HuGene-2_0-st).CEL   CS0911b   KN92
SampleFiles/20_CS1003e_(HuGene-2_0-st).CEL  CS1003e   KN93_WNT3A
SampleFiles/3_CS0911c_(HuGene-2_0-st).CEL   CS0911c   KN92_WNT3A
SampleFiles/4_CS0911d_(HuGene-2_0-st).CEL   CS0911d   KN92_WNT3A
SampleFiles/5_CS0911e_(HuGene-2_0-st).CEL   CS0911e   KN93_WNT3A
SampleFiles/6_CS0911f_(HuGene-2_0-st).CEL   CS0911f   KN93_WNT3A
SampleFiles/7_CS0812a_(HuGene-2_0-st).CEL   CS0812a   KN92
SampleFiles/8_CS0812b_(HuGene-2_0-st).CEL   CS0812b   KN92
SampleFiles/9_CS0812c_(HuGene-2_0-st).CEL   CS0812c   KN92_WNT3A

I have not anonymised this data because these samples belong to a study of mine that has just been accepted for publication (and that already has a GSE ID). I did not put the parentheses in the filenames.
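As a rough sketch of the "create it yourself" route, the same table can be assembled as a data frame in base R and written out as a tab-delimited file. The three file names are taken from the listing above. Deriving SampleID from the second underscore-separated field of the file name is only an assumption about this particular naming scheme, and the group labels still have to be entered by hand from the study metadata.

```r
# Hypothetical sketch: build a targets table in base R and save it as Targets.txt.
# Only three of the CEL files listed above are used, to keep the example short.
cel_files <- c(
  "SampleFiles/1_CS0911a_(HuGene-2_0-st).CEL",
  "SampleFiles/10_CS0812d_(HuGene-2_0-st).CEL",
  "SampleFiles/11_CS0812e_(HuGene-2_0-st).CEL"
)

# Assumption: the sample ID is the second "_"-separated field of the file name
sample_id <- vapply(strsplit(basename(cel_files), "_"),
                    function(parts) parts[2], character(1))

# Group labels come from the study metadata and are typed in manually here
group <- c("KN92", "KN92_WNT3A", "KN93_WNT3A")

targets <- data.frame(FileName = cel_files,
                      SampleID = sample_id,
                      Group = group,
                      stringsAsFactors = FALSE)

# Write a tab-delimited file (to a temporary directory in this sketch)
targets_path <- file.path(tempdir(), "Targets.txt")
write.table(targets, targets_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to confirm that the file round-trips cleanly
targets_check <- read.delim(targets_path, stringsAsFactors = FALSE)
print(targets_check)
```

In a real project you would write the file next to your CEL files rather than into tempdir().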

You should be using the oligo package functions, by the way, something along the lines of:

library('limma')   # provides readTargets()
library('oligo')   # CEL file import and rma()

# Read the sample sheet (tab-delimited, one row per CEL file)
targetinfo <- readTargets('Targets.txt', sep = '\t')

# List the CEL files in the directory and read them in
CELFiles <- list.celfiles('SampleFiles/', full.names = TRUE)
project <- read.celfiles(CELFiles)

# Background correct, normalize, and calculate gene expression
project.bgcorrect.norm.avg <- rma(project, background = TRUE, normalize = TRUE, target = 'core')

Nota Bene! - after you read in the data, please verify that the columns of project.bgcorrect.norm.avg perfectly align with whatever other metadata you are using.
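One way to do that check, sketched with a made-up toy matrix in place of the real expression data: confirm that both sides contain the same samples, then reorder the metadata rows to follow the matrix columns. The object names expr and targets are hypothetical. With real oligo output the column names are likely to be the CEL file names rather than the short sample IDs, in which case you would compare against basename() of the FileName column instead; treat that as an assumption to verify on your own data.

```r
# Toy expression matrix: rows are probes, columns are samples (values are made up)
expr <- matrix(1:6, nrow = 2,
               dimnames = list(c("probe1", "probe2"),
                               c("CS0911b", "CS0911a", "CS0812d")))

# Metadata listed in a different order from the matrix columns
targets <- data.frame(SampleID = c("CS0911a", "CS0812d", "CS0911b"),
                      Group = c("KN92", "KN92_WNT3A", "KN92"),
                      stringsAsFactors = FALSE)

# Stop early if the two objects do not describe the same set of samples
stopifnot(setequal(colnames(expr), targets$SampleID))

# Reorder the metadata rows so that row i describes column i of the matrix
targets <- targets[match(colnames(expr), targets$SampleID), ]
rownames(targets) <- NULL

# TRUE only if names and order now agree exactly
aligned <- identical(colnames(expr), targets$SampleID)
print(aligned)
```

Reordering the metadata rather than the expression matrix leaves the expression object untouched, which is usually the less error-prone direction.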

Kevin