Extract expression data of candidate genes from normalized microarray list
10.5 years ago
pmedi ▴ 50

Hi all,

I'm a complete beginner in R and Bioconductor, and I hope somebody can help me with my basic questions. I have several processed (normalized) microarray data files from which I want to extract a list of 300 candidate genes (ILMN_IDs). In the output I need not only the gene names but also the expression values and statistics (already present in the original file).

I've tried to make data.frames for each file and compare them, but I always get an error...

I'm sorry if this was already explained in a previous thread, but I could not find one.

Thanks in advance!

Paula

Bioconductor R microarray • 4.3k views

What's the error? Some code would help us understand.


Well, I tried a very basic approach:

> all=normalizedData
> subset=candidateGenes
> x=all%in%subset
> all[x] # returns a data frame with 0 columns and 4000 rows... this is not correct, since normalizedData has 24 columns
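
A likely explanation for the empty result, sketched on made-up data (the object names all_data and candidates and all values are invented for illustration; they stand in for normalizedData and candidateGenes):

```r
# Made-up stand-ins for normalizedData and candidateGenes
all_data <- data.frame(
  Name = c("ILMN_2188862", "ILMN_1757497", "ILMN_1718977"),
  logratio = c(5.34, 4.56, 4.20),
  stringsAsFactors = FALSE
)
candidates <- data.frame(
  Name = c("ILMN_1757497", "ILMN_2188862"),
  stringsAsFactors = FALSE
)

# On data frames, %in% compares whole columns, not individual IDs,
# so the result has one element per column of all_data
x <- all_data %in% candidates
print(x)

# Single-bracket indexing without a comma selects columns;
# an all-FALSE index therefore keeps every row but no columns
empty <- all_data[x]
print(dim(empty))
```

This would be consistent with the "0 columns" result reported here: the row count is untouched because no row selection ever took place.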

Do you have a column in "all" with the gene ID? If so, you can try:
all[which(all$gene_id %in% subset)]


Dear Martombo, thanks for the suggestion, but it still does not work. I get:

in `[.default`(all, x) : invalid type 'list'

and the data frame still has 0 columns.

What is the type of your objects? Can you show us the output of head(all) and head(subset)?
10.5 years ago
pmedi ▴ 50

Here it is:

> head(all)

          Name meanbgt meanbgc        cvt       cvc      meant    stderrt
1 ILMN_2188862       0 0.00000 0.11798164 0.2374678  4618.4715  314.59520
2 ILMN_1757497       0 0.00000 0.09400562 0.2306049 13226.2172  717.84198
3 ILMN_1718977       0 0.00000 0.19646015 0.1977541  5560.2394  630.67748
4 ILMN_1677402       0 0.00000 0.12334626 0.1734464 17487.3497 1245.34402

       meanc    stderrc    ratio  ratiose logratio          tp         t2p
1  113.76855  15.597908 40.59533 6.214782 5.343242 0.000138868 0.004758497
2  559.81835  74.534099 23.62591 3.396868 4.562298 0.000061900 0.002937065
3  303.45555  34.646540 18.32308 2.948882 4.195590 0.001138760 0.013880536
4 1093.69965 109.522366 15.98917 1.964738 3.999023 0.000195275 0.005431832

  wilcoxonp         tq       t2q wilcoxonq   limmap  limmapa    SYMBOL
1 0.0808556 0.02560170 0.1645141  0.345836 4.03e-10 4.34e-06     GDF15
2 0.0808556 0.02429853 0.1498372  0.345836 9.14e-10 4.57e-06       VGF
3 0.0808556 0.04910382 0.2084539  0.345836 5.61e-09 1.04e-05   GADD45B
4 0.0808556 0.02802042 0.1682075  0.345836 1.37e-09 4.61e-06 LOC387763

> head(subset)

          Name
1 ILMN_1757497
2 ILMN_2188862
3 ILMN_1677402
4 ILMN_1751607

OK, then try all[which(all$Name %in% subset$Name),]

Edit: yes, sorry, as simon.pearce pointed out, I was missing a comma in the command. It should work now.
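
For readers following along, here is a minimal sketch of that command on toy data. The object names all_data and candidates and the values are made up for illustration; only the Name column and the ILMN IDs mirror the output shown earlier in the thread.

```r
# Toy stand-ins for the objects in this thread (values are made up)
all_data <- data.frame(
  Name = c("ILMN_2188862", "ILMN_1757497", "ILMN_1718977", "ILMN_1677402"),
  logratio = c(5.34, 4.56, 4.20, 4.00),
  SYMBOL = c("GDF15", "VGF", "GADD45B", "LOC387763"),
  stringsAsFactors = FALSE
)
candidates <- data.frame(
  Name = c("ILMN_1757497", "ILMN_2188862", "ILMN_1677402", "ILMN_1751607"),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for rows whose Name is in the candidate list
keep <- all_data$Name %in% candidates$Name

# With the comma: select those rows and keep all columns
hits <- all_data[keep, ]
print(hits)
```

The rows come back in the order of the big table, not the order of the candidate list, and candidate IDs that are absent from the table are silently ignored.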


It worked! I got the subset that I wanted. Thanks!!!


If your Name column is unique, then you should set it as the rownames when you read the data in, something like:

alldata<-read.table(filename, row.names=1, stringsAsFactors=FALSE)

which then allows you to subset the data on those names, with

alldata[subset,]

The comma is really important there (and is missing from your previous commands), as it says that you want those particular rows and all the columns.
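
A small sketch of the rowname approach, assuming a table already read in with the IDs as rownames. The data frame is built by hand here and its values are made up; wanted is a hypothetical character vector of IDs.

```r
# Made-up table standing in for a file read with row.names=1
alldata <- data.frame(
  logratio = c(5.34, 4.56, 4.20),
  SYMBOL = c("GDF15", "VGF", "GADD45B"),
  row.names = c("ILMN_2188862", "ILMN_1757497", "ILMN_1718977"),
  stringsAsFactors = FALSE
)

# Hypothetical character vector of IDs to pull out
wanted <- c("ILMN_1757497", "ILMN_2188862")

# With the comma: rows by name, all columns, in the order of `wanted`
picked <- alldata[wanted, ]
print(picked)

# Without the comma: the index is treated as a column selection
only_cols <- alldata["SYMBOL"]
print(dim(only_cols))
```

Unlike the %in% approach, indexing by rowname returns the rows in the order of the ID vector.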


@Martombo, I again got the same result: a data frame with 0 columns and 4312 rows...

@simon.pearce: I understand the idea, but it still reports an error in rows [i]: invalid type 'list'. Here is the structure of both objects:

> class(all)
[1] "data.frame"
> dim(all)
[1] 4312 24
> str(all)
'data.frame': 4312 obs. of 24 variables

Info about subset:

> class(subset)
[1] "data.frame"
> dim(subset)
[1] 328 1
> str(subset)
'data.frame': 328 obs. of 1 variable:
 $ V1: Factor w/ 328 levels "ILMN_1651429",..: 177 286 47 169 123 109 268 284 234 186 ...

Thanks!


Sigh. Apparently I reached a limit of 5 posts with my actual account, so the longer message I just typed out disappeared.

Basically, R thinks your subset is a data.frame, and I don't think you want it to be. I think you want a character vector.
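
If the gene list is already loaded as a one-column data frame, one way to get a character vector is sketched below. The column name V1 matches the str() output above; subset_df and ids are names invented for this example, and the three IDs are just a sample.

```r
# Stand-in for the one-column data frame of candidate IDs (factor column V1)
subset_df <- data.frame(
  V1 = factor(c("ILMN_1757497", "ILMN_2188862", "ILMN_1677402"))
)

# Pull the column out and convert the factor to plain character strings
ids <- as.character(subset_df$V1)

print(class(subset_df))  # still a data frame
print(class(ids))        # now a character vector
```

Converting with as.character() matters because a factor stores integer codes underneath, and it is the labels, not the codes, that should be matched against the IDs.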

I have a function I wrote ages ago to read in a list of genes (one per line) from a text file:

# Read <string>.txt (one gene per line) and return a plain character vector
read.genelist <- function(string) {
  c(t(as.matrix(read.table(paste0(string, ".txt"), stringsAsFactors = FALSE))))
}

Then use subset<-read.genelist(filename) to read filename.txt, and use that to do your subsetting: all[subset,]

If that list contains some genes that aren't in your table, then you may need to do:

all[intersect(subset,rownames(all)),]
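
To illustrate why intersect() can be needed, here is a sketch on made-up data in which one requested ID is absent from the table. The names alldata, wanted, with_na, clean and missing_ids are hypothetical.

```r
# Made-up table with IDs as rownames
alldata <- data.frame(
  logratio = c(5.34, 4.56),
  SYMBOL = c("GDF15", "VGF"),
  row.names = c("ILMN_2188862", "ILMN_1757497"),
  stringsAsFactors = FALSE
)

# The second ID is deliberately not in the table
wanted <- c("ILMN_1757497", "ILMN_1751607")

# Indexing by a missing rowname yields a row filled with NA
with_na <- alldata[wanted, ]
print(with_na)

# Keep only the IDs that are actually present
clean <- alldata[intersect(wanted, rownames(alldata)), ]
print(clean)

# Optionally report which IDs were not found
missing_ids <- setdiff(wanted, rownames(alldata))
print(missing_ids)
```

Checking setdiff() as well makes it easy to see which candidate genes are absent from a given array before comparing across files.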