Build the classes to determine which replicate is in control vs. treated condition
9.3 years ago

Hello,

I want to analyze the entire Connectivity Map dataset (~120 drugs, ~560 arrays, two chipsets: HG-U133A and HT_HG-U133A). I am reading the series matrix file available on GEO. I want the differential expression for each group of instances that share the same cell line, platform, drug, and drug concentration. To build the classes that say which replicate belongs to which condition, I have to do something like this:

library(GEOquery) # provides getGEO()
library(Biobase)  # provides pData() and phenoData()
data <- getGEO('GSE5258')
eset <- data[[2]] # Taking the GPL96 array into an expression set
show(pData(phenoData(eset))[1:2,])

                   title geo_accession                status submission_date last_update_date type channel_count source_name_ch1 organism_ch1         characteristics_ch1 characteristics_ch1.1 characteristics_ch1.2
GSM118720 EC2003090503AA     GSM118720 Public on Sep 27 2006     Jul 06 2006      Sep 18 2006  RNA             1     cmap_well:3 Homo sapiens perturbagen: small molecule       type: treatment       name: metformin
GSM118721 EC2003090502AA     GSM118721 Public on Sep 27 2006     Jul 06 2006      Sep 18 2006  RNA             1     cmap_well:2 Homo sapiens perturbagen: small molecule         type: control            name: null
            characteristics_ch1.3 characteristics_ch1.4 characteristics_ch1.5 characteristics_ch1.6 characteristics_ch1.7 molecule_ch1 label_ch1 taxid_ch1                                    description data_processing platform_id
GSM118720 concentration: .00001 M       vehicle: medium   vehicle_final: null         duration: 6 h            cell: MCF7    total RNA    biotin      9606 MCF7 treated with metformin (.00001 M) for 6 h         MAS 5.0       GPL96
GSM118721     concentration: null       vehicle: medium   vehicle_final: null         duration: 6 h            cell: MCF7    total RNA    biotin      9606             MCF7 with vehicle (medium) for 6 h         MAS 5.0       GPL96

h1=as.numeric(pData(eset)["characteristics_ch1.2"]=="name: metformin") # h1 is a numeric vector with 1 where drug = metformin and 0 otherwise; this works fine.

h1

[1] 1 0 1 1 1 0 0 0 0 1 0 0 ...... [346] 0

Now I want to apply multiple conditions: where drug == "metformin" AND cell line == "MCF7"

c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin" && characteristics_ch1.7=="cell: MCF7")
Error: object 'characteristics_ch1.7' not found

I am unable to apply multiple conditions here. I am not even sure whether the approach I am following will work. Please share your views on the problem. Thank you.

R Bioconductor microarray Connectivity-Map • 2.3k views
9.3 years ago
c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin" && pData(eset)$characteristics_ch1.7=="cell: MCF7")

The error message was very informative in this case.


I already tried this but then the output is:

c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin" && pData(eset)$characteristics_ch1.7=="cell: MCF7")

c1
[1] 1

That is not the expected result. There are 3 samples treated with metformin in the MCF7 cell line, so those should get 1 and every other sample should get 0.

In the following case it returns just 'TRUE':

c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin") && (pData(eset)$characteristics_ch1.7=="cell: MCF7")

c1
[1] TRUE

What you probably really want to do is something like:

idx <- which(pData(eset)$characteristics_ch1.2=="name: metformin" & pData(eset)$characteristics_ch1.7=="cell: MCF7")

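That will then give you the indices for subsetting. Note that & and && differ in R: & works element by element over whole vectors, while && expects a single TRUE or FALSE on each side and returns a single value.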
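
The difference is easiest to see on a small example. The sketch below uses a made-up data frame `pd` as a stand-in for pData(eset); the four rows are invented for illustration and are not taken from GSE5258. Only the column names follow the thread.

```r
# Hypothetical stand-in for pData(eset); the rows are invented for illustration.
pd <- data.frame(
  characteristics_ch1.2 = c("name: metformin", "name: null",
                            "name: metformin", "name: metformin"),
  characteristics_ch1.7 = c("cell: MCF7", "cell: MCF7",
                            "cell: HL60", "cell: MCF7"),
  stringsAsFactors = FALSE
)

is_metformin <- pd$characteristics_ch1.2 == "name: metformin"
is_mcf7      <- pd$characteristics_ch1.7 == "cell: MCF7"

# `&` compares the two vectors element by element: one TRUE/FALSE per sample.
both <- is_metformin & is_mcf7

# which() turns the logical vector into the row positions that matched.
idx <- which(both)

# `&&` is meant for a single TRUE/FALSE on each side, e.g. inside if().
first_sample_matches <- is_metformin[1] && is_mcf7[1]
```

Here `both` has one entry per sample, so `pd[idx, ]` returns the matching rows. Older versions of R used only the first element of each side of `&&`, which is why the earlier attempt returned a single value; since R 4.3.0, passing a longer vector to `&&` is an error.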


Do you think this is the right approach: creating classes for each drug and then iterating through every one of them with a for loop?

Another thing: I want to add the condition "drug concentration is the same". The concentration is stored in "characteristics_ch1.3", and its values look like the following:

"concentration: .00001 M"

"concentration: null"

"concentration: .001 M"

How can I apply the criteria that the drug concentration should be the same and the number of instances should be at least 3? (I mean at least 3 samples with the same drug, the same concentration, the same cell line, and the same platform.) Thanks for your help.


There are multiple ways to go about this, with the most convenient entirely depending on the exact details of what you're doing. Personally, I would just paste() things together into a factor and then run split() on the dataframe accordingly. That's often a convenient way of creating large numbers of subsets according to multiple criteria.
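
A minimal sketch of that paste() and split() idea, assuming a made-up data frame `pd` in place of pData(eset). The five rows and the sample names S1 to S5 are invented; the column names are the ones discussed in the thread.

```r
# Hypothetical stand-in for pData(eset); the rows are invented for illustration.
pd <- data.frame(
  characteristics_ch1.2 = c("name: metformin", "name: metformin",
                            "name: metformin", "name: metformin", "name: null"),
  characteristics_ch1.3 = c("concentration: .00001 M", "concentration: .00001 M",
                            "concentration: .00001 M", "concentration: .001 M",
                            "concentration: null"),
  characteristics_ch1.7 = c("cell: MCF7", "cell: MCF7", "cell: MCF7",
                            "cell: MCF7", "cell: MCF7"),
  row.names = c("S1", "S2", "S3", "S4", "S5"),
  stringsAsFactors = FALSE
)

# One label per sample: drug, concentration and cell line glued together.
group <- factor(paste(pd$characteristics_ch1.2,
                      pd$characteristics_ch1.3,
                      pd$characteristics_ch1.7, sep = " | "))

# split() returns a named list with one data frame per distinct label.
subsets <- split(pd, group)

# Keep only the groups with at least 3 samples.
n_per_group  <- vapply(subsets, nrow, integer(1))
large_enough <- subsets[n_per_group >= 3]
```

Because the concentration is part of the label, samples only end up in the same subset when their concentration strings match, so no concentration value has to be written into the code.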


As I don't have any R experience, I am having difficulty with this. Can you suggest an online tutorial that uses this approach? How do people build the contrast matrix and design? I have searched a lot, but every time I find only small datasets where the contrast matrix and design can be made manually, without using the phenodata.
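
For larger designs, the design matrix can be generated from a grouping factor rather than typed by hand. The following is only a sketch under stated assumptions: the factor `group` and its labels are invented here, where in practice they would be derived from the phenodata. `model.matrix()` is base R; the contrast step would typically go through limma's makeContrasts(), which is not run in this example.

```r
# Hypothetical group labels, one per sample; in practice these would be
# derived from pData(eset) columns rather than typed in.
group <- factor(c("control", "control", "control",
                  "metformin", "metformin", "metformin"),
                levels = c("control", "metformin"))

# ~ 0 + group gives one indicator column per group (no intercept).
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# A contrast is a set of weights over those columns: treated minus control.
contrast <- c(control = -1, metformin = 1)
```

Each row of `design` is one sample and each column one group, so the same two lines work unchanged whether `group` has two levels or several hundred.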


And the problem with idx is that I only get the matching indices:

> idx
[1]  1  3  4  5 10

Whereas I want a complete vector covering all of the samples, with 1 for each sample where all conditions are met and 0 for every other sample.


So use as.numeric() instead of which().
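
To illustrate the two forms side by side, here is a small sketch on a hypothetical logical vector `matches` (invented so that its TRUE positions agree with the idx output shown in the thread):

```r
# Hypothetical logical vector: TRUE where a sample met all conditions.
matches <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)

idx <- which(matches)        # positions only
c1  <- as.numeric(matches)   # full-length vector of 1s and 0s

# The two forms carry the same information; c1 can be rebuilt from idx.
rebuilt <- numeric(length(matches))
rebuilt[idx] <- 1
```

Either form can be used for subsetting; the 0/1 vector is the one that lines up position by position with the samples.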


How can I apply the following restrictions?

"same drug concentration" and "minimum 3 samples meeting the criteria"


I have found a way to require a minimum of 3 samples, but I am unable to figure out how to filter on "same drug concentration" without giving a hard-coded value. Can you tell me how I can do that? Thanks, and sorry for bothering you again.
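
One possible way to avoid a hard-coded concentration, sketched on invented data: treat the concentration string as part of a per-sample key, count how many samples share each key, and build one 0/1 class vector per key that has enough samples. The data frame `pd` and its six rows are assumptions for illustration; only the column names come from the thread.

```r
# Hypothetical stand-in for pData(eset); the rows are invented for illustration.
pd <- data.frame(
  characteristics_ch1.2 = c("name: metformin", "name: metformin", "name: metformin",
                            "name: metformin", "name: null", "name: null"),
  characteristics_ch1.3 = c("concentration: .00001 M", "concentration: .00001 M",
                            "concentration: .00001 M", "concentration: .001 M",
                            "concentration: null", "concentration: null"),
  characteristics_ch1.7 = rep("cell: MCF7", 6),
  stringsAsFactors = FALSE
)

# Key = drug + concentration + cell line; equal keys mean "same condition".
key <- paste(pd$characteristics_ch1.2, pd$characteristics_ch1.3,
             pd$characteristics_ch1.7, sep = " | ")

# Size of each sample's group, aligned with the rows of pd.
group_size <- ave(seq_along(key), key, FUN = length)

# Treated conditions with at least 3 replicates.
treated   <- pd$characteristics_ch1.2 != "name: null"
keep_keys <- unique(key[treated & group_size >= 3])

# One full-length 0/1 vector per retained condition.
classes <- lapply(keep_keys, function(k) as.numeric(key == k))
names(classes) <- keep_keys
```

The concentration never appears as a literal in the filtering code; it only enters through `key`, so every concentration present in the data is handled the same way.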


Dear Devon,

Can you shed some light on this problem?


Please don't solicit responses to your questions in the comments on other people's questions.
