Hello,
I want to analyze the entire Connectivity Map dataset (~120 drugs, ~560 arrays, two chipsets: HG-U133A and HT_HG-U133A). I am reading the series matrix file available on GEO. I want differential expression for groups of instances that share the same cell line, platform, drug and drug concentration, so to build classes recording which replicate belongs to which condition I have to do something like this:
```r
library(GEOquery)

data <- getGEO('GSE5258')
eset <- data[[2]]  # take the GPL96 (HG-U133A) series matrix as an ExpressionSet
show(pData(phenoData(eset))[1:2, ])
```
Output (first two samples, with columns shown as rows):

| column | GSM118720 | GSM118721 |
|---|---|---|
| title | EC2003090503AA | EC2003090502AA |
| geo_accession | GSM118720 | GSM118721 |
| status | Public on Sep 27 2006 | Public on Sep 27 2006 |
| submission_date | Jul 06 2006 | Jul 06 2006 |
| last_update_date | Sep 18 2006 | Sep 18 2006 |
| type | RNA | RNA |
| channel_count | 1 | 1 |
| source_name_ch1 | cmap_well:3 | cmap_well:2 |
| organism_ch1 | Homo sapiens | Homo sapiens |
| characteristics_ch1 | perturbagen: small molecule | perturbagen: small molecule |
| characteristics_ch1.1 | type: treatment | type: control |
| characteristics_ch1.2 | name: metformin | name: null |
| characteristics_ch1.3 | concentration: .00001 M | concentration: null |
| characteristics_ch1.4 | vehicle: medium | vehicle: medium |
| characteristics_ch1.5 | vehicle_final: null | vehicle_final: null |
| characteristics_ch1.6 | duration: 6 h | duration: 6 h |
| characteristics_ch1.7 | cell: MCF7 | cell: MCF7 |
| molecule_ch1 | total RNA | total RNA |
| label_ch1 | biotin | biotin |
| taxid_ch1 | 9606 | 9606 |
| description | MCF7 treated with metformin (.00001 M) for 6 h | MCF7 with vehicle (medium) for 6 h |
| data_processing | MAS 5.0 | MAS 5.0 |
| platform_id | GPL96 | GPL96 |
This puts 1 in `h1` where the drug is metformin and 0 otherwise, and it works fine:

```r
# 1 where the drug is metformin, 0 otherwise
h1 <- as.numeric(pData(eset)$characteristics_ch1.2 == "name: metformin")
h1
# [1] 1 0 1 1 1 0 0 0 0 1 0 0 ...... [346] 0
```
Now I want to apply multiple conditions: drug == "metformin" AND cell line == "MCF7":
```r
c1=as.numeric(pData(eset)$characteristics_ch1.2=="name: metformin" && characteristics_ch1.7=="cell: MCF7")
# Error: object 'characteristics_ch1.7' not found
```
I am unable to apply multiple conditions here, and I am not even sure the approach I am following will work. Please share your views on the problem. Thank you.
I already tried this, but then the output is:
That can't be the right result: there are 3 samples treated with metformin in the MCF7 cell line, and every other sample should give 0.
In the following case it returns just a single `TRUE`:
What you probably really want to do is something like:
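That will then give you the indices for subsetting. Note that `&` and `&&` are different in R: `&` is element-wise and returns one `TRUE`/`FALSE` per sample, whereas `&&` looks only at the first element of each side and returns a single value, which is why you got just one `TRUE` (R 4.3.0 and later raise an error instead).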
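A minimal sketch of the difference, using a small made-up stand-in for `pData(eset)` (the sample IDs and values are hypothetical, chosen only to mirror the column format above). Every column has to be referenced through the data frame, e.g. `pd$characteristics_ch1.7`; the bare `characteristics_ch1.7` in the original attempt is what triggered the "object not found" error.

```r
# Hypothetical stand-in for pData(eset): six samples, same column format as GEO
pd <- data.frame(
  characteristics_ch1.2 = c("name: metformin", "name: null", "name: metformin",
                            "name: metformin", "name: metformin", "name: null"),
  characteristics_ch1.7 = c("cell: MCF7", "cell: MCF7", "cell: PC3",
                            "cell: MCF7", "cell: MCF7", "cell: PC3"),
  row.names = paste0("S", 1:6),
  stringsAsFactors = FALSE
)

is_metformin <- pd$characteristics_ch1.2 == "name: metformin"
is_mcf7      <- pd$characteristics_ch1.7 == "cell: MCF7"

# `&` is element-wise: one TRUE/FALSE per sample
both <- is_metformin & is_mcf7

# which() keeps only the positions of the TRUE entries (useful for subsetting)
idx <- which(both)

# as.numeric() keeps every sample: 1 where all conditions hold, 0 elsewhere
c1 <- as.numeric(both)

print(idx)  # [1] 1 4 5
print(c1)   # [1] 1 0 0 1 1 0
```

Here 3 of the 6 hypothetical samples are metformin-treated MCF7, so `which()` returns their 3 positions, while `as.numeric()` returns a full-length 0/1 vector.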
Do you think this is the right approach, i.e. creating classes for each drug and then iterating through every one of them with a for loop?
Another thing: I want to add the condition "drug concentration is the same", which is stored in `characteristics_ch1.3`. Its values look like this:

- `concentration: .00001 M`
- `concentration: null`
- `concentration: .001 M`
How can I apply the criteria that the drug concentration must be the same and that there must be at least 3 such samples (i.e. at least 3 samples with the same drug, the same drug concentration, the same cell line and the same platform)? Thanks for your help.
There are multiple ways to go about this, and the most convenient one depends entirely on the exact details of what you're doing. Personally, I would just `paste()` the relevant columns together into a factor and then run `split()` on the data frame accordingly. That's often a convenient way of creating large numbers of subsets according to multiple criteria.
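A minimal sketch of the `paste()`/`split()` idea, again on a hypothetical stand-in for `pData(eset)` (all values are made up). It builds one label per sample from drug, concentration, cell line and platform, splits the samples by that label, and keeps only the groups with at least 3 samples.

```r
# Hypothetical stand-in for pData(eset); values are illustrative only
pd <- data.frame(
  characteristics_ch1.2 = c("name: metformin", "name: metformin", "name: metformin",
                            "name: metformin", "name: null", "name: null", "name: null"),
  characteristics_ch1.3 = c("concentration: .00001 M", "concentration: .00001 M",
                            "concentration: .00001 M", "concentration: .001 M",
                            "concentration: null", "concentration: null",
                            "concentration: null"),
  characteristics_ch1.7 = rep("cell: MCF7", 7),
  platform_id           = rep("GPL96", 7),
  row.names = paste0("S", 1:7),
  stringsAsFactors = FALSE
)

# Samples sharing drug, concentration, cell line and platform get the same label
key <- factor(paste(pd$characteristics_ch1.2, pd$characteristics_ch1.3,
                    pd$characteristics_ch1.7, pd$platform_id, sep = " | "))

# split() returns a list of data frames, one per distinct combination
groups <- split(pd, key)

# Keep only the combinations with at least 3 samples
big_enough <- groups[vapply(groups, nrow, integer(1)) >= 3]

names(big_enough)
```

Each element of `big_enough` holds the samples of one condition, and its `rownames()` are the sample IDs (in real `pData(eset)` these are the GSM accessions), which should be usable to subset the ExpressionSet, e.g. `eset[, rownames(big_enough[[1]])]`.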
I don't have any R experience, so I am finding this difficult. Can you suggest an online tutorial that uses this approach, showing how to build the design and contrast matrices? I have searched a lot, but every time I find only small datasets where you can write the contrast matrix and design by hand and don't need to use the phenoData.
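One way to avoid writing the design by hand is to let R build it from a factor derived from the phenotype data. A minimal sketch with base R's `model.matrix()`, using a hypothetical factor `group` of the kind the `paste()` step would produce (the labels are made up). The limma steps at the end are commented out and not executed; they only indicate where such a design would typically be used.

```r
# Hypothetical group labels, one per sample, in the same order as the samples
# (for example built with paste() from columns of pData(eset))
group <- factor(c("metformin_MCF7", "metformin_MCF7", "metformin_MCF7",
                  "control_MCF7", "control_MCF7", "control_MCF7"))

# "~ 0 + group" gives one 0/1 indicator column per group and no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
design

# With limma installed, the next steps would look roughly like this
# (not run here; eset must contain the same samples in the same order):
# library(limma)
# fit  <- lmFit(eset, design)
# cm   <- makeContrasts(metformin_MCF7 - control_MCF7, levels = design)
# fit2 <- eBayes(contrasts.fit(fit, cm))
# topTable(fit2)
```

Because the columns come from `levels(group)`, the design grows automatically with the number of conditions instead of being typed in per experiment.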
And the problem with `idx` is that I only get the indices of the matching samples, whereas I want a complete vector covering all samples, with 1 for samples where all conditions are met and 0 for every other sample.
So use `as.numeric()` instead of `which()`.

How can I apply the following restrictions: "same drug concentration" and "at least 3 samples meeting the criteria"?
I have found a way to enforce "at least 3 samples", but I can't figure out how to filter on "same drug concentration" without hard-coding a value. Can you tell me how to do that? Thanks, and sorry for bothering you again.
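A minimal sketch of one way to do this without hard-coding a concentration: count the samples in each drug/concentration/cell line/platform combination with `table()`, keep the combinations that occur at least 3 times, and build one 0/1 column per kept combination. The stand-in data frame and its values are hypothetical.

```r
# Hypothetical stand-in for pData(eset); values are illustrative only
pd <- data.frame(
  characteristics_ch1.2 = rep(c("name: metformin", "name: null"), c(6, 2)),
  characteristics_ch1.3 = c(rep("concentration: .00001 M", 3),
                            rep("concentration: .001 M", 3),
                            rep("concentration: null", 2)),
  characteristics_ch1.7 = rep("cell: MCF7", 8),
  platform_id           = rep("GPL96", 8),
  row.names = paste0("S", 1:8),
  stringsAsFactors = FALSE
)

# Combination label per sample; the concentration is read from the data,
# so no concentration value is hard-coded
key <- paste(pd$characteristics_ch1.2, pd$characteristics_ch1.3,
             pd$characteristics_ch1.7, pd$platform_id, sep = " | ")

# Number of samples in each combination
counts <- table(key)

# Combinations with at least 3 samples
keep <- names(counts)[counts >= 3]

# One column per kept combination: 1 if the sample belongs to it, 0 otherwise
classes <- vapply(keep, function(k) as.numeric(key == k), numeric(length(key)))
rownames(classes) <- rownames(pd)

classes
```

Each column of `classes` is a full-length 0/1 vector for one condition, so a `for` loop over `colnames(classes)` visits every drug/concentration group that has enough replicates.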
Dear Devon,
Can you shed some light on this problem?
Please don't solicit responses to your questions in the comments to other people's questions.