Hello,
I have four two-color array chips (GenePix .gpr files) that differ in time points and infecting organism; some include dye swaps and some have biological replicates. I am finding it complicated to build a design/contrast matrix to find differentially expressed genes.
I have normalized all of these chips and merged them using the InSilicoDb package with the Combat method. Could anyone help me build a design matrix for these chips in combined form?
Sample description with Cy3/Cy5 information given below.
WT = Wounded and Treated;
T = Treated;
CDLV = Carborundum-dusted leaves treated with Virus;
CDLW = Carborundum-dusted leaves treated with Water
Sample_title Organism Geo protocol_1 Label_1 protocol_2 label_ch2
ctrl_vs_race0_Scjm_72h_1 Fungus GSM200000 Treated Cy3 Control Cy5
ctrl_vs_race0_Scjm_72h_2 Fungus GSM200001 Treated Cy3 Control Cy5
ctrl_vs_race0_Scjm_72h_3 Fungus GSM200002 Treated Cy3 Control Cy5
ctrl_vs_race0_Scjm_96h_1 Fungus GSM200003 Treated Cy3 Control Cy5
ctrl_vs_race0_Scjm_96h_2 Fungus GSM200004 Treated Cy3 Control Cy5
ctrl_vs_race0_Stbr_72h_1 Fungus GSM200005 Treated Cy3 Control Cy5
ctrl_vs_race0_Stbr_72h_2 Fungus GSM200006 Treated Cy3 Control Cy5
ctrl_vs_race0_Stbr_72h_3 Fungus GSM200007 Treated Cy3 Control Cy5
ctrl_vs_race0_Stbr_96h_1 Fungus GSM200008 Treated Cy3 Control Cy5
ctrl_vs_race0_Stbr_96h_2 Fungus GSM200009 Treated Cy3 Control Cy5
ctrl_vs_race0_Stbr_96h_3 Fungus GSM200010 Treated Cy3 Control Cy5
ctrl_vs_complex_Scjm_72h_1 Fungus GSM200011 Treated Cy3 Control Cy5
ctrl_vs_complex_Scjm_72h_2 Fungus GSM200012 Treated Cy3 Control Cy5
ctrl_vs_complex_Scjm_72h_3 Fungus GSM200013 Treated Cy3 Control Cy5
ctrl_vs_complex_Scjm_96h_1 Fungus GSM200014 Treated Cy3 Control Cy5
ctrl_vs_complex_Scjm_96h_2 Fungus GSM200015 Treated Cy3 Control Cy5
ctrl_vs_complex_Scjm_96h_3 Fungus GSM200016 Treated Cy3 Control Cy5
ctrl_vs_complex_Stbr_72h_1 Fungus GSM200017 Treated Cy3 Control Cy5
ctrl_vs_complex_Stbr_72h_2 Fungus GSM200018 Treated Cy3 Control Cy5
ctrl_vs_complex_Stbr_72h_3 Fungus GSM200019 Treated Cy3 Control Cy5
ctrl_vs_complex_Stbr_96h_1 Fungus GSM200020 Treated Cy3 Control Cy5
ctrl_vs_complex_Stbr_96h_2 Fungus GSM200021 Treated Cy3 Control Cy5
ctrl_vs_complex_Stbr_96h_3 Fungus GSM200022 Treated Cy3 Control Cy5
ctrl_vs_complex_Stbr_72h_1FD Fungus GSM200023 Control Cy3 Treated Cy5
Sample Organism Geo protocol_1 Label_1 protocol_2 label_2
ctrl_vs_1h_infested1 Insect GSM200024 T 1hr Cy3 Control 4 hr Cy5
ctrl_vs_1h_infested2 Insect GSM200025 T 1hr Cy3 Control 4 hr Cy5
ctrl_vs_1h_infested3 Insect GSM200026 T 1 hr Cy3 Control 4 hr Cy5
ctrl_vs_1h_infested1swap Insect GSM200027 T 1hr Cy5 Control 4 hr Cy3
ctrl_vs_1h_infested2swap Insect GSM200028 T 1hr Cy5 Control 4 hr Cy3
ctrl_vs_1h_infested3swap Insect GSM200029 T 1 hr Cy5 Control 4 hr Cy3
wounded_vs_wounded_spit1 Insect GSM200030 WT 4 hr Cy3 Wounded 4 hr Cy5
wounded_vs_wounded_spit2 Insect GSM200031 WT 4 hr Cy3 Wounded 4 hr Cy5
wounded_vs_wounded_spit3 Insect GSM200032 WT 4 hr Cy3 Wounded 4 hr Cy5
wounded_vs_wounded_spit1swap Insect GSM200033 WT 4 hr Cy5 Wounded 4 hr Cy3
wounded_vs_wounded_spit2swap Insect GSM200034 WT 4 hr Cy5 Wounded 4 hr Cy3
wounded_vs_wounded_spit3swap Insect GSM200035 WT 4 hr Cy5 Wounded 4 hr Cy3
ctrl_vs_4h_infested1 Insect GSM200036 T 4 hr Cy3 Control 4 hr Cy5
ctrl_vs_4h_infested2 Insect GSM200037 T 4 hr Cy3 Control 4 hr Cy5
ctrl_vs_4h_infested3 Insect GSM200038 T 4 hr Cy3 Control 4 hr Cy5
ctrl_vs_4h_infested1swap Insect GSM200039 T 4 hr Cy5 Control 4 hr Cy3
ctrl_vs_4h_infested2swap Insect GSM200040 T 4 hr Cy5 Control 4 hr Cy3
ctrl_vs_4h_infested3swap Insect GSM200041 T 4 hr Cy5 Control 4 hr Cy3
wd_systemic_vs_spit_systemic1 Insect GSM200042 WT 4 hr Cy3 Wounded 4 hr Cy5
wd_systemic_vs_spit_systemic2 Insect GSM200043 WT 4 hr Cy3 Wounded 4 hr Cy5
wd_systemic_vs_spit_systemic3 Insect GSM200044 WT 4 hr Cy3 Wounded 4 hr Cy5
wd_systemic_vs_spit_systemic1s Insect GSM200045 WT 4 hr Cy5 Wounded 4 hr Cy3
wd_systemic_vs_spit_systemic2s Insect GSM200046 WT 4 hr Cy5 Wounded 4 hr Cy3
wd_systemic_vs_spit_systemic3s Insect GSM200047 WT 4 hr Cy5 Wounded 4 hr Cy3
Sample_title Organism Geo protocol_1 Label_1 protocol_2 label_2
non_GM_D_vs_GM_A_Pinf_1 Fungus GSM200048 Treated_24h Cy3 Treated_24h Cy5
non_GM_D_vs_GM_A_water_1 Fungus GSM200049 Control_24h Cy3 Control_24h Cy5
non_GM_E_vs_GM_B_Pinf_1 Fungus GSM200050 Treated_24h Cy3 Treated_24h Cy5
non_GM_E_vs_GM_B_water_1 Fungus GSM200051 Control_24h Cy3 Control_24h Cy5
non_GM_F_vs_GM_C_Pinf_1 Fungus GSM200052 Treated_24h Cy3 Treated_24h Cy5
non_GM_F_vs_GM_C_water_1 Fungus GSM200053 Control_24h Cy3 Control_24h Cy5
non_GM_D_vs_GM_A_Pinf_2 Fungus GSM200054 Treated_24h Cy3 Treated_24h Cy5
non_GM_D_vs_GM_A_water_2 Fungus GSM200055 Control_24h Cy3 Control_24h Cy5
non_GM_E_vs_GM_B_Pinf_2 Fungus GSM200056 Treated_24h Cy3 Treated_24h Cy5
non_GM_E_vs_GM_B_water_2 Fungus GSM200057 Control_24h Cy3 Control_24h Cy5
non_GM_F_vs_GM_C_Pinf_2 Fungus GSM200058 Treated_24h Cy3 Treated_24h Cy5
non_GM_F_vs_GM_C_water_2 Fungus GSM200059 Control_24h Cy3 Control_24h Cy5
non_GM_D_vs_GM_A_Pinf_3 Fungus GSM200060 Treated_24h Cy3 Treated_24h Cy5
non_GM_D_vs_GM_A_water_3 Fungus GSM200061 Control_24h Cy3 Control_24h Cy5
non_GM_E_vs_GM_B_Pinf_3 Fungus GSM200062 Treated_24h Cy3 Treated_24h Cy5
non_GM_E_vs_GM_B_water_3 Fungus GSM200063 Control_24h Cy3 Control_24h Cy5
non_GM_F_vs_GM_C_Pinf_3 Fungus GSM200064 Treated_24h Cy3 Treated_24h Cy5
non_GM_F_vs_GM_C_water_3 Fungus GSM200065 Control_24h Cy3 Control_24h Cy5
non_GM_D_vs_GM_A_water_3FD Fungus GSM200066 Control_24h Cy5 Control_24h Cy3
Sample_title Organism Geo treatment_1 Label_1 treatment_2 Label_2
mock_vs_pvy_infected_1dpi_1.1 Virus GSM200067 CDLV_1dpi Cy3 CDLW_1dpi Cy5
mock_vs_pvy_infected_1dpi_1.2 Virus GSM200068 CDLV_1dpi Cy3 CDLW_1dpi Cy5
mock_vs_pvy_infected_1dpi_2.1 Virus GSM200069 CDLV_1dpi Cy3 CDLW_1dpi Cy5
mock_vs_pvy_infected_1dpi_2.2 Virus GSM200070 CDLV_1dpi Cy3 CDLW_1dpi Cy5
mock_vs_pvy_infected_3dpi_1.1 Virus GSM200071 CDLV_3dpi Cy3 CDLW_3dpi Cy5
mock_vs_pvy_infected_3dpi_1.2 Virus GSM200072 CDLV_3dpi Cy3 CDLW_3dpi Cy5
mock_vs_pvy_infected_3dpi_1_FD Virus GSM200073 CDLV_3dpi Cy5 CDLW_3dpi Cy3
mock_vs_pvy_infected_3dpi_2.1 Virus GSM200074 CDLV_3dpi Cy3 CDLW_3dpi Cy5
mock_vs_pvy_infected_3dpi_2.2 Virus GSM200075 CDLV_3dpi Cy3 CDLW_3dpi Cy5
mock_vs_pvy_infected_6dpi_1.1 Virus GSM200076 CDLV_6dpi Cy3 CDLW_6dpi Cy5
mock_vs_pvy_infected_6dpi_1.2 Virus GSM200077 CDLV_6dpi Cy3 CDLW_6dpi Cy5
mock_vs_pvy_infected_6dpi_2.1 Virus GSM200078 CDLV_6dpi Cy3 CDLW_6dpi Cy5
mock_vs_pvy_infected_6dpi_2.2 Virus GSM200079 CDLV_6dpi Cy3 CDLW_6dpi Cy5
mock_vs_pvy_systemic_1dpi_1.1 Virus GSM200080 Treated_1dpi Cy3 Treated_1dpi Cy5
mock_vs_pvy_systemic_1dpi_1.2 Virus GSM200081 Treated_1dpi Cy3 Treated_1dpi Cy5
mock_vs_pvy_systemic_1dpi_2.1 Virus GSM200082 Treated_1dpi Cy3 Treated_1dpi Cy5
mock_vs_pvy_systemic_1dpi_2.2 Virus GSM200083 Treated_1dpi Cy3 Treated_1dpi Cy5
mock_vs_pvy_systemic_3dpi_1.1 Virus GSM200084 Treated_3dpi Cy3 Treated_3dpi Cy5
mock_vs_pvy_systemic_3dpi_1.2 Virus GSM200085 Treated_3dpi Cy3 Treated_3dpi Cy5
mock_vs_pvy_systemic_3dpi_1_FD Virus GSM200086 Treated_3dpi Cy5 Treated_3dpi Cy3
mock_vs_pvy_systemic_3dpi_2.1 Virus GSM200087 Treated_3dpi Cy3 Treated_3dpi Cy5
mock_vs_pvy_systemic_3dpi_2.2 Virus GSM200088 Treated_3dpi Cy3 Treated_3dpi Cy5
mock_vs_pvy_systemic_6dpi_1.1 Virus GSM200089 Treated_6dpi Cy3 Treated_6dpi Cy5
mock_vs_pvy_systemic_6dpi_1.2 Virus GSM200090 Treated_6dpi Cy3 Treated_6dpi Cy5
mock_vs_pvy_systemic_6dpi_2.1 Virus GSM200091 Treated_6dpi Cy3 Treated_6dpi Cy5
mock_vs_pvy_systemic_6dpi_2.2 Virus GSM200092 Treated_6dpi Cy3 Treated_6dpi Cy5
Some things do not make sense, for now at least. Can you clarify the following:
-----------------------------------------
Can you list the contrasts that you want to see? For example:
Finally, can you take a look at my tutorial for 2-colour arrays and see if you can get any help there? Note that, for the function read.maimages(), you would select source="genepix".
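As a rough illustration of that step, here is a minimal sketch of reading GenePix files with limma. The function name `read_genepix`, the targets file layout (a `FileName` column), and the background-correction and normalization settings are assumptions chosen for illustration, not the tutorial's exact code.

```r
# Sketch only: needs the Bioconductor package limma and your own .gpr files.
# read_genepix is a hypothetical helper; targets_file is assumed to be a
# tab-delimited targets file with a FileName column listing the .gpr files.
read_genepix <- function(targets_file, path = ".") {
  targets <- limma::readTargets(targets_file)
  # source = "genepix" tells limma which GenePix columns hold the intensities
  RG <- limma::read.maimages(targets$FileName, source = "genepix", path = path)
  # Illustrative settings; other methods may suit your data better
  RG <- limma::backgroundCorrect(RG, method = "normexp", offset = 50)
  # Returns an MAList with M (log2 ratios) and A (average log2 intensities)
  limma::normalizeWithinArrays(RG, method = "loess")
}
```

It would be called as `MA <- read_genepix("targets.txt", path = "gpr_files")`, where both names are placeholders for your own files.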
Hi Kevin,
1. Yes, I have used Combat to remove the batch effect, and I want to include the batch effect in the design/contrast matrix.
2. Actually, I want to find DEGs in response to biotic stress in plants, so yes, I want to compare across these organisms with respect to the different time points.
Okay, but I and many others would never recommend Combat. By using it, you run the risk of modifying your data unexpectedly, possibly introducing even more bias into your results than there would have been had you not used Combat.
Your experimental set-up does look complex. Please note, however, that, if you have processed your data correctly, then you should have produced an MAList for each sample. The log base 2 expression ratios (M-values) for each sample can then be accessed via the M object. For example,
MyObject$M
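For orientation, limma defines M as the log2 ratio of the red (Cy5) to the green (Cy3) channel, and A as the average log2 intensity of the two channels. A minimal base-R sketch with made-up intensities (not the poster's data):

```r
# Illustrative red (Cy5) and green (Cy3) intensities for two spots; not real data.
R <- c(1024, 256)
G <- c(256, 256)

# M is the log2 ratio of the two channels; A is the average log2 intensity.
M <- log2(R) - log2(G)
A <- (log2(R) + log2(G)) / 2

print(M)  # [1] 2 0
print(A)  # [1] 9 8
```

Because M is a ratio of the two channels, which sample sits on which dye determines the sign of M for each array.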
In your limma design model, I would include at least
or
It depends on at which comparisons you want to look and how you view 'Organism' - i.e. confounder or covariate of interest in terms of differential expression.
Please take some time to read these Bioconductor questions and answers (one from a developer of Limma):
Question: Removing continuous covariate effects in limma analysis
Question: remove batch effect and adjusted model for 4 covariates in Limma
There is no correct answer here. You will be looking at this data for many weeks until you get it right.
Also, apologies, here is the tutorial that I posted for 2-colour arrays: A: build the expression matrix step by step from GEO raw data
Thank you Kevin,
I have used the Combat method and yes, it modified my data very much. If you can suggest a better method than Combat, I will try it.
Also, I have an MAlist for each sample. You have included treatment_1 only [what about the different time points?]. If you don't mind, would you please explain your code to me? Also, I am worried about the dye swaps in a few samples.
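On the dye-swap worry, one common convention for two-colour data is to let the design matrix carry the dye orientation: since M is log2(Cy5/Cy3), an array with the treated sample on Cy5 gets +1 and its dye swap gets -1. The sketch below uses two array names from the Insect table; the `Treated`/`Control` labels and the column name `TreatedVsControl` are simplifications assumed for illustration.

```r
# Two arrays from the Insect table: an original and its dye swap.
targets <- data.frame(
  Array = c("ctrl_vs_1h_infested1", "ctrl_vs_1h_infested1swap"),
  Cy3 = c("Treated", "Control"),
  Cy5 = c("Control", "Treated"),
  stringsAsFactors = FALSE
)

# M = log2(Cy5 / Cy3), so the Treated-vs-Control effect is +M when Treated
# is labelled with Cy5 and -M when Treated is labelled with Cy3.
dye_sign <- ifelse(targets$Cy5 == "Treated", 1, -1)
design <- cbind(TreatedVsControl = dye_sign)
rownames(design) <- targets$Array

print(design)
```

With limma installed, `modelMatrix(targets, ref = "Control")` should produce the same -1/+1 coding from a targets frame with `Cy3` and `Cy5` columns.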
Instead of using Combat, create a new column in your metadata for 'Batch', so that batch can be adjusted for in the design model.
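A minimal sketch of such a 'Batch' column, assuming one batch per experiment (one per sample table in the question) rather than one per organism. That assignment is an assumption; use whatever grouping reflects how the arrays were actually processed.

```r
# Hypothetical subset of the sample sheet: the first GSM ID of each of the
# four sample tables in the question.
meta <- data.frame(
  Geo = c("GSM200000", "GSM200024", "GSM200048", "GSM200067"),
  Organism = c("Fungus", "Insect", "Fungus", "Virus"),
  stringsAsFactors = FALSE
)

# Assumption: each experiment is its own batch.
meta$Batch <- factor(c("Batch1", "Batch2", "Batch3", "Batch4"))

# Cross-tabulate to see how batch lines up with organism.
print(table(meta$Organism, meta$Batch))
```

A cross-tabulation like this also shows where batch and a variable of interest are fully confounded, in which case the model cannot separate the two.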
Note that, when you refer to 'Control' for Cy5, it is not an extra control sample - it is just a 'template' DNA used on this microarray version for the purposes of calculating the log base 2 expression values of your sample in question. It is akin to the hg19 or hg38 reference genome, but it is not a true control in the sense of Case Vs. Control.
Thus, if we just focus on the Fungus samples, your design model would be:
Then, you could compare, for example, Treated_24h-Control_24h (please follow my tutorial step-by-step). Just ignore dye-swapping and other organisms, for now. Keep it very simple until you feel comfortable doing basic comparisons whilst adjusting for batch. Also, again, please read the answer here: Question: Removing continuous covariate effects in limma analysis
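As one possible reading of that advice, the sketch below builds a treatment-plus-batch design in base R and writes the Treated_24h-Control_24h comparison as a contrast vector. The metadata values, the column names `Treatment` and `Batch`, and the formula itself are assumptions for illustration, not the exact model from the answer.

```r
# Hypothetical metadata for six Fungus arrays (values are illustrative only).
meta <- data.frame(
  Treatment = factor(c("Treated_24h", "Control_24h", "Treated_24h",
                       "Control_24h", "Treated_24h", "Control_24h")),
  Batch = factor(c("B1", "B1", "B2", "B2", "B3", "B3"))
)

# No-intercept design: one column per treatment level, plus batch adjustment.
design <- model.matrix(~ 0 + Treatment + Batch, data = meta)
colnames(design) <- sub("^Treatment", "", colnames(design))

# Contrast Treated_24h - Control_24h as a vector over the design columns.
contrast <- setNames(rep(0, ncol(design)), colnames(design))
contrast["Treated_24h"] <- 1
contrast["Control_24h"] <- -1

print(design)
print(contrast)

# With limma and an MAList called MA (hypothetical name), the fit would be:
#   fit <- limma::lmFit(MA, design)
#   fit <- limma::eBayes(limma::contrasts.fit(fit, contrast))
```

The design is only estimable when treatment and batch are not completely confounded, which is worth checking (for example with `qr(design)$rank`) before fitting.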
Thank you for your suggestions, I will try this.
Actually, I have applied a design model to the different chips according to their experimental design; one example is given below. By looking at my code you will see that I am comfortable doing basic comparisons. But my main problem is to combine all chips and then do a common analysis to find DEGs. Also, it is very complicated, as I have to consider many variables at a time [time points, organism, batch effect, dye swap].
I have gone through the links you provided. They are very informative, thank you.
I have written code for a single chip; details are below:
My Target file:
My code:
I am obtaining very good results with this code: DEGs with up to 10-fold change, most of which are known to be expressed under stress.
But to complete one of my project objectives, 24 samples are not enough. I want to increase my sample size so that I can justify my objective.
Yes, you are clearly adept (good) at coding. However, I'm sorry, I don't see how you can realistically merge these experiments together and extract any more useful value from them. I think that you should leave the results as independent investigations and then meta-analyse all results 'manually'. There are too many confounding factors that require adjustment and the results, from my perspective, would likely end up meaningless.
Best of luck
Kevin
Hello 1234anjalianjali1234!
It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/107938/
This is typically not recommended as it runs the risk of annoying people in both communities.
Sorry for the inconvenience, how do I delete the post from Bioconductor?
Now that you already posted there, I suggest you do not delete, but add a comment pointing to this thread, and apologize for the small faux pas of cross-posting.