How can I combine different Affymetrix platforms?
7.1 years ago
lur_murad • 0

I would like to preprocess the microarray dataset GSE9006, which contains gene expression in PBMCs from children with diabetes. There are two array platforms: Affymetrix Human Genome HG-U133A and Affymetrix Human Genome HG-U133B. I need to combine the two platforms to increase the number of samples (n) and then analyse them to get the differential expression of each gene. Is it enough to download the data, normalize the expression matrix for each platform, and then merge them by Entrez Gene ID? Thanks a lot for your cooperation.

Affymetrix • 3.7k views

Have a look at this previous question,

what is difference between HG-U133A and HG-U133B array ? which one to use ?,

and at the Affymetrix documentation for the arrays.

The Affymetrix HG-U133A and HG-U133B arrays were a set, not different platforms, so you might want to check whether samples from each patient were run on both arrays.

See also this page:

https://www.ncbi.nlm.nih.gov/gds?term=GSE9006


@mastal511 I want to use both of them. Each platform contains 117 samples (80 T1D, 24 Normal, 12 T2D). I want to use all the T1D vs Normal samples, which is 160 T1D vs 48 Normal. Each platform has different probes, with only 4478 genes in common between them, and the expression levels are different as well.

I am new to bioinformatics and I need a large number of samples to run my approach. How can I combine this set, as you called it?


The 2 arrays form a set. The U133A array contained more well-known genes, and the U133B more probesets based on evidence from ESTs, and each array contained some 22K probesets.

Essentially there are 117 samples, presumably each run on both U133A and U133B, and in total, you have intensity measurements from some 44K or so probesets. So half the information from each sample is on the A array, and the other half is on the B array. This is not the same as trying to combine information from different experiments run on different technologies, like for example, Affymetrix expression arrays and Illumina expression arrays, which are designed in different ways.
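If each sample really was hybridised to both chips, the two matrices share their columns (samples) and differ in their rows (probesets), so they are stacked row-wise rather than treated as extra samples. A minimal sketch in Python, assuming each array's normalised data is held as a dict of probeset ID to per-sample values; the probeset IDs, sample names, and numbers are made up for illustration and do not come from GSE9006:

```python
def stack_probesets(array_a, array_b):
    """Combine two probeset-by-sample tables that share the same samples.

    Each table maps probeset ID -> {sample ID -> expression value}.
    Rows (probesets) are stacked; only samples present on both arrays are kept.
    """
    samples_a = {s for row in array_a.values() for s in row}
    samples_b = {s for row in array_b.values() for s in row}
    shared = sorted(samples_a & samples_b)

    combined = {}
    for label, table in (("A", array_a), ("B", array_b)):
        for probeset, row in table.items():
            # Tag each row with its array of origin so IDs cannot collide.
            combined[(label, probeset)] = {s: row[s] for s in shared if s in row}
    return shared, combined


# Hypothetical toy data: two samples, two probesets per array.
u133a = {"psA_1": {"s1": 7.2, "s2": 6.9}, "psA_2": {"s1": 4.1, "s2": 4.4}}
u133b = {"psB_1": {"s1": 5.5, "s2": 5.8}, "psB_2": {"s1": 8.0, "s2": 7.7}}

samples, combined = stack_probesets(u133a, u133b)
print(len(samples), "samples x", len(combined), "probesets")
```

With this layout the sample count stays the same and the number of measured probesets roughly doubles.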

Some genes will have several probesets assigned to them, either on the same array, or some on the A array and some on the B array, but the probesets will probably be looking at different parts of the gene (although in general most of the probes on those types of Affymetrix arrays targeted the 3' ends of the genes), or have been designed to target alternative transcripts produced from the same gene.

What you should do is look at which probesets are differentially expressed. The annotations (genes the probesets are assigned to) for the probesets may change from time to time, especially for probesets on the B array that were designed based on information from ESTs.
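As a rough illustration of a probeset-level comparison, the sketch below computes a log2 fold change and Welch's t statistic per probeset. It assumes log2-scale expression values and uses hypothetical probeset names and toy numbers; it applies no multiple-testing correction, and a real analysis would more likely use a dedicated package such as limma in R.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two groups with possibly unequal variances."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def probeset_stats(expr, groups, case="T1D", control="Normal"):
    """Return {probeset: (log2 fold change, Welch t)} for case vs control.

    expr   : probeset ID -> list of log2 expression values, one per sample
    groups : list of group labels, in the same sample order
    """
    results = {}
    for probeset, values in expr.items():
        case_vals = [v for v, g in zip(values, groups) if g == case]
        ctrl_vals = [v for v, g in zip(values, groups) if g == control]
        # On the log2 scale, the fold change is a difference of means.
        log2_fc = mean(case_vals) - mean(ctrl_vals)
        results[probeset] = (log2_fc, welch_t(case_vals, ctrl_vals))
    return results


# Hypothetical toy data: three T1D and three Normal samples.
groups = ["T1D", "T1D", "T1D", "Normal", "Normal", "Normal"]
expr = {
    "ps_up": [8.0, 8.2, 8.1, 7.0, 7.1, 6.9],
    "ps_flat": [5.0, 5.2, 4.8, 5.1, 4.9, 5.0],
}
results = probeset_stats(expr, groups)
for probeset, (log2_fc, t) in results.items():
    print(probeset, round(log2_fc, 2), round(t, 2))
```

Gene annotations can then be attached to the probeset-level results afterwards, which keeps the statistics independent of later annotation changes.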

Gene    U133A       U133B
'ABCC4' 1.316116395     1.099865145
'ADH1B' 1.327648636     -1.008188273
'ANGPTL4'   -2.350192037        -1.025937412
'ANK2'  1.558927203     -1.278292427
'ANXA1' -1.742008276        -1.01118217
'APOB'  -1.990832086        -1.189922907
'ARG1'  -1.318281938        1.238138398
'ARGLU1'    -1.636577742        -1.208285498
'ATP6V1D'   1.491767021     1.163026149
'BCL2L10'   -1.363365385        -1.033854088
'BCL2L14'   1.562237198     -1.02335048
'BHMT2' 1.797793045     -1.136313447
'BNC2'  -1.424195868        1.000070992
'BNC2'  -1.424195868        1.000070992
'CADM1' -1.384649004        1.042315341
'CADPS' 1.46785031      -1.041100165
'CBS'   -1.520996842        -1.066793757
'CD48'  1.410695857     1.096595355
'CD48'  1.410695857     1.096595355
'CHRM3' 1.35536998      1.003250995
'CLEC2D'    -1.302040283        -1.083768888
'CNDP2' -1.373034767        1.00867911
'CNTN5' 1.683794551     -1.270780239
'COL11A1'   1.32174493      1.194591417
'COL3A1'    1.303236594     -1.189626784
'COL4A5'    -1.306929948        -1.07094705
'CYBA'  -1.328410667        -1.128989273
'CYP19A1'   -1.341314108        -1.102602861
'CYP3A4'    1.634722747     1.056378251
'DCC'   1.510835634     -1.603497887
'DGCR14'    -1.342113648        -1.139748612
'DGUOK' 1.362356981     1.047008968
'DLC1'  1.497617851     1.025559915
'DMD'   1.365260185     1.146638213
'DPF3'  1.38986442      1.032422698
'EIF2AK3'   1.514038543     -1.275649019
'EPHA1' 1.428691574     1.064797524
'ERBB3' 1.556130395     -1.118611358
'ERBB4' 1.474139318     -1.223614286
'EVC'   1.744292145     -1.216207301
'F7'    -1.456596317        -1.256995954
'FADS2' 1.479012346     -1.052877335
'FAF1'  -1.314893414        1.019691453
'FGF12' 1.448485833     1.098273878
'FOXO1' -1.464185608        1.299814419
'GCLM'  -1.314824337        -1.052402075
'GJA5'  -1.610810766        -1.174838446
'GJC1'  1.317340693     -1.109040698
'GORASP1'   -1.369677308        1.084044212
'GPR98' -1.396324739        1.396612693
'GRIA4' 2.629273123     -1.162611226
'HDAC2' 1.458966128     -1.208953456
'HIVEP3'    1.339655881     -1.003643941
'HLA-DPB1'  -1.592675725        -1.02829585
'HSPA1L'    -1.322954357        1.14511081
'IGFBP1'    -1.334367668        -1.370205268
'IRF5'  1.392265225     -1.607073609
'ITGA1' 1.376487971     1.253676413
'ITGA2' 1.376399306     1.450818904
'ITGB6' -1.411455964        -1.181282596
'KLF15' 1.313649682     -1.2925
'KNG1'  1.746125086     -1.112399583
'LIPA'  -1.353394451        1.103174951
'LMNA'  -1.322405043        -1.204981404
'LMO7'  1.348071991     1.123337533
'MCM10' -1.330606937        1.673279525
'MGP'   -1.453242397        -1.787837838
'MYH7B' -2.387075147        1.058239767
'NAMPT' -1.334271765        -1.182607211
'NCOR1' 1.845354126     -1.037591636
'NFKBIB'    -1.450484045        -1.804063082
'NOS3'  -1.847852825        1.181066383
'NRXN3' -1.713424793        -1.027124238
'NTNG1' 1.459206108     1.160738621
'NUP133'    -1.367441332        -1.028757889
'OSM'   -1.469012887        -1.936409991
'PCYT1B'    1.419738334     1.102407891
'PDE4A' -1.956840391        -1.088125842
'PLD1'  -1.363600956        1.069971161
'PPP1R9A'   1.566093718     -1.002639916
'PRCP'  1.312576772     1.067534384
'PTGER1'    -1.815607567        -1.129215561
'PTPN2' 2.170804719     1.070092057
'PTPRD' 1.410802605     1.176360691
'PTPRG' -1.538189625        -1.302427304
'RAB6B' 1.400380196     -1.007393603
'RALGPS1'   -1.976895078        1.403315697
'RBMS1' 1.382796022     1.168304383
'RBMS1' 1.382796022     1.168304383
'RFC3'  1.302815625     1.082265432
'RFX2'  1.314235465     -1.5897521
'RNF17' -1.417139519        -1.532896813
'ROR1'  1.547575034     -1.208169794
'SASH1' -1.341710284        -1.084923833
'SCARB1'    -1.407964198        -1.113158737
'SCNN1G'    1.433211374     -1.137954715
'SLC27A5'   -2.103063728        -1.228554995
'SLC6A13'   1.571661605     1.131365114
'SLIT2' 2.205648092     -1.038132911
'SMARCB1'   -1.311952145        -1.012119746
'SOCS3' -2.774464415        -1.994889526
'SORBS1'    -1.562808615        -1.12105425
'SORBS2'    1.321248741     -1.306857616
'ST8SIA4'   -1.314179272        -1.237198388
'TCF7L2'    1.509933423     1.016953742
'THBS1' -1.35321437     1.161802987
'TRAF6' 1.477840872     -1.036330087
'TUB'   -1.301811754        1.093728112
'UPB1'  1.315262645     1.192613904
'WWOX'  -1.313117593        -1.033460492
'ZFAND5'    1.320821218     -1.022301519

I am interested in these 112 genes. As you can see, their FC values differ between the platforms. How can I combine U133A + U133B in one expression matrix?


A similar question was just posted today: Regarding Microarray Platforms

Please take a look at my answer and see if it helps.


Thank you, Dr. Kevin, I will.

7.1 years ago

Please take a look at some of the comments here: How to integrate multiple data sets from microarray platform prior meta-analysis?

The ideal situation would be to use just the common genes and then include ArrayVersion (i.e. batch) as a covariate in all downstream statistical analyses. I'm not sure there is any good way to use genes that don't overlap: where they don't overlap, the values would just have to be NA in samples where there's no data.
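A minimal sketch of that merge in Python, assuming each array has already been summarised to one value per gene and is held as a dict of gene to per-sample values. The function name, gene names, and numbers are hypothetical; `None` stands in for NA.

```python
def merge_samples(array_a, array_b, common_only=True):
    """Merge two gene-by-sample tables column-wise (more samples).

    Each table maps gene -> {sample ID -> value}. Returns
    (genes, columns, matrix, array_version), where matrix[i][j] is the
    value of genes[i] in columns[j] (None where that array lacks the gene)
    and array_version[j] records which array column j came from.
    """
    tables = {"U133A": array_a, "U133B": array_b}
    genes_a, genes_b = set(array_a), set(array_b)
    genes = sorted(genes_a & genes_b if common_only else genes_a | genes_b)

    # One column per (array, sample) pair, so the same sample ID on both
    # arrays yields two distinct columns.
    columns = []
    for label, table in tables.items():
        ids = sorted({s for row in table.values() for s in row})
        columns += [(label, s) for s in ids]

    matrix = [
        [tables[label].get(gene, {}).get(sid) for label, sid in columns]
        for gene in genes
    ]
    array_version = [label for label, _ in columns]
    return genes, columns, matrix, array_version


# Hypothetical toy data: GENE1 is on both arrays, GENE2 and GENE3 are not.
a = {"GENE1": {"s1": 7.0, "s2": 7.5}, "GENE2": {"s1": 3.0, "s2": 3.2}}
b = {"GENE1": {"s1": 6.1, "s2": 6.4}, "GENE3": {"s1": 9.0, "s2": 9.1}}

genes, columns, matrix, array_version = merge_samples(a, b)
print(genes, matrix, array_version)

all_genes, _, full_matrix, _ = merge_samples(a, b, common_only=False)
print(all_genes, full_matrix)
```

The `array_version` list is the categorical variable that would be passed as the batch covariate to whatever model is fitted downstream.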


Thank you, Kevin. What do you mean by merging the data? I have 117 samples on each array and 4478 common genes. Do you mean simply combining the samples, which would give 4478 genes x 224 samples?

%%%%%%%%%%%%%

Thank you, Kevin. Yes, I realised that some genes have reversed fold changes. Do you think using one platform would be better than merging them?

My problem is the limited number of normal samples: only 24 vs 80 T1D.


Yes, and then create a new categorical variable that records the array from which each sample was obtained (and include this as a covariate in all downstream analyses).

However, it looks like you have major issues with this data, as some genes have reverse fold-changes.
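One quick way to see how widespread the reversals are is to compare the sign of the fold change on each array. The sketch below uses four rows copied from the fold-change table posted earlier in this thread, and assumes those values are signed fold changes (positive for up, negative for down, magnitude close to 1 meaning almost no change):

```python
def direction(fc):
    """Sign of a signed fold change: +1 for up, -1 for down."""
    return 1 if fc > 0 else -1


# Rows copied from the posted table: gene -> (U133A FC, U133B FC).
fold_changes = {
    "ABCC4": (1.316116395, 1.099865145),
    "ADH1B": (1.327648636, -1.008188273),
    "GRIA4": (2.629273123, -1.162611226),
    "SOCS3": (-2.774464415, -1.994889526),
}

concordant = sorted(g for g, (a, b) in fold_changes.items()
                    if direction(a) == direction(b))
discordant = sorted(g for g, (a, b) in fold_changes.items()
                    if direction(a) != direction(b))

print("same direction:", concordant)
print("opposite direction:", discordant)
```

Under that assumption, a flip such as ADH1B (1.33 vs -1.01) may simply reflect no detectable change on the second array rather than a true reversal, so the magnitude is worth checking alongside the sign.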
