Hi,
We have a set of text files from the Agilent Feature Extraction software for an array CGH experiment. In the experiment we used Agilent 4x44 CGH arrays with a two-color hybridization protocol. The files are tab-delimited. The top two parts (separated by lines of '*') contain input parameters and statistical calculations.
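Since the layout is regular, the three parts can be split programmatically. Below is a minimal Python sketch, assuming the layout described above: each section has a TYPE row, a header row whose first field is the section name (e.g. FEATURES) followed by the column names, then DATA rows, with '*' lines between sections. `parse_fe_text` is a hypothetical helper, not part of any library. In R, limma's `read.maimages()` accepts `source = "agilent"` and reads these files directly.

```python
import csv
import io


def parse_fe_text(text):
    """Split a Feature Extraction text export into named sections.

    Assumed layout: tab-delimited; each section has a TYPE row, a header
    row whose first field is the section name (e.g. FEATURES) followed by
    column names, and DATA rows; lines starting with '*' separate sections.
    Returns {section_name: [{column: value_as_string, ...}, ...]}.
    """
    sections = {}
    name, header = None, None
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields or not fields[0]:
            continue  # skip blank lines
        tag = fields[0]
        if tag.startswith("*"):
            name, header = None, None  # separator: a new section follows
        elif tag == "TYPE":
            continue  # column data types; not needed to read the values
        elif tag == "DATA":
            if header is None:
                raise ValueError("DATA row found before a header row")
            # Empty fields are kept, so each value stays aligned with its column.
            sections[name].append(dict(zip(header, fields[1:])))
        else:
            name, header = tag, fields[1:]
            sections[name] = []
    return sections
```

Mapping values to column names only works on the original tab-delimited file; the DATA row pasted below appears to have lost its empty text fields, so its values no longer line up with the FEATURES header.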
The third part is the interesting one for me, as it contains the results of my analysis. The problem is that I am not sure which column I need for downstream analysis. We would like to identify regions of chromosomal copy-number loss and gain in a comparison between the control and treated samples.
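On which column to use: in the sample row at the bottom of this post, LogRatio (3.13E-02) matches log10 of rProcessedSignal / gProcessedSignal (4081.29 / 3797.71 ≈ 1.0747, and log10(1.0747) ≈ 0.0313). So LogRatio appears to be a log10 ratio of the processed signals, while most CGH tools work on the log2 scale. The Python sketch below assumes the treated sample is in red and the control in green (swap the arguments if your dye assignment differs). The per-probe threshold call is hypothetical and purely illustrative; real gain/loss detection first segments neighbouring probes along each chromosome (e.g. circular binary segmentation, as implemented in the Bioconductor package DNAcopy).

```python
import math


def log2_ratio(r_signal, g_signal):
    """log2(red / green) from the processed signals.

    Assumes red = treated and green = control; returns None when either
    signal is not positive, since the ratio is then undefined.
    """
    if r_signal <= 0 or g_signal <= 0:
        return None
    return math.log2(r_signal / g_signal)


def fe_logratio_to_log2(log10_ratio):
    """Convert a log10 ratio (as LogRatio appears to be) to log2."""
    return log10_ratio * math.log2(10)


def naive_call(log2r, threshold=0.3):
    """Hypothetical per-probe call; the 0.3 cut-off is illustrative only."""
    if log2r is None:
        return "NA"
    if log2r > threshold:
        return "gain"
    if log2r < -threshold:
        return "loss"
    return "neutral"
```

For the sample row, `log2_ratio(4081.29, 3797.71)` gives about 0.104, the same value as converting LogRatio (0.0313 × log2(10) ≈ 0.104), i.e. a copy-number-neutral probe.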
I have appended the first three rows of the results part of the file at the bottom of this post, since they are long and would be distracting here.
I would like to work on the data in R. Is there a package that reads this kind of file into R? I would also like to know about different methods for analyzing such data (normalization, differential expression, annotation, etc.).
I would be happy to hear any suggestions or ideas on how to analyze such data.
Thanks, A.
TYPE integer integer integer text text text text text text text text text integer text integer text integer integer text text text text float float float float float float float boolean boolean float float float float integer integer integer integer integer integer float float float float float float float float integer integer float float float float float float float float integer integer boolean boolean float float boolean boolean boolean boolean boolean boolean boolean boolean boolean float float float float float boolean boolean float float integer integer boolean boolean float float float float boolean float float float float float integer float boolean boolean float float float float float float float float float float float float boolean float float boolean boolean boolean boolean
FEATURES FeatureNum Row Col accessions probe_mappings tm NumPMHits IsHomFiltered GCPercent PerformanceScore CpGCnt100 CpGCnt200 SubTypeMask SubTypeName Start Sequence ProbeUID ControlType ProbeName GeneName SystematicName Description PositionX PositionY LogRatio LogRatioError PValueLogRatio gSurrogateUsed rSurrogateUsed gIsFound rIsFound gProcessedSignal rProcessedSignal gProcessedSigError rProcessedSigError gNumPixOLHi rNumPixOLHi gNumPixOLLo rNumPixOLLo gNumPix rNumPix gMeanSignal rMeanSignal gMedianSignal rMedianSignal gPixSDev rPixSDev gPixNormIQR rPixNormIQR gBGNumPix rBGNumPix gBGMeanSignal rBGMeanSignal gBGMedianSignal rBGMedianSignal gBGPixSDev rBGPixSDev gBGPixNormIQR rBGPixNormIQR gNumSatPix rNumSatPix gIsSaturated rIsSaturated PixCorrelation BGPixCorrelation gIsFeatNonUnifOL rIsFeatNonUnifOL gIsBGNonUnifOL rIsBGNonUnifOL gIsFeatPopnOL rIsFeatPopnOL gIsBGPopnOL rIsBGPopnOL IsManualFlag gBGSubSignal rBGSubSignal gBGSubSigError rBGSubSigError BGSubSigCorrelation gIsPosAndSignif rIsPosAndSignif gPValFeatEqBG rPValFeatEqBG gNumBGUsed rNumBGUsed gIsWellAboveBG rIsWellAboveBG gBGUsed rBGUsed gBGSDUsed rBGSDUsed IsNormalization gDyeNormSignal rDyeNormSignal gDyeNormError rDyeNormError DyeNormCorrelation ErrorModel xDev gSpatialDetrendIsInFilteredSet rSpatialDetrendIsInFilteredSet gSpatialDetrendSurfaceValue rSpatialDetrendSurfaceValue SpotExtentX SpotExtentY gNetSignal rNetSignal gMultDetrendSignal rMultDetrendSignal gProcessedBackground rProcessedBackground gProcessedBkngError rProcessedBkngError IsUsedBGAdjust gInterpolatedNegCtrlSub rInterpolatedNegCtrlSub gIsInNegCtrlRange rIsInNegCtrlRange gIsUsedInMD rIsUsedInMD
DATA 1 1 1 null 0 0 0 1 HsCGHBrightCorner HsCGHBrightCorner HsCGHBrightCorner null 265.978 241.933 3.13E-02 6.17E-02 6.12E-01 0 0 1 1 3.80E+03 4.08E+03 3.81E+02 4.10E+02 0 0 0 3 66 66 5.03E+02 5.84E+02 505 586.5 6.69E+01 6.82E+01 6.38E+01 6.34E+01 701 701 2.28E+01 3.74E+01 22 36 7.06E+00 1.13E+01 7.41E+00 1.11E+01 0 0 0 0 0.276065 -0.00661349 0 0 0 0 0 0 1 0 0 469.282 539.161 47.109 54.1613 0 1 1 0 0 1 1 1 1 33.3845 44.4295 7.06454 11.3114 0 3797.71 4081.29 381.234 409.985 0 1 5.07E-01 0 0 33.3845 44.4295 51.7088 51.7088 495.636 576.926 0.720307 0.606607 22.8417 37.3609 7.06454 11.3114 0 470.703 533.049 0 0 0 0
My data set has a common reference. Does it make sense to do within-array normalization? I have read that it is not a good idea to run it.
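For reference, the simplest within-array step for CGH log ratios is a global median centering: if most probes are assumed to be copy-number neutral, the array's median log2 ratio should be 0, so that median is subtracted from every probe. The Python sketch below shows only this simple technique (it is not limma's loess method, which `normalizeWithinArrays()` provides); whether any within-array step is appropriate still depends on the design. `median_center` is a hypothetical helper.

```python
import statistics


def median_center(log2_ratios):
    """Global median centering of one array's log2 ratios.

    Assumes most probes are copy-number neutral, so the array median
    should sit at 0. None values (unusable probes) are passed through.
    """
    usable = [x for x in log2_ratios if x is not None]
    if not usable:
        return list(log2_ratios)
    shift = statistics.median(usable)
    return [None if x is None else x - shift for x in log2_ratios]
```

A strongly aneuploid sample can violate the "mostly neutral" assumption, in which case the median shift itself would be biased.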
Is it possible to run the single-channel protocol on only the green channel with these arrays?
Agilent is, per se, a two-color platform, but it can also be used as a single-color platform; you can find a lot of help on this in the 'limma' manual. The same goes for normalization: the right choice depends on your experimental design, and many details and pointers to other papers can be found in the 'limma' manual, tutorials, and related papers.
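For a single-channel analysis, a common approach is to log2-transform the green intensities (e.g. gProcessedSignal) and quantile-normalize them between arrays, which is what limma's `normalizeBetweenArrays(method = "quantile")` does for single-channel data; limma's `read.maimages()` also has a `green.only` option for reading just the green channel. The Python sketch below illustrates the quantile step only. The helper names and the floor of 1.0 before taking logs are illustrative assumptions, and ties are broken by position rather than averaged.

```python
import math


def green_log2(signals, floor=1.0):
    """log2 of green-channel intensities; values below `floor` are raised
    to it first (an illustrative choice) so the log is always defined."""
    return [math.log2(max(s, floor)) for s in signals]


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list per array).

    Each value is replaced by the mean, across arrays, of the values with
    the same rank, so every array ends up with the same distribution.
    """
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all arrays must have the same number of probes")
    # For each array, the probe indices sorted by intensity (rank order).
    orders = [sorted(range(n), key=c.__getitem__) for c in columns]
    rank_means = [
        sum(c[order[i]] for c, order in zip(columns, orders)) / len(columns)
        for i in range(n)
    ]
    result = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # put the rank mean back at the probe
        result.append(out)
    return result
```

For example, `quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])` returns `[[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]`: both arrays now share the values 1.5, 3.5 and 5.5, each probe keeping its original rank within its array.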