How to read count information from an old RGB based Agilent DNA array
17 months ago
K.patel5 ▴ 150

Dear Biostars,

I am trying to prepare some published data to test a CNV filtration method I am working on. I would really like to use data from Conrad et al. (2007), mostly because it is highly cited and easy to access. Link here: https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-142?query=E-MTAB-142.

Unfortunately, Agilent tech is a bit before my time and I am struggling to figure out how to read the count information. Here is a snippet of their data, which should load easily into R as a data frame.

x <- structure(list(FEATURES = c("DATA", "DATA", "DATA"), FeatureNum = 6:8, 
    Row = c(1L, 1L, 1L), Col = 6:8, SubTypeMask = c(0L, 0L, 0L
    ), ControlType = c(0L, 0L, 0L), ProbeName = c("A_18_P17027306", 
    "chr1_165793426_165793473", "A_18_P14570373"), SystematicName = c("chr9:137150180-137150224", 
    "chr1:165793427-165793473", "chr3:198891339-198891384"), 
    LogRatio = c(0.08975880656, 0.1139920636, 0.1038222868), 
    LogRatioError = c(0.0619727653, 0.0625525488, 0.06214855983
    ), PValueLogRatio = c(0.1475167061, 0.06840327396, 0.09481057346
    ), gProcessedSignal = c(2550.198, 479.9035, 4755.878), rProcessedSignal = c(3135.688, 
    623.9445, 6040.224), gProcessedSigError = c(255.087, 48.34247, 
    475.6231), rProcessedSigError = c(313.597, 62.53483, 604.0367
    ), gMedianSignal = c(807.5, 188, 1464.5), rMedianSignal = c(1826, 
    405.5, 3546), gBGMedianSignal = c(38, 38, 38), rBGMedianSignal = c(43, 
    44, 44), gBGPixSDev = c(7.409993, 7.485912, 7.367351), rBGPixSDev = c(9.274448, 
    9.2318, 9.213135), gIsSaturated = c(0L, 0L, 0L), rIsSaturated = c(0L, 
    0L, 0L), gIsFeatNonUnifOL = c(0L, 0L, 0L), rIsFeatNonUnifOL = c(0L, 
    0L, 0L), gIsBGNonUnifOL = c(0L, 0L, 0L), rIsBGNonUnifOL = c(0L, 
    0L, 0L), gIsFeatPopnOL = c(0L, 0L, 0L), rIsFeatPopnOL = c(0L, 
    0L, 0L), gIsBGPopnOL = c(0L, 0L, 0L), rIsBGPopnOL = c(0L, 
    0L, 0L), IsManualFlag = c(0L, 0L, 0L), gBGSubSignal = c(772.789, 
    146.17, 1455.8), rBGSubSignal = c(1831.5, 366.544, 3568.53
    ), gIsPosAndSignif = c(1L, 1L, 1L), rIsPosAndSignif = c(1L, 
    1L, 1L), gIsWellAboveBG = c(1L, 1L, 1L), rIsWellAboveBG = c(1L, 
    1L, 1L), SpotExtentX = c(49.8279, 47.5395, 47.8731), gBGMeanSignal = c(37.7605, 
    37.804, 37.9131), rBGMeanSignal = c(43.431, 44.9921, 44.1818
    )), row.names = 6:8, class = "data.frame")

I am hoping to wrangle this data into something like a standard .BED file format for CNVs with the following columns: Chromosome, Start, End, Type, Value.

The first three columns can be extracted from column 8 (SystematicName), but I am struggling to work out how to determine the Type (Deletion or Duplication) or the Value (0, 1, 2, 3, 4, >4), as you would expect from modern CNV callers for WES/WGS.
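As a rough sketch of that coordinate extraction, the snippet below splits SystematicName with a regular expression. It assumes the coordinates are 1-based and inclusive, which is only a guess based on the second probe, whose ProbeName starts one base before its SystematicName. On that assumption, Start is shifted down by one to match the 0-based BED convention. The names `probes` and `bed` are made up for illustration.

```r
# Only the column needed here, copied from the snippet above.
probes <- data.frame(
  SystematicName = c("chr9:137150180-137150224",
                     "chr1:165793427-165793473",
                     "chr3:198891339-198891384"),
  stringsAsFactors = FALSE
)

# Capture chromosome, start and end from strings shaped like "chrN:start-end".
pattern <- "^(chr[^:]+):([0-9]+)-([0-9]+)$"
parts <- regmatches(probes$SystematicName,
                    regexec(pattern, probes$SystematicName))

# Element 1 of each match is the full string; elements 2 to 4 are the groups.
# ASSUMPTION: input is 1-based inclusive, so Start - 1 gives 0-based BED.
bed <- data.frame(
  Chromosome = vapply(parts, `[`, character(1), 2),
  Start      = as.integer(vapply(parts, `[`, character(1), 3)) - 1L,
  End        = as.integer(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)

print(bed)
```

Rows that do not match the pattern (control probes, for instance) would come out as NA and would need to be dropped before writing a file.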

I assume the final few columns, e.g. gBGMeanSignal and rBGMeanSignal, might be valuable here as they seem to show normalised abundance values, but I am unsure whether to average them or add them together.
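One check that can be run on the snippet itself: LogRatio appears to be log10(rProcessedSignal / gProcessedSignal), since recomputing it from those two columns reproduces the reported values to within rounding. The BG columns, by contrast, sit around 38 to 45 in every row while the feature signals range from hundreds to thousands, so they look more like local background than abundance. The sketch below does the recomputation and converts to log2, the scale most CNV tools expect. Which channel is the test sample and which is the reference is not visible in this snippet and would need confirming from the experiment description.

```r
# Relevant columns copied from the snippet above.
signals <- data.frame(
  LogRatio         = c(0.08975880656, 0.1139920636, 0.1038222868),
  gProcessedSignal = c(2550.198, 479.9035, 4755.878),
  rProcessedSignal = c(3135.688, 623.9445, 6040.224)
)

# Recompute the ratio from the two processed channels (red over green).
recomputed <- log10(signals$rProcessedSignal / signals$gProcessedSignal)

# Largest disagreement with the reported column; expected to be tiny.
max_abs_diff <- max(abs(recomputed - signals$LogRatio))
print(max_abs_diff)

# Change of base: log2(x) = log10(x) / log10(2).
signals$log2_ratio <- signals$LogRatio / log10(2)
print(signals$log2_ratio)
```

If the difference is negligible on the full file as well, the single LogRatio column already carries the per-probe signal, and there would be no need to average or sum the individual channel columns.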

Any guidance would be most welcome. Also, I see there is a p-value column (PValueLogRatio). I assume it can be used to filter out low-confidence values?
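To make the filtering idea concrete, here is a minimal sketch that flags probes by PValueLogRatio and attaches a provisional Type from the sign and size of the log2 ratio. Both cutoffs (`p_cutoff` and `log2_cutoff`) are arbitrary placeholders, not values from the paper or from Agilent. A per-probe label like this is also not a CNV call: integer copy-number values would normally come from segmenting many neighbouring probes, which this does not attempt.

```r
# Relevant columns copied from the snippet above.
calls <- data.frame(
  LogRatio       = c(0.08975880656, 0.1139920636, 0.1038222868),
  PValueLogRatio = c(0.1475167061, 0.06840327396, 0.09481057346)
)

# ASSUMPTION: both thresholds are illustrative and should be tuned.
p_cutoff    <- 0.05
log2_cutoff <- 0.25

# LogRatio looks like log10, so convert to log2 before thresholding.
calls$log2_ratio <- calls$LogRatio / log10(2)

# A probe counts as confident only if its p-value is below the cutoff.
calls$confident <- calls$PValueLogRatio < p_cutoff

# Provisional per-probe label; unconfident probes get "NoCall".
calls$Type <- ifelse(
  !calls$confident, "NoCall",
  ifelse(calls$log2_ratio >=  log2_cutoff, "Duplication",
  ifelse(calls$log2_ratio <= -log2_cutoff, "Deletion", "Neutral"))
)

print(calls[, c("log2_ratio", "PValueLogRatio", "Type")])
```

With a cutoff of 0.05, none of the three probes in the snippet pass, since their p-values are all above 0.06.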

Many Thanks, Krutik

CNV Agilent DNA WGS • 462 views