
Help Needed: Preparing Training and Testing Datasets for PAM Algorithm from GEO GSE15126 CEL Files


Posted 11 weeks ago by clara-28

Hi everyone,

I’m currently working with gene expression data from GEO dataset GSE15126, which I have downloaded in CEL file format. I have successfully read and normalized the data, but I am unsure about how to properly split the samples into training and testing datasets and format these files for use with the PAM (Prediction Analysis for Microarrays) algorithm.

Here are the specifics:

Dataset information:
- Total samples: 40
- Number of probe sets: 52,920

Requirements for the PAM algorithm (from the paper):
- Training dataset (AllGenes_PAMTrain.txt): 15 samples
- Testing dataset (AllGenes_PAMTest.txt): 30 samples
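The two sizes listed above add up to 45, which is more than the 40 available samples, so they cannot both be drawn as non-overlapping sets from this data. A common way to split a small labelled dataset is a stratified random split, which keeps each class's share roughly the same in the training and testing sets. The sketch below is only an illustration of that general technique, not necessarily what the paper did. It assumes each sample ID has already been mapped to a class label; the labels, the sample IDs, and the function names `allocate` and `stratified_split` are all hypothetical. It refuses any request that would require the same sample to appear in both sets.

```python
import random
from collections import defaultdict


def allocate(n, sizes):
    """Split n slots across groups in proportion to their sizes
    (largest-remainder method, integer arithmetic only)."""
    total = sum(sizes.values())
    if n == 0 or total == 0:
        return {g: 0 for g in sizes}
    alloc = {g: n * s // total for g, s in sizes.items()}
    rem = {g: n * s % total for g, s in sizes.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining slots to the groups with the largest remainders.
    for g in sorted(sizes, key=lambda k: (-rem[k], str(k)))[:leftover]:
        alloc[g] += 1
    return alloc


def stratified_split(labels, n_train, n_test, seed=0):
    """Return disjoint (train, test) lists of sample IDs whose class
    proportions roughly match the full set.

    labels: dict of sample ID -> class label (hypothetical input).
    """
    if n_train + n_test > len(labels):
        raise ValueError(
            f"requested {n_train} + {n_test} = {n_train + n_test} samples, "
            f"but only {len(labels)} are available"
        )
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    groups = defaultdict(list)
    for sample in sorted(labels):
        groups[labels[sample]].append(sample)
    for members in groups.values():
        rng.shuffle(members)

    sizes = {g: len(m) for g, m in groups.items()}
    n_tr = allocate(n_train, sizes)
    # Allocate the test set only from samples not already used for training.
    n_te = allocate(n_test, {g: sizes[g] - n_tr[g] for g in sizes})

    train, test = [], []
    for g, members in groups.items():
        train += members[:n_tr[g]]
        test += members[n_tr[g]:n_tr[g] + n_te[g]]
    return sorted(train), sorted(test)


# Hypothetical labels: 40 samples, two classes of 20 each.
labels = {f"S{i:02d}": ("A" if i <= 20 else "B") for i in range(1, 41)}
train, test = stratified_split(labels, n_train=15, n_test=25)
print(len(train), len(test))  # 15 25
```

With these labels, calling `stratified_split(labels, 15, 30)` raises a `ValueError` because 45 distinct samples would be needed.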

My questions are:

1. How should I split my 40 samples into training and testing datasets? What is the recommended approach or best practice for this split, and why does the paper use 15 training and 30 testing samples? (15 + 30 = 45, which is more than the 40 samples I have.)
2. What format and structure should the training and testing files have? Specifically:
   - How should the data be organized in these files (e.g., tab-delimited)?
   - What information must the files include to be compatible with PAM?
3. How do I make sure the datasets are correctly prepared for PAM? Are there specific considerations for preprocessing, normalization, or file formatting that I should be aware of?
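For the file-format question, one commonly described layout for PAM input is genes in rows and samples in columns, with a first row of class labels and the first two columns holding a gene identifier and a gene name. That layout is an assumption here and should be checked against the PAM documentation. The sketch below writes one tab-delimited table in that assumed layout using Python's `csv` module; the function name `write_pam_file`, the probe IDs, gene names, and values are hypothetical. The same function would be called once with the training sample IDs and once with the testing sample IDs, keeping the gene rows in the same order in both files.

```python
import csv
import io


def write_pam_file(handle, gene_ids, gene_names, sample_ids, classes, expr):
    """Write one tab-delimited PAM-style table.

    Assumed layout (verify against the PAM documentation):
      row 1:  two empty cells, then one class label per sample
      row 2+: gene ID, gene name, then one expression value per sample
    expr: dict gene ID -> dict sample ID -> normalized value.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["", ""] + [classes[s] for s in sample_ids])
    for gid, name in zip(gene_ids, gene_names):
        writer.writerow([gid, name] + [f"{expr[gid][s]:.4f}" for s in sample_ids])


# Hypothetical toy data; real input would be the normalized matrix.
classes = {"S01": "A", "S02": "B", "S03": "A"}
expr = {"probe_1": {"S01": 7.1, "S02": 5.2, "S03": 6.9},
        "probe_2": {"S01": 3.3, "S02": 8.0, "S03": 3.1}}
buf = io.StringIO()
write_pam_file(buf, ["probe_1", "probe_2"], ["GENE1", "GENE2"],
               ["S01", "S02", "S03"], classes, expr)
print(buf.getvalue())
# For a real file, e.g.:
# with open("AllGenes_PAMTrain.txt", "w", newline="") as fh:
#     write_pam_file(fh, ...)
```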

Here’s a brief overview of what I’ve done so far:

1. Read and normalized the CEL files.
2. Next step: I need guidance on splitting the data and preparing it for PAM analysis.
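Before running PAM, a quick consistency check can catch common preparation mistakes. It verifies that no sample appears in both sets, that the two files list the same probe sets in the same order, and that every row is complete (missing values generally need to be imputed before shrunken-centroid training). The sketch below assumes each file has already been loaded as a list of sample IDs, a list of probe-set IDs, and a list of numeric rows. The function name `check_split` and the toy data are hypothetical.

```python
import math


def check_split(train_samples, test_samples, train_genes, test_genes,
                train_matrix, test_matrix):
    """Return a list of problems found in a train/test pair (empty if OK).

    *_samples: sample IDs (column order); *_genes: probe-set IDs (row order);
    *_matrix: list of rows, one list of floats per gene.
    """
    problems = []
    overlap = set(train_samples) & set(test_samples)
    if overlap:
        problems.append(f"samples in both sets: {sorted(overlap)}")
    if train_genes != test_genes:
        problems.append("gene IDs differ or are in a different order")
    for name, genes, samples, matrix in (
        ("train", train_genes, train_samples, train_matrix),
        ("test", test_genes, test_samples, test_matrix),
    ):
        if len(matrix) != len(genes):
            problems.append(f"{name}: {len(matrix)} rows but {len(genes)} gene IDs")
        for gid, row in zip(genes, matrix):
            if len(row) != len(samples):
                problems.append(
                    f"{name}: {gid} has {len(row)} values, expected {len(samples)}")
            elif any(math.isnan(v) for v in row):
                problems.append(f"{name}: {gid} contains missing values")
    return problems


genes = ["probe_1", "probe_2"]
ok = check_split(["S01", "S02"], ["S03"], genes, genes,
                 [[7.1, 5.2], [3.3, 8.0]], [[6.9], [3.1]])
print(ok)  # []
bad = check_split(["S01", "S02"], ["S02"], genes, genes[::-1],
                  [[7.1, float("nan")], [3.3, 8.0]], [[6.9], [3.1]])
for problem in bad:
    print(problem)
```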

I can also post the script from the paper’s supplementary data if that helps. Thank you in advance for your help and suggestions!

Tags: PAM, Microarray

Last updated 11 weeks ago by Ram