Help Needed: Preparing Training and Testing Datasets for PAM Algorithm from GEO GSE15126 CEL Files
11 weeks ago
clara-28 • 0

Hi everyone,

I’m currently working with gene expression data from GEO dataset GSE15126, which I have downloaded in CEL file format. I have successfully read and normalized the data, but I am unsure about how to properly split the samples into training and testing datasets and format these files for use with the PAM (Prediction Analysis for Microarrays) algorithm.

Here are the specifics:

  1. Dataset Information:

    • Total Samples: 40
    • Number of Probe Sets: 52,920
  2. Requirements for PAM Algorithm:

    • Training dataset (AllGenes_PAMTrain.txt): 15 samples
    • Testing dataset (AllGenes_PAMTest.txt): 30 samples

My questions are:

  1. How should I split my 40 samples into training and testing datasets? What is the recommended approach or best practice for this split, and why does the paper specify 15 training and 30 testing samples (45 in total, which is more than the 40 samples I have)?

  2. What should be the format and structure of the training and testing data files? Specifically:

    • How should the data be organized in these files (e.g., tab-delimited format)?
    • What information should be included in these files to ensure compatibility with PAM?
  3. How do I ensure that the datasets are correctly prepared for the PAM algorithm? Are there any specific considerations for data preprocessing, normalization, or file formatting that I should be aware of?
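
To make questions 1 and 2 concrete, here is a minimal Python sketch of the kind of split and file layout I have in mind. Everything in it is an assumption on my part rather than anything taken from the paper or the PAM documentation: the sample IDs, the two class labels "A" and "B", the function names `stratified_split` and `write_matrix`, and the layout (probes as rows, samples as columns, a class-label row under the header). Because 15 + 30 exceeds my 40 samples, the sketch takes 15 for training and leaves the remaining 25 for testing.

```python
import csv
import io
import random
from collections import defaultdict


def stratified_split(labels, n_train, seed=0):
    """Choose n_train sample IDs for training, roughly proportional per class.

    labels: dict mapping sample ID -> class label (hypothetical labels).
    Returns (train_ids, test_ids) as sorted lists.
    """
    if not 0 < n_train < len(labels):
        raise ValueError("n_train must be between 1 and len(labels) - 1")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for sample_id in sorted(labels):
        by_class[labels[sample_id]].append(sample_id)

    train, leftovers = [], []
    for cls in sorted(by_class):
        ids = by_class[cls]
        rng.shuffle(ids)
        k = n_train * len(ids) // len(labels)  # floor of the proportional share
        train.extend(ids[:k])
        leftovers.extend(ids[k:])

    # The floors can undershoot n_train, so top up at random from the rest.
    rng.shuffle(leftovers)
    need = n_train - len(train)
    train.extend(leftovers[:need])
    return sorted(train), sorted(leftovers[need:])


def write_matrix(handle, probe_ids, sample_ids, labels, expr):
    """Write a tab-delimited table: probes as rows, samples as columns.

    ASSUMED layout, not a confirmed PAM format: a header row of sample IDs,
    a row of class labels, then one row per probe set.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ProbeID"] + list(sample_ids))
    writer.writerow(["Class"] + [labels[s] for s in sample_ids])
    for probe in probe_ids:
        writer.writerow([probe] + [expr[probe][s] for s in sample_ids])


# Hypothetical example: 40 samples, 20 per class, one placeholder probe.
labels = {f"S{i:02d}": ("A" if i <= 20 else "B") for i in range(1, 41)}
expr = {"probe_1": {s: 0.0 for s in labels}}  # placeholder values, not real data

train_ids, test_ids = stratified_split(labels, n_train=15, seed=1)

buf = io.StringIO()  # stands in for open("AllGenes_PAMTrain.txt", "w", newline="")
write_matrix(buf, ["probe_1"], train_ids, labels, expr)
```

If the actual PAM input needs a different arrangement (for example extra ID or description columns), only `write_matrix` would have to change; the split itself does not depend on the file format.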

Here’s a brief overview of the steps I’ve taken so far:

  • Read and normalized the CEL files.
  • Not yet done: splitting the samples and preparing the files for PAM analysis, which is where I need guidance.

I can also post the script from the paper's supplementary data if that would help.

Thank you in advance for your help and suggestions!

PAM Microarray • 259 views