Creating a cls file from two groups of TCGA RNA-seq samples
3.8 years ago
ivykosater • 0

I want to compare two groups of TCGA-BRCA RNA-seq samples: one with a specific mutation and one without. So far I have downloaded the RNA-seq expression data for all the TCGA-BRCA samples and created a gct file. However, I am unsure how to create a cls file that designates each sample as either "wildtype" or "mutation". There are over 3000 samples in the gct file. I have a list of barcodes of the samples with the mutation, but I'm not sure how to use it to generate the cls file. Does anyone have any insight?

TCGA RNA-Seq R GSEA • 1.9k views
3.8 years ago

Hey ivykosater,

The cls file should look something like this:

35 7 1
# d0 d1 d2 d4 d6 d8 d10
d0 d1 d2 d4 d6 d8 d10 d0 d1 d2 d4 d6 d8 d10 d0 d1 d2 d4 d6 d8 d10 d0 d1 d2 d4 d6 d8 d10 d0 d1 d2 d4 d6 d8 d10
  • Line #1: 35 samples, 7 unique levels, and a final value that is always 1.
  • Line #2 lists the unique levels after a leading #.
  • Line #3 gives one label per sample, in the same order as the sample columns of your GCT file.
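To make the three lines concrete, here is a minimal Python sketch that builds the text of a cls file from a list of per-sample labels. The function name `format_cls` is made up for illustration, and the sketch assumes labels contain no spaces and that the levels on line 2 should appear in order of first use on line 3.

```python
def format_cls(labels):
    """Build the three lines of a categorical cls file from per-sample labels.

    Assumption: labels contain no whitespace, since the format is
    space-delimited.
    """
    # Unique levels, kept in order of first appearance in the label list
    levels = list(dict.fromkeys(labels))
    line1 = f"{len(labels)} {len(levels)} 1"   # samples, levels, constant 1
    line2 = "# " + " ".join(levels)            # the unique levels
    line3 = " ".join(labels)                   # one label per GCT sample column
    return "\n".join([line1, line2, line3]) + "\n"


# Small illustrative input: three time points, two replicates each
print(format_cls(["d0", "d1", "d2", "d0", "d1", "d2"]), end="")
```

The labels passed in must already be in the same order as the sample columns of the GCT file; the function does not reorder anything.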

In your case, you may consider a 1 and 0 encoding for 'with' and 'without' mutation, respectively.

More information here: https://www.genepattern.org/file-formats-guide#CLS

Kevin


Hi Kevin, thanks for the summary. I was curious whether you know a way to assign a phenotype label to each sample automatically. I have lists of the samples with and without the mutation, but assigning labels to 3000 samples by hand seems tedious. Perhaps I could use an R script.


Oh, I see what you mean. Could you share a sample of the input data that you've got?
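Without seeing the data, here is only a rough Python sketch of the labelling step; the same logic would carry over to an R script. It assumes the GCT file follows the usual layout (sample names on the third line, after the Name and Description columns) and that the barcodes in the mutation list can be matched to the GCT column names on their first 12 characters (the patient-level part of a TCGA barcode). The function names, the `key_len` parameter, and the barcodes in the example are all made up for illustration.

```python
def gct_sample_names(gct_lines):
    """Return sample column names from the lines of a GCT file.

    Assumption: line 3 is the header, starting with Name and Description.
    """
    header = gct_lines[2].rstrip("\n").split("\t")
    return header[2:]


def label_samples(samples, mutated_barcodes, key_len=12):
    """Label each sample 'mutation' or 'wildtype', in GCT column order.

    Assumption: comparing the first key_len characters is enough to match
    a GCT column name to a barcode in the mutation list.
    """
    mutated = {b.strip()[:key_len] for b in mutated_barcodes if b.strip()}
    return ["mutation" if s[:key_len] in mutated else "wildtype" for s in samples]


def cls_text(labels):
    """Build the three-line cls text; levels are listed in order of first use."""
    levels = list(dict.fromkeys(labels))
    return f"{len(labels)} {len(levels)} 1\n# {' '.join(levels)}\n{' '.join(labels)}\n"


# Tiny made-up GCT (two genes, three samples) and a made-up mutation list
gct = [
    "#1.2\n",
    "2\t3\n",
    "Name\tDescription\tTCGA-XX-0001-01A\tTCGA-XX-0002-01A\tTCGA-XX-0003-01A\n",
    "GENE1\tna\t1.0\t2.0\t3.0\n",
    "GENE2\tna\t4.0\t5.0\t6.0\n",
]
mutated_list = ["TCGA-XX-0002\n"]

labels = label_samples(gct_sample_names(gct), mutated_list)
print(cls_text(labels), end="")

# With real files (hypothetical names), the inputs could be read like this:
#   with open("expression.gct") as fh:
#       gct = fh.readlines()
#   with open("mutated_barcodes.txt") as fh:
#       mutated_list = fh.readlines()
#   with open("phenotype.cls", "w") as fh:
#       fh.write(cls_text(label_samples(gct_sample_names(gct), mutated_list)))
```

One thing to check before relying on the 12-character match: if the GCT file also contains normal-tissue samples, those would share a patient ID with the tumour sample and get the same label, so you may want to filter them out first or match on a longer prefix via `key_len`.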
