Clustering tissues using gene expression data
Asked 5.2 years ago by Natasha

I would like to reproduce the tissue cluster trees reported in Figures 1 and 2 of this paper.

The supplementary material of this paper states that the raw intensity data from "A gene atlas of the mouse and human protein-encoding transcriptomes" were used. However, I couldn't find the raw intensity files on NCBI.

Has anyone had a chance to reproduce the results reported in this reference? It would be of great help if the data and the scripts used to generate the cluster map were available in a public repository.

Tags: gene-expression, clustering

No, it doesn't. It says: "The raw intensity data were transformed to normalized expression levels with the robust multi-array average (RMA) low-level algorithm [2] implemented in the BioConductor package [3]." So they used normalized intensity values, and from these probably the differences between the samples. This is an array, not RNA-seq, so the measures are relative: arrays can inform about differences between samples, but the intensity of a single gene taken on its own tells you nothing about its absolute expression level.

I would also be surprised if the authors responded, as the paper is from 2006; the people involved (except the senior author) probably left many years ago. If you want to reproduce it, download the raw data, normalize, perform a differential analysis and then cluster based on the obtained log2 fold-changes, maybe transformed to the Z-scale.
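A minimal Python sketch of the last steps of that suggestion, assuming you already have a genes × tissues matrix of log2 RMA values (for example exported from Bioconductor). As a plainly stated simplification, the "differential analysis" is replaced here by each gene's log2 fold-change relative to its own mean across all tissues, not a formal test such as limma. The distance (1 minus Pearson correlation) and average linkage are also assumptions, not necessarily what the paper used, and the function name `tissue_tree` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def tissue_tree(log2_expr: np.ndarray) -> np.ndarray:
    """Cluster tissues (columns) of a genes x tissues matrix of log2 values.

    Assumption: fold-changes are taken relative to each gene's mean across
    tissues, as a stand-in for a proper differential expression analysis.
    """
    # log2 fold-change of every tissue versus the gene's average over tissues
    log2_fc = log2_expr - log2_expr.mean(axis=1, keepdims=True)
    # Z-scale each gene so that highly variable genes do not dominate
    sd = log2_fc.std(axis=1, ddof=1, keepdims=True)
    keep = sd[:, 0] > 0  # drop invariant genes to avoid division by zero
    z = log2_fc[keep] / sd[keep]
    # correlation distance between tissue profiles, average linkage (UPGMA)
    dist = pdist(z.T, metric="correlation")
    return linkage(dist, method="average")


if __name__ == "__main__":
    # toy data: two groups of three "tissues" with distinct expression patterns
    rng = np.random.default_rng(0)
    n_genes = 500
    base = rng.normal(8.0, 1.0, (n_genes, 1))
    pattern_a = rng.normal(0.0, 2.0, (n_genes, 1))
    pattern_b = rng.normal(0.0, 2.0, (n_genes, 1))
    cols = [base + pattern_a + rng.normal(0, 0.3, (n_genes, 1)) for _ in range(3)]
    cols += [base + pattern_b + rng.normal(0, 0.3, (n_genes, 1)) for _ in range(3)]
    Z = tissue_tree(np.hstack(cols))
    # cut the tree into two clusters; tissues 0-2 and 3-5 should separate
    print(fcluster(Z, t=2, criterion="maxclust"))
```

The returned linkage matrix can be drawn as a tree with `scipy.cluster.hierarchy.dendrogram(Z, labels=tissue_names)`.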

Thanks a lot for the response. The 'Microarray procedure' section of "A gene atlas of the mouse and human protein-encoding transcriptomes" states that the raw files can be found at http://symatlas.gnf.org. However, I couldn't locate them there.

@ATpoint Apparently, SymAtlas has been moved to BioGPS, and the supplementary files are available here.

I found the same files on GEO under accession number GSE1133. However, the data are available in several formats (CDF, CIF, GIN, PSI, SIF, PROBE, TAB, TXT), and I am not sure which one I need to download to follow this suggestion from the response above:

normalize, perform differential analysis and then cluster based on the obtained log2 fold-changes, maybe transformed to the Z-scale
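On the format question: RMA starts from probe-level intensities, which for Affymetrix arrays are stored in CEL files; in Bioconductor the usual route is `ReadAffy()` followed by `rma()` from the affy package. A CDF (chip description file) describes which probes belong to which probe set. Purely to illustrate what "normalize" involves, here is a simplified Python sketch of the last two RMA steps (quantile normalization, then log2 and median-polish summarization) on a toy probes × arrays matrix. It rests on assumptions: background correction is skipped, only PM intensities are used, ties are ranked naively, and the function names are hypothetical. For real data, the Bioconductor implementation is the one to use.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Force every column (array) of a probes x arrays matrix onto one distribution.

    Simplification: ties get arbitrary distinct ranks instead of averaged ranks.
    """
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value in its column
    reference = np.sort(x, axis=0).mean(axis=1)        # mean of the k-th smallest values
    return reference[ranks]


def median_polish(y: np.ndarray, max_iter: int = 10, tol: float = 1e-8) -> np.ndarray:
    """Tukey median polish of one probe set (probes x arrays, log2 scale).

    Fits y = overall + probe effect + array effect + residual and returns
    overall + array effect, i.e. one expression value per array.
    """
    resid = np.array(y, dtype=float)
    overall = 0.0
    probe_eff = np.zeros(resid.shape[0])
    array_eff = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        r = np.median(resid, axis=1)  # sweep out probe (row) medians
        resid -= r[:, None]
        probe_eff += r
        m = np.median(array_eff)
        array_eff -= m
        overall += m
        c = np.median(resid, axis=0)  # sweep out array (column) medians
        resid -= c[None, :]
        array_eff += c
        m = np.median(probe_eff)
        probe_eff -= m
        overall += m
        if np.abs(r).sum() + np.abs(c).sum() < tol:  # nothing left to sweep
            break
    return overall + array_eff


def rma_like(pm: np.ndarray, probe_sets: np.ndarray):
    """Quantile-normalize raw PM intensities, log2-transform, summarize per probe set."""
    log_pm = np.log2(quantile_normalize(pm))
    ids = np.unique(probe_sets)
    values = np.vstack([median_polish(log_pm[probe_sets == ps]) for ps in ids])
    return ids, values  # probe-set ids and a probe sets x arrays matrix
```

`rma_like` returns log2 expression values per probe set and array, which is the kind of matrix the "cluster based on log2 fold-changes" step starts from.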

Have you tried contacting the authors?

Yes, I wrote an email to Prof. Bork, who is the corresponding author. Unfortunately, I haven't received a response yet.
