Question

How to Process GEO Raw Data Downloaded from NCBI

2

13.2 years ago

Tao Zhao ▴ 20

Hi everyone!

I recently downloaded a raw data set (≈ 400M) from the NCBI GEO database. Platform: NimbleGen GDR Malus domestica EST UnigeneV4 array. Overall Design: “Using a single color labeling system, a total of 24 microarray slides were utilized, one for each cortex tissue sample, for transcriptome profiling analysis. 2 cultivars x 3 developmental stages x 4 biological replicates.” Each sample comes with RMA-normalized data.

Here's my question: how should I process these raw data before clustering, in order to find genes that are upregulated or downregulated? The data are all positive numbers, so how do I get a log ratio?

GEO url: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE24523

I have little experience and am quite confused. If you can recommend some materials for me to learn from, please do. Thanks for your help!

geo data • 8.7k views

ADD COMMENT • link updated 13.2 years ago by Yogesh Pandit ▴ 520 • written 13.2 years ago by Tao Zhao ▴ 20

score 2 · Answer 1 · 2011-12-05

2

13.2 years ago

Sean Davis 27k

I'd suggest that you find a bioinformatics collaborator to work with you on these data. While your questions have answers, an online forum may not be the best way for you to move forward.

ADD COMMENT • link 13.2 years ago by Sean Davis 27k

1

I agree with Sean. While this response may not be the immediate, out-of-the-box solution you were looking for, it is the most practical. Going from GEO raw data to log ratios and on to gene set/pathway enrichment is a multi-step process, which, in this forum, would ideally be presented as a series of single questions. Look for a patient, communicative bioinformatics collaborator.

ADD REPLY • link 13.2 years ago by Larry_Parnell 16k

0

I see that the last post was 5 years ago... has the situation changed at all since then? I'm still faced with data sets that the curators have not gone through yet, such as this one: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE42608 .

What is involved in parsing the GEO data into R data structures? I know a bit of statistics and I have statistician friends who can take over the statistics, but could you give me a 30,000-foot view of the steps involved (cleaning the sequences, pairing the ends, etc.) to get there? Thanks!
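For the parsing part, here is a minimal sketch in Python, assuming the series provides a processed series matrix text file (not every series does, and raw sequence reads need a different pipeline). In that format, metadata lines start with "!" and the tab-delimited expression table sits between the `!series_matrix_table_begin` and `!series_matrix_table_end` markers. The function name `parse_series_matrix` and the toy values are made up for illustration. In R, the Bioconductor package GEOquery (`getGEO`) is the usual route and does this for you.

```python
import csv
import io


def parse_series_matrix(text):
    """Return (sample_ids, {probe_id: [value or None, ...]}) from series-matrix text."""
    in_table = False
    table_lines = []
    for line in text.splitlines():
        if line.startswith("!series_matrix_table_begin"):
            in_table = True
        elif line.startswith("!series_matrix_table_end"):
            break
        elif in_table:
            table_lines.append(line)

    reader = csv.reader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("no expression table found")

    sample_ids = header[1:]  # first column is the probe ID ("ID_REF")
    values = {}
    for row in reader:
        if not row:
            continue
        # Empty or placeholder cells are kept as None (assumed missing-value markers).
        values[row[0]] = [
            float(x) if x not in ("", "null", "NA") else None for x in row[1:]
        ]
    return sample_ids, values


# Tiny made-up example; real files hold thousands of probes.
example = "\n".join([
    '!Series_title\t"toy series"',
    "!series_matrix_table_begin",
    '"ID_REF"\t"GSM000001"\t"GSM000002"',
    '"probe_a"\t8.12\t7.95',
    '"probe_b"\t10.40\t',
    "!series_matrix_table_end",
])
samples, values = parse_series_matrix(example)
print(samples)
print(values)
```

From there, the sample IDs map to the experimental groups described on the GEO sample pages, and the per-probe value lists are what the statistics are run on.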

ADD REPLY • link 8.0 years ago by jxl1008 • 0

0

OK, thank you both. Still, if you are familiar with this data processing procedure, could you write out the steps roughly here for me? Just some keywords would be fine. I'm very curious about this. It may be a little hard for me to find a collaborator; I am such a beginner, and many people here are experts as far as I can see... a huge gap.

ADD REPLY • link 13.2 years ago by Tao Zhao ▴ 20

0

RMA data are not raw data; they are normalized already. You can use those data directly to cluster. You do not need to form log ratios to cluster, either. As for up/down regulated genes, clustering does not tell you that. You will need to do a statistical test to find those genes that are up/down regulated.
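To make the "statistical test" part concrete, here is a minimal Python sketch, not a full analysis. It assumes the RMA values are on the log2 scale (standard RMA output is, but check the sample description; if the values are linear, take log2 first). On that scale the "log ratio" between two groups is simply the difference of the group means. The gene names and numbers are made up, and `compare_groups` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance


def compare_groups(group_a, group_b):
    """Return (log2 difference B - A, Welch t statistic) for one gene.

    Inputs are assumed to be log2-scale expression values for the
    replicates of two conditions.
    """
    diff = mean(group_b) - mean(group_a)  # log2 fold change
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    t = diff / se if se > 0 else float("nan")
    return diff, t


# Made-up log2 values for two hypothetical genes, 4 replicates per group.
expression = {
    "gene_1": ([8.0, 8.2, 7.9, 8.1], [10.0, 10.3, 9.8, 10.1]),
    "gene_2": ([6.5, 6.4, 6.6, 6.5], [6.5, 6.6, 6.4, 6.5]),
}
results = {gene: compare_groups(a, b) for gene, (a, b) in expression.items()}
for gene, (diff, t) in results.items():
    print(gene, round(diff, 2), round(t, 2))
```

Turning the t statistic into a p-value needs the t distribution (for example `scipy.stats.ttest_ind` with `equal_var=False`), and with thousands of probes the p-values should be corrected for multiple testing. In R, limma is the usual tool for this kind of design.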

ADD REPLY • link 13.2 years ago by Sean Davis 27k

0

Thank you so much Sean, you've enlightened me a lot.

ADD REPLY • link 13.2 years ago by Tao Zhao ▴ 20

score 1 · Answer 2 · 2011-12-06

1

13.2 years ago

Yogesh Pandit ▴ 520

GEO DataSet Cluster Analysis

Also, as a starting point, you can experiment with the R script that GEO2R generates for the dataset.

ADD COMMENT • link 13.2 years ago by Yogesh Pandit ▴ 520

0

Thank you! I think I now understand the basic procedure. I am currently learning some statistical methods such as the t-test, F-test, and SAM. After the statistical test, I will cluster.
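For the "test first, then cluster" order, a rough Python sketch of the clustering step might look like the following. It assumes the genes have already passed a statistical test and that each gene is summarized as mean log2 expression per developmental stage. The gene names, values, and the helper `cluster_genes` are all hypothetical. The method shown is plain average-linkage agglomerative clustering on 1 minus the Pearson correlation, one common choice among several.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (fails on constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_genes(profiles, n_clusters):
    """Average-linkage agglomerative clustering on 1 - Pearson correlation."""
    clusters = [[gene] for gene in profiles]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up mean log2 expression at three developmental stages.
profiles = {
    "up_1": [6.0, 8.0, 10.0],
    "up_2": [5.0, 6.9, 9.1],
    "down_1": [10.0, 8.1, 6.0],
    "down_2": [9.0, 7.0, 5.2],
}
clusters = cluster_genes(profiles, n_clusters=2)
print(clusters)
```

This brute-force version is only practical for a short list of selected genes. For real data, `scipy.cluster.hierarchy.linkage` in Python or `hclust` in R would be the usual choice.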