Hi everyone!
I recently downloaded a raw dataset (≈ 400 MB) from the NCBI GEO database. Platform: NimbleGen GDR Malus domestica EST UnigeneV4 array. Overall design: “Using a single color labeling system, a total of 24 microarray slides were utilized, one for each cortex tissue sample, for transcriptome profiling analysis. 2 cultivars x 3 developmental stages x 4 biological replicates.” Each sample also comes with RMA-normalized data.
Here's my question: how should I process these raw data before clustering, so that I can find upregulated or downregulated genes? The data are all positive numbers, so how do I get a log ratio?
GEO url: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE24523
I have little experience and am quite confused. If you can recommend some materials for me to learn from, please do. Thanks for your help!
I agree with Sean. While this response may not be the immediate, out-of-the-box solution you were looking for, it is the most practical. Processing GEO raw data into log ratios, followed by gene set/pathway enrichment and so on, is a multi-step process, which, in this forum, would ideally be presented as a series of single questions. Look for a patient, communicative bioinformatics collaborator.
I see that the last post was 5 years ago. Has the situation changed at all since then? I'm still running into datasets that the curators have not gone through yet, such as this one: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE42608 .
What is involved in parsing the GEO data into R data structures? I know a bit of statistics and I have statistician friends who can take over the statistics, but could you give me a 30,000-foot view of the steps involved (cleaning the sequences, pairing the ends, etc.) to get there? Thanks!
OK, thank you both. Anyway, if you are familiar with this data processing procedure, could you roughly write out the steps here for me? Just some keywords would be fine. I'm very curious about this. It may be a little hard for me to find a collaborator: I am a beginner, and many people here are experts as far as I can see... Huge gap.
RMA data are not raw data; they are normalized already. You can use those data directly to cluster. You do not need to form log ratios to cluster, either. As for up/down regulated genes, clustering does not tell you that. You will need to do a statistical test to find those genes that are up/down regulated.
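To make the last two points concrete, here is a minimal Python sketch (standard library only) of what a "log ratio" and a per-gene test could look like. It is an illustration under assumptions, not the analysis used for this dataset. It assumes the RMA values are already on the log2 scale, in which case the log ratio between two groups is just the difference of their means. If your values are on a linear scale (for example, ranging into the thousands), pass `already_log2=False` so they are log2-transformed first. The gene names and numbers are made up, and the function names are hypothetical. In practice, with thousands of genes you would also need a p-value and a multiple-testing correction, and would probably use a dedicated package such as limma in Bioconductor.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b, already_log2=True):
    """Mean log2 difference (group B minus group A) for one gene."""
    if not already_log2:
        # Linear-scale intensities must be log-transformed first.
        group_a = [math.log2(x) for x in group_a]
        group_b = [math.log2(x) for x in group_b]
    return mean(group_b) - mean(group_a)


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for one gene."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (mean(group_b) - mean(group_a)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Made-up log2-scale values: 4 replicates in condition A, 4 in condition B.
expression = {
    "gene_up": ([7.0, 7.2, 6.9, 7.1], [9.0, 9.1, 8.8, 9.3]),
    "gene_flat": ([8.0, 8.3, 7.9, 8.1], [8.1, 8.0, 8.2, 7.9]),
}

results = {}
for gene, (a, b) in expression.items():
    lfc = log2_fold_change(a, b)
    t, df = welch_t(a, b)
    results[gene] = (lfc, t, df)
    print(f"{gene}: log2 fold change = {lfc:.3f}, t = {t:.2f}, df = {df:.1f}")
```

A large positive log2 fold change with a large t statistic suggests upregulation in condition B, and a large negative one suggests downregulation. Clustering alone would only group genes with similar profiles; it would not tell you which differences are statistically supported.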
Thank you so much Sean, you've enlightened me a lot.