Hi, I have downloaded the Affymetrix Exon Array data from ENCODE and want to calculate gene expression from it. I extracted the exon start and end sites of each gene from Ensembl. Whenever an exon of my gene overlaps a region in the Affymetrix Exon Array data, I take that region's level as observed expression. In the end I average these values and treat the result as the gene's expression.
Is it biologically correct to do this? For example, if two exons of a gene overlap the same region, should I count that region twice?
Thanks.
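To make the approach above concrete, here is a minimal Python sketch. It assumes exons are given as (start, end) pairs and array regions as (start, end, level) triples on the same chromosome, with half-open coordinates. The function names are hypothetical and not part of any library. It merges overlapping exons first, so a region covered by two exons of the same gene contributes to the average only once. Whether a plain average is an appropriate summary is a separate question that this sketch does not settle.

```python
from statistics import mean


def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into non-overlapping ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gene_expression(exons, array_regions):
    """Average the level of each array region that overlaps any exon.

    exons: list of (start, end) for one gene.
    array_regions: list of (start, end, level) on the same chromosome.
    Coordinates are assumed to be half-open; adjust the comparison
    if your files use closed intervals.
    Returns None when no array region overlaps the gene.
    """
    merged = merge_intervals(exons)
    levels = [
        level
        for r_start, r_end, level in array_regions
        # Each region is tested once, so it is counted at most once.
        if any(r_start < e_end and e_start < r_end for e_start, e_end in merged)
    ]
    return mean(levels) if levels else None


# Made-up numbers, for illustration only.
exons = [(100, 200), (150, 250), (400, 500)]
regions = [(160, 190, 8.0), (420, 450, 6.0), (600, 650, 9.0)]
print(merge_intervals(exons))
print(gene_expression(exons, regions))
```

In this toy example the first two exons overlap and are merged, and the region at 160-190 is counted once even though it falls inside both of them.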
There is a nice article by Helen Lockstone (http://bib.oxfordjournals.org/content/12/6/634.full) that walks users through Affy exon array analyses using both Affy Power Tools and R. This has helped me out on numerous occasions.
Hi @Neilfws and @Chris Cabanski, thanks for your answers. The tools that you suggested do not work for me. The files that I have contain only the start site and end site of each sequence, and the level: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwAffyExonArray/ So I don't have any information about the probes etc. I only have start and end.
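For files of this kind (start, end and level per region), a small reader could look like the sketch below. The column positions are an assumption, not something taken from the ENCODE files: the sketch guesses tab-separated lines with chromosome, start, end and level in columns 0 to 3, and the hypothetical `*_col` parameters should be adjusted to the real layout.

```python
from collections import defaultdict


def parse_regions(lines, chrom_col=0, start_col=1, end_col=2, level_col=3):
    """Group (start, end, level) tuples by chromosome.

    The default column indexes are assumptions; check them against
    the actual file before use.
    """
    regions = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        regions[fields[chrom_col]].append(
            (int(fields[start_col]), int(fields[end_col]), float(fields[level_col]))
        )
    return regions


# Made-up lines, for illustration only.
sample = ["track name=x", "chr1\t100\t200\t7.5", "", "chr2\t5\t10\t3"]
by_chrom = parse_regions(sample)
print(dict(by_chrom))
```

Grouping by chromosome keeps later overlap checks from comparing coordinates across different chromosomes.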
The files that you should be downloading, in order to use Bioconductor tools most effectively, are the raw data - those files that end in "CEL.gz".
Thanks Neil, but the CEL files are only provided for a limited number of cell lines, and I want to compute the expression for most of the available cell lines.