I am analyzing Clariom-D array data using the oligo package in R.
I find that the rma normalization step using all probesets works reasonably well: the boxplots of normalized intensities are centered and evenly distributed. However, when I filter the results to keep only known genes (probesets with gene symbols), the normalized intensity distributions look strongly biased: the boxplots are no longer centered and their spreads differ between samples. I suspect this reflects differences in RNA biotype composition between samples. These differences could be real, or they could be an artefact of partial RNA degradation or of differences in the fragmentation step, either of which would make different RNA biotypes contribute unevenly to the signal.

I would like to try subsetting the data to the probesets that will be assessed in the end (genes with a known symbol) before normalization. The goal is more homogeneous normalized intensity distributions on the boxplots, and therefore more comparable data for the differential expression analysis. Is this possible? If so, could anyone share some code examples?
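For illustration, here is a toy sketch of the effect described above. It is written in Python rather than R and does not use oligo. It implements plain quantile normalization (the normalization step inside RMA) on a made-up matrix of 6 features by 2 samples. The values, the `quantile_normalize` helper, and the choice of features 0-2 as "probesets with symbols" are all hypothetical. Normalizing all features makes the whole-sample distributions identical, yet the annotated subset can still differ between samples. Normalizing the subset alone equalizes that subset by construction.

```python
from statistics import mean, median


def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each column is one sample and position i is feature i. Ties are broken
    by original order, which is a simplification for this toy example.
    """
    n = len(columns[0])
    # For each sample: feature indices ordered from smallest to largest value.
    orders = [sorted(range(n), key=lambda i: col[i]) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        mean(col[order[k]] for col, order in zip(columns, orders)) for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        # The feature with rank k in this sample receives the k-th reference value.
        for k, i in enumerate(order):
            out[i] = reference[k]
        normalized.append(out)
    return normalized


# Hypothetical toy data: features 0-2 stand in for probesets with gene symbols,
# features 3-5 for all remaining probesets.
sample_a = [10, 11, 12, 1, 2, 3]
sample_b = [1, 2, 3, 10, 11, 12]
annotated = [0, 1, 2]

# Option 1: normalize all features, then keep only the annotated subset.
full = quantile_normalize([sample_a, sample_b])
subset_after = [[col[i] for i in annotated] for col in full]

# Option 2: keep only the annotated subset, then normalize.
subset_first = quantile_normalize(
    [[col[i] for i in annotated] for col in (sample_a, sample_b)]
)

print([median(c) for c in full])          # [6.5, 6.5]  whole samples centered
print([median(c) for c in subset_after])  # [11, 2]     subset not centered
print([median(c) for c in subset_first])  # [6.5, 6.5]  subset centered
```

In RMA the quantile step is applied to background-corrected probe-level intensities before median-polish summarization, so subsetting "before normalization" would mean keeping only the probes that belong to the selected probesets; the toy works directly on feature values for brevity. Equalizing the subset by construction also removes any genuine global shift in that subset between samples.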
Thank you in advance.
Lauro