I am planning a microarray experiment using the Clariom human assay (either Deep or Shallow). Before running the assay, I want to find out which genes each version covers.
I have downloaded the annotation file and loaded it into R. Each row in the file is a probeset, with columns for various parameters. I want to find out which genes are covered by the probesets, and the "gene assignment" column holds this information. However, each cell contains multiple entries in different accession formats, separated by "//" or "///": within an entry, "//" separates the transcript accession from the gene symbol, and "///" separates one entry from the next. For instance, the probeset "JUC0100047124.hg.1" has the following gene assignment: "NR_046018 // DDX11L1 /// OTTHUMT00000002844 // DDX11L1 /// OTTHUMT00000362751 // DDX11L1".
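A minimal sketch of parsing one such cell, assuming the layout seen in the example (entries separated by "///", fields within an entry by "//", gene symbol in the second field). The function name `parse_gene_assignment` is hypothetical, and treating "---" as an empty cell is an assumption about how missing annotations are marked. The key detail is splitting on "///" before "//", because "//" is a substring of "///".

```python
from typing import List, Tuple


def parse_gene_assignment(value: str) -> List[Tuple[str, str]]:
    """Split one gene-assignment cell into (accession, gene symbol) pairs.

    Assumption: entries are separated by "///" and fields by "//", with the
    gene symbol in the second field; any further fields are ignored.
    """
    pairs: List[Tuple[str, str]] = []
    # Assumed placeholder for "no annotation"; empty cells are skipped too.
    if not value or value.strip() in ("", "---"):
        return pairs
    # Split on "///" first: splitting on "//" first would break "///" apart.
    for entry in value.split("///"):
        fields = [f.strip() for f in entry.split("//")]
        if len(fields) >= 2 and fields[1]:
            pairs.append((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    cell = ("NR_046018 // DDX11L1 /// OTTHUMT00000002844 // DDX11L1 /// "
            "OTTHUMT00000362751 // DDX11L1")
    for accession, symbol in parse_gene_assignment(cell):
        print(accession, symbol)
```

In R, the same two-level split can be done with `strsplit(..., fixed = TRUE)`, first on "///" and then on "//".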
This format is difficult to work with in R. What I want is a vector of all gene IDs, so that I can match it against a vector of genes of interest and see which of them are covered.
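An end-to-end sketch of that goal: collect the unique gene symbols from the whole gene-assignment column, then check a list of genes of interest against them. It assumes the column has already been read into a list of strings and uses the same "///" then "//" layout as the example; the function names are hypothetical, and "TP53" is used only as an illustrative gene of interest.

```python
from typing import Dict, Iterable, List, Set


def covered_symbols(assignments: Iterable[str]) -> Set[str]:
    """Return the unique gene symbols found in a gene-assignment column.

    Assumption: entries are separated by "///", fields by "//", and the
    gene symbol is the second field of each entry.
    """
    symbols: Set[str] = set()
    for value in assignments:
        if not value:
            continue  # skip empty cells
        for entry in value.split("///"):
            fields = [f.strip() for f in entry.split("//")]
            # Cells such as "---" have a single field and are skipped here.
            if len(fields) >= 2 and fields[1]:
                symbols.add(fields[1])
    return symbols


def check_coverage(genes_of_interest: List[str], covered: Set[str]) -> Dict[str, bool]:
    """Map each gene of interest to True if at least one probeset covers it."""
    return {gene: gene in covered for gene in genes_of_interest}


if __name__ == "__main__":
    column = [
        "NR_046018 // DDX11L1 /// OTTHUMT00000002844 // DDX11L1 /// "
        "OTTHUMT00000362751 // DDX11L1",
    ]
    covered = covered_symbols(column)
    print(check_coverage(["DDX11L1", "TP53"], covered))
    # prints {'DDX11L1': True, 'TP53': False}
```

Using a set makes each membership check constant-time on average, which matters when the annotation has many thousands of probesets. In R, `unique()` on the extracted symbols followed by `%in%` gives the same result.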
Any suggestions on how to work with this annotation data, preferably in R?