Hi
I am trying to classify cancer samples versus controls using gene expression data. I am finding it difficult because the datasets are very large. Also, how do I remove repeated gene symbols so that I can select specific attributes (gene selection)?
This question (as written) lacks sufficient detail to generate useful answers (see this for guidance: How To Ask Good Questions On Technical And Scientific Forums). You need to clarify where these datasets come from and what kind of analysis you are doing on them.
Could you tell us which software you use to open the raw or processed expression files? In general, I read the files directly on the UNIX command line.
Regarding your second question: UniProt provides direct gene symbol mapping. Once you supply your input list, the web tool also returns a non-redundant list.
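If you wish to do it manually, you can do it in Excel or any basic scripting language, but remember that a space counts as a character: a symbol with a leading or trailing space will not match the same symbol without one, so trim whitespace before comparing.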
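As a rough illustration of the manual route, here is a minimal Python sketch. It assumes each row is a gene symbol plus a list of expression values, and it collapses rows that share a symbol by averaging them. Averaging is only one possible choice (some people keep the probe with the highest variance instead), and the function name and example values are made up for illustration.

```python
from collections import defaultdict


def collapse_duplicate_symbols(rows):
    """Collapse rows sharing a gene symbol by averaging their values.

    rows: iterable of (gene_symbol, [expression values]) pairs.
    Returns a dict mapping each trimmed symbol to its averaged values.
    """
    grouped = defaultdict(list)
    for symbol, values in rows:
        key = symbol.strip()  # "TP53 " and "TP53" should count as the same gene
        if key:  # skip rows that have no gene symbol at all
            grouped[key].append([float(v) for v in values])

    collapsed = {}
    for key, value_lists in grouped.items():
        # Average sample by sample across all rows that share this symbol
        collapsed[key] = [sum(col) / len(col) for col in zip(*value_lists)]
    return collapsed


# Hypothetical toy data: two samples per row, one symbol repeated with a stray space
example = [
    ("TP53", [2.0, 4.0]),
    ("TP53 ", [4.0, 6.0]),
    ("BRCA1", [1.0, 1.5]),
    ("", [9.0, 9.0]),
]
result = collapse_duplicate_symbols(example)
```

The resulting dictionary has one entry per unique symbol, which can then be used as the attribute set for gene selection.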