I have compiled a list of biomarkers related to chemoresistance in triple-negative breast cancer patients using single-cell RNA-seq data. I am now validating them on independent patient cohorts with microarray measurements. The top genes identified by my method include many small nuclear RNA (snRNA) genes such as SNAR-A10, SNAR-A11, SNAR-A7, SNAR-B8, etc. I cannot find these genes in the microarray dataset and am stuck at the validation step. My questions are as follows:
- Should I remove these RNA genes upfront and continue with the analysis? That way, these won't show up as biomarkers.
- Is there any way to map these genes to the microarray data (GPL96 HG-U133A Affymetrix Human Genome U133A Array)?
Any help is highly appreciated.
Thank you for your comment. I will look through the literature and try to find dedicated arrays. Would you suggest removing these genes from the single-cell data? I am thinking along the lines of a preprocessing step.
I would consider that indeed. Maybe remove all small RNA types, or retain only biotypes such as lncRNAs and protein-coding genes. This can be done directly by filtering the count matrix.
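A minimal sketch of that filtering step in Python with pandas, assuming you have a genes-by-cells count matrix and a gene-to-biotype table (for example exported from your annotation GTF or BioMart). The function name `filter_by_biotype`, the toy counts, and the biotype labels below are made up for illustration; check the labels your annotation actually uses, since they differ between releases.

```python
import pandas as pd

# Biotypes to retain; these label strings are an assumption and should
# match whatever your annotation source uses.
KEEP_BIOTYPES = {"protein_coding", "lncRNA"}


def filter_by_biotype(counts, biotypes, keep=KEEP_BIOTYPES):
    """Keep only rows (genes) of `counts` whose biotype is in `keep`.

    counts   : DataFrame, genes as rows and cells as columns
    biotypes : Series mapping gene symbol -> biotype
    Genes without an annotation entry are dropped as well.
    """
    aligned = biotypes.reindex(counts.index)  # NaN where a gene is unannotated
    mask = aligned.isin(keep)                 # NaN evaluates to False
    return counts.loc[mask]


# Toy data, purely illustrative (not real measurements or real annotations).
counts = pd.DataFrame(
    {"cell1": [5, 0, 12, 3], "cell2": [2, 7, 0, 1]},
    index=["TP53", "SNAR-A10", "MALAT1", "SNAR-B8"],
)
biotypes = pd.Series(
    {
        "TP53": "protein_coding",
        "SNAR-A10": "snRNA",
        "MALAT1": "lncRNA",
        "SNAR-B8": "snRNA",
    }
)

filtered = filter_by_biotype(counts, biotypes)
print(filtered)
```

Doing this before normalization and feature selection means the small RNA genes never enter the ranking, so they cannot come out as biomarkers that the array is unable to validate.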