Hi,
I am having a problem with my SummarizedExperiment dataset. I have RNA-seq data and want to analyze gene expression from it. However, I want to remove certain samples from the dataset and have not been able to do so. The code I have tried so far:
> library(SummarizedExperiment)
> data <- readRDS("ABC.rds")
> colData(data)[1:5, 1:2]
> data
Output is:
class: RangedSummarizedExperiment
dim: 20115 424
assays(2): counts logCPM
rownames(20115): 1 2 ... 102724473 103091865
rowRanges metadata column names(3): symbol txlen txgc
colnames(424): TCGA.KL.AAAAA TCGA.KL.BBBBBB ... TCGA.KL.ZZZZZ
colData names(549): type bcr_patient_uuid
And the colData(data)[1:5, 1:2] call prints:
TCGA.KL.AAAAA na
TCGA.KL.BBBBB na
TCGA.KL.CCCCC na
TCGA.KL.DDDD na
When I do batch identification with the following code:
> TSS <- substr(colnames(data), 6, 7)
> table(TSS)
Output is:
TSS
KJ KJ1 KJ2 KJ3
30 0 1 16
I want to remove the samples (for example, TCGA.KL.AAAAAA or any other) that have KJ1 or KJ2 as their TSS code. However, since the dataset is structured very differently from the TSS vector, removing KJ1 and KJ2 from TSS does not remove the related samples from the dataset:
> TSS <- TSS[!(TSS %in% c('KJ1', 'KJ2'))]
Output is:
KJ KJ3
30 16
However, the dataset still has the same dimensions (20115 genes by 424 samples). I expect fewer samples, because I am removing some batches. How can I remove the samples associated with specific batches?
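
For reference, here is a minimal sketch of what I think the fix might look like, on a toy object rather than my real data. Filtering TSS only shortens that vector, so the same logical mask probably has to be applied to the columns of the SummarizedExperiment itself. The toy matrix, the batch codes K1/K2/K3, and the names se, keep, se_filtered and TSS_filtered are all made up for illustration.

```r
library(SummarizedExperiment)

# Toy stand-in for the real object (hypothetical data, not ABC.rds):
# 3 genes (rows) x 4 samples (columns), batch code at characters 6-7 of the name.
counts <- matrix(
  1:12, nrow = 3,
  dimnames = list(
    c("g1", "g2", "g3"),
    c("TCGA.K1.AAAA", "TCGA.K2.BBBB", "TCGA.K1.CCCC", "TCGA.K3.DDDD")
  )
)
se <- SummarizedExperiment(assays = list(counts = counts))

# Batch code per sample, in the same order as the columns of se.
TSS <- substr(colnames(se), 6, 7)

# Logical mask: TRUE for samples to keep, FALSE for the unwanted batches.
keep <- !(TSS %in% c("K2", "K3"))

# Subset the columns (samples) of the object; rows (genes) are untouched.
se_filtered <- se[, keep]

# Filter the batch vector with the same mask so it stays aligned with the object.
TSS_filtered <- TSS[keep]

dim(se_filtered)        # fewer columns (samples), same number of rows (genes)
colnames(se_filtered)
table(TSS_filtered)
```

With my real object this would presumably be data[, !(TSS %in% c('KJ1', 'KJ2'))], computed before TSS itself is filtered, so that the mask still has one entry per column of data.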