Question

How to remove certain samples from a SummarizedExperiment dataset? (Bioconductor)

8.2 years ago

BPors ▴ 60

Hi,

I am having a problem with my SummarizedExperiment dataset. I have RNA-seq data and I want to analyze gene expression from it. However, I want to remove certain samples from the dataset and have not been able to do so. The code I have tried so far:

> library(SummarizedExperiment)

> data <- readRDS("ABC.rds")

> colData(data)[1:5, 1:2]

> data

Output is:

class: RangedSummarizedExperiment
dim: 20115 424
assays(2): counts logCPM
rownames(20115): 1 2 ... 102724473 103091865
rowRanges metadata column names(3): symbol txlen txgc
colnames(424): TCGA.KL.AAAAA TCGA.KL.BBBBBB ... TCGA.KL.ZZZZZ
colData names(549): type bcr_patient_uuid

And the colData output is as follows:

TCGA.KL.AAAAA na

TCGA.KL.BBBBB na

TCGA.KL.CCCCC na

TCGA.KL.DDDD na

When I do batch identification with the following code:

> TSS <- substr(colnames(data), 6, 7)

> table(TSS)

Output is:

> TSS

KJ KJ1 KJ2 KJ3
30 0 1 16

I want to remove the samples (for example, TCGA.KL.AAAAAA or any other) that have KJ1 or KJ2 as their TSS code. However, since the dataset is structured very differently, if I remove KJ1 and KJ2 from TSS, the related samples are not removed from the dataset:

> TSS <- TSS[!(TSS %in% c('KJ1','KJ2'))]

Output is:

KJ KJ3
30 16

However, the dataset itself is unchanged: it still has the same dimensions (20115 rows, 424 samples). I want it to have fewer samples, because I am removing some batches. How can I remove the samples associated with specific batches?
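
To make the goal concrete, here is a minimal sketch on a toy object. The sample names, the two-character codes K1/K2 and the counts are all made up for illustration and are not from the real data. It relies on a SummarizedExperiment being subsettable like a matrix, with samples as columns, so filtering has to be applied to the object itself rather than to the separate TSS vector:

```r
library(SummarizedExperiment)

# Toy stand-in for the real object: 4 genes x 5 samples (hypothetical names).
counts <- matrix(
  1:20, nrow = 4,
  dimnames = list(
    paste0("gene", 1:4),
    c("TCGA.KJ.A", "TCGA.K1.B", "TCGA.K2.C", "TCGA.KJ.D", "TCGA.K3.E")
  )
)
se <- SummarizedExperiment(assays = list(counts = counts))

# Batch code taken from characters 6-7 of each sample name.
TSS <- substr(colnames(se), 6, 7)

# Logical vector, one entry per sample: TRUE for samples to keep.
keep <- !(TSS %in% c("K1", "K2"))

# Subset the columns (samples) of the object itself.
se_filtered <- se[, keep]

dim(se_filtered)
colnames(se_filtered)
```

The key difference from the attempt above is that keep is used to index the columns of the dataset, instead of overwriting TSS.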

RNA-Seq tcga bioconductor R SummarizedExperiment
