How to remove certain samples from a SummarizedExperiment dataset? (Bioconductor)
7.5 years ago
BPors ▴ 60

Hi,

I am having a problem with my SummarizedExperiment dataset. I have RNA-seq data and I want to analyze gene expression from it. However, I want to remove certain samples from the dataset and have not been able to do so. The code I have tried so far:

> library(SummarizedExperiment)

> data <- readRDS("ABC.rds")

> colData(data)[1:5, 1:2]

> data

Output is:

class: RangedSummarizedExperiment
dim: 20115 424
assays(2): counts logCPM
rownames(20115): 1 2 ... 102724473 103091865
rowRanges metadata column names(3): symbol txlen txgc
colnames(424): TCGA.KL.AAAAA TCGA.KL.BBBBBB ... TCGA.KL.ZZZZZ
colData names(549): type bcr_patient_uuid

And the output continues as follows:

TCGA.KL.AAAAA na

TCGA.KL.BBBBB na

TCGA.KL.CCCCC na

TCGA.KL.DDDD na

When I identify the batches with the following code:

> TSS <- substr(colnames(data), 6, 7)

> table(TSS)

Output is:

> TSS

KJ KJ1 KJ2 KJ3
30 0 1 16

I want to remove the samples (for example, TCGA.KL.AAAAAA or any other) that have KJ1 or KJ2 as their batch. However, because the dataset is structured very differently, removing KJ1 and KJ2 from TSS does not remove the related samples from the dataset:

> TSS <- TSS[!(TSS %in% c('KJ1', 'KJ2'))]

Output is:

KJ KJ3
30 16

However, I still have the same number of samples (20115), and I want fewer than that because I am removing some batches. How can I remove the samples associated with specific batches?
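
For reference, here is a minimal sketch of how the subsetting could be done. Filtering `TSS` only shortens that separate character vector and leaves `data` untouched. A SummarizedExperiment is indexed like a matrix, with features in rows and samples in columns (so in `dim: 20115 424` the second number is the sample count), which means a logical vector built from `TSS` can be used as the column index. The toy object, sample names, and batch codes below are made up for illustration and are not taken from ABC.rds.

```r
library(SummarizedExperiment)

# Toy stand-in for the real object: 4 genes x 5 samples.
# Sample names and batch codes are hypothetical.
sample_ids <- c("TCGA.KJ.0001", "TCGA.K1.0002", "TCGA.KJ.0003",
                "TCGA.K2.0004", "TCGA.K3.0005")
counts <- matrix(1:20, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), sample_ids))
se <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(type = rep("na", 5), row.names = sample_ids)
)

# Batch code taken from characters 6-7 of each sample name
TSS <- substr(colnames(se), 6, 7)

# Logical index over columns (samples): TRUE means keep
keep <- !(TSS %in% c("K1", "K2"))

# Subset the object itself; assays and colData are subset together
se_filtered <- se[, keep]

# Keep the batch vector aligned with the filtered object
TSS_filtered <- TSS[keep]

dim(se)           # 4 5
dim(se_filtered)  # 4 3
table(TSS_filtered)
```

With the real object, the same pattern would be `data <- data[, !(TSS %in% c('KJ1', 'KJ2'))]`, computed before `TSS` itself is filtered so that the index still has one entry per column.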

RNA-Seq tcga bioconductor R SummarizedExperiment • 3.7k views