ColData and rowData Sample_id mapping from TCGA database
24 months ago
Jakpa ▴ 50

Hello everyone,

I downloaded bladder cancer data from TCGA and extracted the sample IDs with this code:

head(Blca_res$id)

output: 'a8c61671-89cb-43bc-8c88-5c107954d11c,''b03b7b9b-00ef-4e0d-bac2-0b1059d57a87,''bf98764d-1604-4a14-8e06-1c785a085db9,''c0bc697a-ac64-4605-9abc-f0fe85eb481a,''bd52f6c8-6f8b-4056-8a3e-8cdc96644952,''ab504dbf-e1f0-46d2-83f9-0f4066055c71'

I used this to get the corresponding IDs from the clinical data:

head(tcgaBlca_data@colData$sample_id)

output: 'f9bd70b2-6cde-48e5-9f0d-55d86ccfeba8,''3cae49a3-6deb-40f9-84cc-68b9b53543ff,''015e6b08-ab3c-4d1d-99e4-77b5e10bd7fc,''f09e1eeb-bcd5-4dba-92f0-7d4b34b81ce7,''0ac8e522-3c64-42f2-a66f-bd40530a328a,''3c71158d-98ff-4ef5-923f-ba31a25036ec'.

There are more than 60,000 rows with these sample IDs. What I want to find out is whether each sample ID in Blca_res$id is also present in tcgaBlca_data@colData$sample_id. For example, is 'a8c61671-89cb-43bc-8c88-5c107954d11c' from Blca_res$id also in tcgaBlca_data@colData$sample_id?

Any suggestions on how I can implement this in R?

Regards,

GeneExpression R TCGA • 795 views

Is the format of your head output correct? Do the sample IDs actually have commas in the string?

No, there are no commas, but a dot like this: .

However, I have sorted it out using a more readable column in the data.

Thanks

24 months ago
jv ★ 1.8k

One option to get a quick count would be to use the R table function. In this case I would use table twice to count how many of the sample IDs occur once or twice across the two combined vectors, e.g.,

table(table(c(Blca_res$id, tcgaBlca_data@colData$sample_id)))

To show how this would play out:

df <- data.frame("id" = c('a8c61671-89cb-43bc-8c88-5c107954d11c', 'b03b7b9b-00ef-4e0d-bac2-0b1059d57a87', 'bf98764d-1604-4a14-8e06-1c785a085db9', 'c0bc697a-ac64-4605-9abc-f0fe85eb481a', 'bd52f6c8-6f8b-4056-8a3e-8cdc96644952' , 'ab504dbf-e1f0-46d2-83f9-0f4066055c71'), 
                 "sample_id" = c('f9bd70b2-6cde-48e5-9f0d-55d86ccfeba8', '3cae49a3-6deb-40f9-84cc-68b9b53543ff', '015e6b08-ab3c-4d1d-99e4-77b5e10bd7fc','f09e1eeb-bcd5-4dba-92f0-7d4b34b81ce7','0ac8e522-3c64-42f2-a66f-bd40530a328a','3c71158d-98ff-4ef5-923f-ba31a25036ec'))
table(table(c(df$id, df$sample_id)))

 1 
12

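meaning that all 12 of the IDs in df$id and df$sample_id occur exactly once, i.e. none of them is shared between the two columns. A shared ID would be counted under 2 instead (assuming neither vector contains duplicates of its own).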
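
If you want to know which IDs are shared rather than just how many, a minimal sketch along these lines might help. It uses toy vectors: `ids_a` and `ids_b` are made-up names standing in for Blca_res$id and tcgaBlca_data@colData$sample_id, and the second vector deliberately shares one ID with the first for illustration only. The calls are base R (`%in%`, `intersect`, `setdiff`).

```r
# Toy stand-ins for Blca_res$id and tcgaBlca_data@colData$sample_id.
# These values are illustrative only; one ID is shared on purpose.
ids_a <- c("a8c61671-89cb-43bc-8c88-5c107954d11c",
           "b03b7b9b-00ef-4e0d-bac2-0b1059d57a87",
           "bf98764d-1604-4a14-8e06-1c785a085db9")
ids_b <- c("f9bd70b2-6cde-48e5-9f0d-55d86ccfeba8",
           "b03b7b9b-00ef-4e0d-bac2-0b1059d57a87",
           "3cae49a3-6deb-40f9-84cc-68b9b53543ff")

# Logical vector: is each element of ids_a present in ids_b?
in_b <- ids_a %in% ids_b

# IDs found in both vectors, and IDs found only in ids_a
shared <- intersect(ids_a, ids_b)
only_a <- setdiff(ids_a, ids_b)

sum(in_b)  # how many ids_a entries also occur in ids_b
all(in_b)  # TRUE only if every ids_a entry occurs in ids_b
```

With the real data you would pass the two columns directly, e.g. `Blca_res$id %in% tcgaBlca_data@colData$sample_id`. Unlike the table approach, this is not thrown off by IDs that are duplicated within one of the vectors.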
