5.3 years ago
Kim
Hello everyone
I'm working with gene expression data from a human exon array. I want a column of gene symbols, but the only column that contains this information is "gene assignment", and its entries look like this:
NM_001156474 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494 /// NM_021827 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494 /// ENST00000445632 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494 /// ENST00000354755 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494 /// BC126412 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494 /// ENST00000278487 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494
I would like to extract gene symbols from this (CCDC81 in this case). Does anyone know how I can do that in R?
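One way to pull out the second field without a loop is a regular expression with sub(). This is a minimal sketch, not an answer from the thread. It assumes every annotated entry starts with "accession // symbol // ..." as in the sample above, and that unannotated probe sets contain "---" (an assumption about your file). The gene_assignment vector is a hypothetical two-element example.

```r
# Hypothetical example: the first two records of the sample entry, plus an
# assumed "---" placeholder for a probe set with no annotation
gene_assignment <- c(
  "NM_001156474 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494 /// NM_021827 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494",
  "---"
)

# Capture the text between the first " // " and the second " // "
symbols <- sub("^[^/]*// ([^/]+) //.*$", "\\1", gene_assignment)

# sub() returns non-matching entries unchanged, so mark those as NA
symbols[!grepl("^[^/]*// [^/]+ //", gene_assignment)] <- NA
print(symbols)
```

Note that this only reads the first record of each entry; if an entry can list several different genes, you need to split on " /// " first.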
Thank you very much
Have you tried the strsplit function in R?
Yes, I'm trying to use strsplit, but it only accepts character vectors, and the "gene assignment" column is a factor, so it isn't straightforward.
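For reference, strsplit() stops with a "non-character argument" error when given a factor. as.character() fixes this by returning the factor's labels rather than its integer codes. A minimal sketch with a hypothetical two-element factor:

```r
# Hypothetical factor column, as read.table() produced by default before R 4.0.0
ga <- factor(c(
  "NM_021827 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494",
  "---"
))

# strsplit() refuses factors; capture the error message instead of stopping
res <- tryCatch(strsplit(ga, " // ", fixed = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# Coerce to character first, then split on the field separator
parts <- strsplit(as.character(ga), " // ", fixed = TRUE)
print(parts[[1]][2])  # the gene symbol
print(parts[[2]][2])  # NA, because "---" has no second field
```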
It's hard to help when the problem isn't fully described in the original question. The following works for me; could it be adapted to your data?
Hi Russ
I tried this command and it works. Thank you :)
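Gene_symbol <- character(nrow(full_table))  # pre-allocate one entry per row instead of hard-coding 11005
for (i in seq_len(nrow(full_table))) { Gene_symbol[i] <- strsplit(as.character(full_table$gene_assignment[i]), " // ")[[1]][2] }  # split only row i, not the whole column on every iteration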
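The same result can be had without an index loop. This is a minimal sketch, assuming full_table is a data frame with a gene_assignment column (the toy full_table below is hypothetical). vapply() takes the second field of every split entry. The optional all_symbols() helper (also hypothetical) collects every distinct symbol in case an entry lists records for more than one gene.

```r
# Hypothetical stand-in for the real expression table
full_table <- data.frame(
  gene_assignment = c(
    "NM_001156474 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494 /// NM_021827 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494",
    "---"
  ),
  stringsAsFactors = TRUE  # mimic the factor column from the question
)

# Split every entry once; as.character() handles the factor column
parts <- strsplit(as.character(full_table$gene_assignment), " // ", fixed = TRUE)

# Second field of each entry (NA when absent); vapply() guarantees a character vector
full_table$Gene_symbol <- vapply(parts, function(p) p[2], character(1))

# Optional: all distinct symbols per entry, splitting on the record separator " /// " first
all_symbols <- function(x) {
  records <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  syms <- vapply(strsplit(records, " // ", fixed = TRUE), function(p) p[2], character(1))
  paste(unique(syms[!is.na(syms)]), collapse = ";")
}
full_table$All_symbols <- vapply(as.character(full_table$gene_assignment),
                                 all_symbols, character(1), USE.NAMES = FALSE)
print(full_table[, c("Gene_symbol", "All_symbols")])
```

Splitting the whole column once and indexing into the resulting list avoids recomputing strsplit() for every row.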
You can avoid confusion caused by factors with read.table(..., stringsAsFactors = FALSE) or data.table's fread (which uses stringsAsFactors = FALSE by default). Whatever you may hear, overriding R's default globally in every session will only cause you pain in the future, but setting it when reading files in is fine.
EDIT: if this doesn't work for you because you're dealing with a different data type, try coercing to a character vector first.
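Here is a minimal sketch of both suggestions, with a hypothetical tab-separated snippet standing in for the real annotation file (the probeset_id column and its values are made up). It sets quote = "" and comment.char = "" because annotation text can contain apostrophes (e.g. 5'-nucleotidase) or "#", which would otherwise confuse read.table(). It uses base R only, so fread is not shown.

```r
# Hypothetical file contents: two probe sets, tab-separated
txt <- paste(
  "probeset_id\tgene_assignment",
  "2315101\tNM_021827 // CCDC81 // coiled-coil domain containing 81 // 11q14.2 // 60494",
  "2315102\t---",
  sep = "\n"
)

# stringsAsFactors = FALSE keeps text columns as character vectors
tab <- read.table(text = txt, header = TRUE, sep = "\t", quote = "",
                  comment.char = "", stringsAsFactors = FALSE)
str(tab)

# For a data frame that already has factor columns, coerce each one to character
fac_tab <- data.frame(gene_assignment = tab$gene_assignment, stringsAsFactors = TRUE)
fac_tab[] <- lapply(fac_tab, function(col) if (is.factor(col)) as.character(col) else col)
str(fac_tab)
```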