23 months ago
Khaleesi95
Hi guys, I have the following data frame df:
IID PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
1 11:00223.CEL 0.00229647 -0.000423608 0.001480000 1.02983e-03 0.00418171 -0.00550339 0.003826840 0.002836460 -0.001210240
2 9399.CEL 0.00213518 -0.000734612 0.000965396 9.84589e-04 -0.00135706 0.00396061 0.006356090 0.000639752 -0.000220536
3 shckanc.CEL 0.00225502 -0.000971542 0.001553290 2.13150e-03 -0.00277924 0.00291143 0.007620090 -0.002271640 -0.003364920
4 787323A1.CEL 0.00292725 -0.000399576 0.001340910 6.99230e-06 -0.00299630 -0.00769350 -0.007620050 0.002889230 0.005459370
5 90w3jc.CEL 0.00228869 -0.000201784 0.001291800 8.68436e-04 -0.00194812 -0.00554044 -0.006723900 0.001160370 0.005532150
6 olsj909.CEL 0.00224916 -0.000530798 0.000677401 -1.13087e-04 -0.00132783 -0.00814353 -0.000705561 0.000352494 0.000671816
Considering the first column, IID, I need to extract the whole string before the dot, excluding .CEL
For example, for the first row I need to extract 11:00223, for the second row 9399, and so on...
I tried using the stringr library with the following command:
str_extract(df$IID, "[[:digit:]]+")
but [[:digit:]]+ extracts only the digits before the dot, whereas I need the whole string before the dot, letters included.
Any idea how to extract the whole string before the dot?
Thank you!
Could you not just use strsplit and get the first element?

Since [[:digit:]]+ is the regex for "just numbers", it is evident that you won't get letters. In general, I recommend familiarizing yourself with regular expressions, as this is extremely handy knowledge to have.

However, to solve your problem, I think it is easiest to just drop the .CEL.

Relevant StackOverflow post:
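
As a minimal sketch of the suggestions in this thread (not taken from the linked post), here is how they might look in base R. The vector iid is a hypothetical stand-in for df$IID, filled with the values from the question; the result names ids_sub, ids_split and ids_regex are made up for illustration.

```r
# Hypothetical stand-in for df$IID, using the values shown in the question
iid <- c("11:00223.CEL", "9399.CEL", "shckanc.CEL",
         "787323A1.CEL", "90w3jc.CEL", "olsj909.CEL")

# Option 1: drop the literal ".CEL" suffix.
# The dot is escaped so it matches a real dot; "$" anchors the match to the end.
ids_sub <- sub("\\.CEL$", "", iid)

# Option 2: split on a literal dot (fixed = TRUE disables regex) and
# keep the first piece of each split.
ids_split <- vapply(strsplit(iid, ".", fixed = TRUE), `[`, character(1), 1)

# Option 3: remove everything from the first dot onwards.
ids_regex <- sub("\\..*$", "", iid)

print(ids_sub)
# [1] "11:00223" "9399"     "shckanc"  "787323A1" "90w3jc"   "olsj909"
```

All three give the same result on this data. They would differ if an ID itself contained a dot: option 1 only removes a trailing .CEL, while options 2 and 3 cut at the first dot. If you prefer to stay with stringr, str_extract(df$IID, "^[^.]+") should behave like option 3, since [^.]+ matches any run of characters that are not a dot.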