Question

How to extract a string before the dot in R

1

2.1 years ago

Khaleesi95 ▴ 40

Hi guys, I have the following data frame df:

           IID        PC1          PC2         PC3          PC4         PC5         PC6          PC7          PC8          PC9
1 11:00223.CEL 0.00229647 -0.000423608 0.001480000  1.02983e-03  0.00418171 -0.00550339  0.003826840  0.002836460 -0.001210240
2     9399.CEL 0.00213518 -0.000734612 0.000965396  9.84589e-04 -0.00135706  0.00396061  0.006356090  0.000639752 -0.000220536
3  shckanc.CEL 0.00225502 -0.000971542 0.001553290  2.13150e-03 -0.00277924  0.00291143  0.007620090 -0.002271640 -0.003364920
4 787323A1.CEL 0.00292725 -0.000399576 0.001340910  6.99230e-06 -0.00299630 -0.00769350 -0.007620050  0.002889230  0.005459370
5   90w3jc.CEL 0.00228869 -0.000201784 0.001291800  8.68436e-04 -0.00194812 -0.00554044 -0.006723900  0.001160370  0.005532150
6  olsj909.CEL 0.00224916 -0.000530798 0.000677401 -1.13087e-04 -0.00132783 -0.00814353 -0.000705561  0.000352494  0.000671816

From the first column, IID, I need to extract everything before the dot, i.e. drop the .CEL suffix. For example, for the first row I need 11:00223, for the second row 9399, and so on.

I tried the stringr library with the following command:

str_extract(df$IID, "[[:digit:]]+")

but [[:digit:]]+ extracts only digits, while I need the whole string before the dot, including letters.

Any idea how to extract everything before the dot?

Thank you!

R • 6.9k views

ADD COMMENT • link updated 2.1 years ago by zx8754 12k • written 2.1 years ago by Khaleesi95 ▴ 40

2

Could you not just use strsplit and get the first element?
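
A minimal sketch of that suggestion, assuming the IDs are held in a plain character vector. The vector name iid is hypothetical and its values are copied from the IID column in the question:

```r
# Example IDs copied from the IID column shown in the question
iid <- c("11:00223.CEL", "9399.CEL", "shckanc.CEL",
         "787323A1.CEL", "90w3jc.CEL", "olsj909.CEL")

# fixed = TRUE treats "." as a literal dot rather than a regex wildcard
parts <- strsplit(iid, ".", fixed = TRUE)

# Keep the first piece of each split as a character vector
ids <- vapply(parts, `[`, character(1), 1)
ids
```

strsplit returns a list with one element per input string, which is why the first piece has to be picked out of each element in a separate step.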

ADD REPLY • link 2.1 years ago by barslmn ★ 2.3k

1

Since [[:digit:]]+ matches one or more digits and nothing else, it cannot capture letters. In general, I recommend familiarizing yourself with regular expressions, as this is extremely handy knowledge to have.

However, to solve your problem, I think the easiest approach is to just drop the .CEL suffix.

gsub("\\.CEL$","",df$IID)
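
As a rough illustration of what the anchored pattern does, here is a small sketch on a standalone vector. The name iid and the value "sample.v2.CEL" are made up for this example. Because of the $ anchor at most one match per string is possible, so sub() behaves the same as gsub() here:

```r
# Two IDs from the question plus one hypothetical ID containing an extra dot
iid <- c("11:00223.CEL", "9399.CEL", "sample.v2.CEL")

# "\\." is a literal dot; "$" anchors the match to the end of the string,
# so only a trailing ".CEL" is removed and earlier dots are left alone
trimmed <- sub("\\.CEL$", "", iid)
trimmed
# "11:00223" "9399" "sample.v2"
```

This variant only strips the known suffix, so an ID that itself contains a dot keeps everything up to .CEL.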

ADD REPLY • link 2.1 years ago by Matthias Zepper 5.0k

1

Relevant StackOverflow post:

Remove part of string after "."

ADD REPLY • link 2.1 years ago by zx8754 12k

score 4 · Accepted Answer · 2023-01-04

4

2.1 years ago

ATpoint 87k

gsub("\\..*", "", string)
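
A hedged sketch of how this could be applied to the data frame from the question, where string stands for df$IID. The new column name ID is an assumption, and only the IID column is reconstructed here:

```r
# Minimal stand-in for the question's data frame (IID column only)
df <- data.frame(
  IID = c("11:00223.CEL", "9399.CEL", "shckanc.CEL",
          "787323A1.CEL", "90w3jc.CEL", "olsj909.CEL"),
  stringsAsFactors = FALSE
)

# "\\." matches the first literal dot and ".*" everything after it,
# so the whole tail of the string is replaced with ""
df$ID <- gsub("\\..*", "", df$IID)
df$ID

# Hypothetical ID with two dots: everything from the FIRST dot is dropped
multi <- gsub("\\..*", "", "sample.v2.CEL")
multi
# "sample"
```

Note that the pattern cuts at the first dot, which is what the question asks for. If an ID could itself contain a dot, a suffix-specific pattern such as "\\.CEL$" may be the safer choice.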

ADD COMMENT • link 2.1 years ago by ATpoint 87k