I would like to extract the first two words from each string in a character column. For example,
y <- data.frame(name = c('london hilss sff', 'newyork hills fff', 'paris'))
I want to keep at most the first two words of each string:
name
'london hilss'
'newyork hills'
'paris'
gsub('^(\\S*\\s*\\S*).*$','\\1',y$name)
[1] "london hilss" "newyork hills" "paris"
regular expressions FTW!
edit: used \\S to capture "words" instead of \\w, allowing all non-whitespace characters to be part of "words"
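To make the difference between \\w and \\S concrete, here is a small base R sketch on a hyphenated name. The input vector is made up for illustration and is not from the question.

```r
# Illustrative input (an assumption, not from the question): a hyphenated first word
x <- c("new-york hills fff", "paris")

# \\w matches only letters, digits and underscore, so the match stops at the hyphen
with_w <- gsub("^(\\w*\\s*\\w*).*$", "\\1", x)

# \\S matches any non-whitespace character, so "new-york" counts as one word
with_S <- gsub("^(\\S*\\s*\\S*).*$", "\\1", x)

print(with_w)  # "new" "paris"
print(with_S)  # "new-york hills" "paris"
```

With \\w the hyphen ends the first "word" early, which is why \\S is the safer choice when words may contain punctuation.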
Split the character vector by space, then take the first two pieces.
> y <- data.frame(name = c('london hilss sff', 'newyork hills fff', 'paris'))
> library(stringr)
> library(tidyr)
> str_trim(unite(data.frame(str_split(y$name, " ", 3, simplify = TRUE)[, 1:2]), "new", sep = " ")$new)
[1] "london hilss" "newyork hills" "paris"
Edit: I think I prefer Malcolm's response below! Much shorter and simpler, although maybe less readable.
You can split the string, as cpad suggested; that's simplest.
Like this:
> firstN <- function(x, n) {
    words <- strsplit(x, " ")[[1]]
    # Cap at the number of words available so short strings are not padded with NA
    paste(words[1:min(n, length(words))], collapse = " ")
  }
> sapply(y$name, FUN = function(x) firstN(x, 2), USE.NAMES = FALSE)
[1] "london hilss" "newyork hills" "paris"
I had to make firstN because if you ask for c("Test")[1:2], for example, you'll get an NA.
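To see that NA padding, and one way around it without an explicit length check, here is a small base R sketch. It assumes words are separated by single spaces; first_words is a hypothetical helper name, not something from this thread.

```r
# Indexing past the end of a vector pads the result with NA
one_word <- c("Test")
padded <- one_word[1:2]

# head() stops at the end of the vector instead of padding
unpadded <- head(one_word, 2)

# Hypothetical helper: first n space-separated words of each string
first_words <- function(x, n) {
  pieces <- strsplit(as.character(x), " ", fixed = TRUE)
  vapply(pieces, function(w) paste(head(w, n), collapse = " "), character(1))
}

result <- first_words(c("london hilss sff", "newyork hills fff", "paris"), 2)
print(result)  # "london hilss" "newyork hills" "paris"
```

Because head() never reads past the end of its input, the helper needs no min() guard, and vapply() keeps the result a plain character vector.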
Alternatively, you can use the word function from stringr.
On its own, it works for strings that have at least two words:
> library(stringr)
> y <- data.frame(name = c('london hilss sff', 'newyork hills fff', 'paris'))
> word(y$name, 1, 2)
[1] "london hilss" "newyork hills" NA
Unfortunately, it returns NA when a string has fewer words than you asked for.
You can hack together something that fixes that, though, like this:
words_or_fewer <- function(str, n) {
  answer <- word(str, start = 1, end = n)
  # If the answer is NA, try to get fewer words, down to a single word
  while (is.na(answer) && n > 1) {
    n <- n - 1
    answer <- word(str, start = 1, end = n)
  }
  answer
}
# Just a wrapper to use words_or_fewer with vectors
words_or_fewer_vec <- function(str_vec, n) {
  sapply(str_vec, FUN = function(str) words_or_fewer(str, n), USE.NAMES = FALSE)
}
> words_or_fewer_vec(y$name, 2)
[1] "london hilss" "newyork hills" "paris"