How can I separate 3 different pieces of information in a column?
For example, my column HGVS.Consequence contains values such as Ser25Phe. I want to split each value into three parts: Ser, 25, and Phe.
Tags: Programming, regex, split, R, gsub
Since you tagged it as R I'll add an R answer.
Example data.
df <- data.frame(HGVS.Consequence=c("Met1?", "Phe12Ser", "Ala2Glu"))
> df
HGVS.Consequence
1 Met1?
2 Phe12Ser
3 Ala2Glu
Tidyverse answer.
library("tidyr")
extract(
  df, HGVS.Consequence, into=c("aa1", "pos", "aa2"),
  regex="^([A-Z][a-z]+|\\?)([[:digit:]]+)([A-Z][a-z]+|\\?)")
aa1 pos aa2
1 Met 1 ?
2 Phe 12 Ser
3 Ala 2 Glu
sed 's/^\([^0-9]*\)\([0-9]*\)\([^0-9]*\)$/\1\t\2\t\3/' < in > out
Here is a Perl one-liner solution:
echo "Met1?" | perl -ne 's/(\D+)(\d+)(\D+)/$1 $2 $3/g; print'
Met 1 ?
$ echo "Ser25Phe" | sed -r 's/^([^[:digit:]]+)([[:digit:]]+)([^[:digit:]]+)$/\1\t\2\t\3/'
Ser 25 Phe
$ echo "Ser25Phe" | while read -r line; do printf '%s\t%s\t%s\n' "${line%%[0-9]*}" "${line//[^0-9]/}" "${line##*[0-9]}"; done
Ser 25 Phe
With R:
> df <- data.frame(HGVS.Consequence=c("Met1?", "Phe12Ser", "Ala2Glu","?1Met"))
> library(stringr)
> library(magrittr)
> str_extract_all(df$HGVS.Consequence, "\\D+|\\d+", simplify = TRUE) %>%
+ set_colnames(c("Before", "Position", "After"))
Before Position After
[1,] "Met" "1" "?"
[2,] "Phe" "12" "Ser"
[3,] "Ala" "2" "Glu"
[4,] "?" "1" "Met"
A line/record without a position number, such as AsAsp, will not be parsed this way.
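To make that caveat concrete, here is a minimal sketch in plain POSIX shell that splits a record only when it has the expected layout (non-digits, then digits, then non-digits) and reports anything else on stderr instead of silently passing it through. The function name split_hgvs and the "unparsed:" message are made up for this example; the parameter expansions are the same idea as in the bash loop answer above.

```shell
# split_hgvs is a hypothetical helper name used only for this sketch.
split_hgvs() {
  line=$1
  before=${line%%[0-9]*}   # text before the first digit
  after=${line##*[0-9]}    # text after the last digit
  pos=${line#"$before"}    # drop the leading part ...
  pos=${pos%"$after"}      # ... and the trailing part, leaving the middle

  # Reject records whose middle part is empty or not purely digits.
  case $pos in
    ''|*[!0-9]*) printf 'unparsed: %s\n' "$line" >&2; return 1 ;;
  esac
  # Reject records with nothing before or after the number.
  if [ -z "$before" ] || [ -z "$after" ]; then
    printf 'unparsed: %s\n' "$line" >&2
    return 1
  fi
  printf '%s\t%s\t%s\n' "$before" "$pos" "$after"
}

# Example run: AsAsp has no position number, so it goes to stderr.
for rec in Ser25Phe AsAsp 'Met1?'; do
  split_hgvs "$rec" || true
done
```

Records that parse are printed tab-separated on stdout, so the unparsed ones can be collected separately by redirecting stderr to a file.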