R: Regex madness (stringi)
I have a vector of strings like this:

    g30(h).g3(m).g0(l).replicate(1)

Iterating on c("h", "m", "l"), I would like to extract g30 (for "h"), g3 (for "m"), and g0 (for "l").
My various attempts have left me confused. The regex101.com debugger, for example, indicates that (\w*)\(m\) works fine, but transferring it to R fails...
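The question does not show the R code that was tried, so this is an assumption about the cause. Two common reasons a pattern that works on regex101.com "fails" in R are that every backslash has to be doubled inside an R string literal, and that stri_extract_first_regex() returns the whole match (including "(m)") rather than the capture group. A minimal sketch that keeps the question's capture-group pattern and reads the group with stri_match_first_regex():

```r
library(stringi)

x <- "g30(h).g3(m).g0(l).replicate(1)"

# The regex101 pattern (\w*)\(m\) written as an R string: each backslash doubled
pattern <- "(\\w*)\\(m\\)"

# stri_extract_first_regex() returns the whole match, "(m)" included
whole <- stri_extract_first_regex(x, pattern)
whole  # "g3(m)"

# stri_match_first_regex() returns a matrix: column 1 is the whole match,
# column 2 is the first capture group
group <- stri_match_first_regex(x, pattern)[, 2]
group  # "g3"
```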
Using the stringi package and the outer() function:
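    library(stringi)

    strings <- c(
      "g30(h).g3(m).g0(l).replicate(1)",
      "g5(m).g11(l).g6(h).replicate(9)",
      "g10(m).g6(h).g8(m).replicate(200)"  # no "l", repeated "m"
    )
    targets <- c("h", "m", "l")

    # One lookahead pattern per target, e.g. \w+(?=\(h\)) for "h"
    patterns <- paste0("\\w+(?=\\(", targets, "\\))")

    # outer() takes FUN (upper case); stri_extract_first_regex() is vectorized
    # over both strings and patterns, so outer() can fill a strings-by-targets matrix
    matches <- outer(strings, patterns, FUN = stri_extract_first_regex)
    colnames(matches) <- targets
    matches
    #      h     m     l
    # [1,] "g30" "g3"  "g0"
    # [2,] "g6"  "g5"  "g11"
    # [3,] "g6"  "g10" NA

This ignores instances of a target letter past the first, gives NA when the target is not found, and returns everything in a simple matrix. The regular expressions stored in patterns match substrings of the form XX(Y), where Y is a target letter and XX is one or more word characters. Because "(Y)" sits inside a lookahead, (?=...), it has to follow XX but is not part of the returned match, so only XX is extracted.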
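If adding stringi as a dependency is undesirable, the same lookahead patterns should also work in base R with perl = TRUE. A minimal sketch, not part of the original answer: it swaps in base regexpr()/regmatches() and vapply() for stringi and outer(), and extract_first() is a hypothetical helper name, not a function from any package.

```r
strings <- c(
  "g30(h).g3(m).g0(l).replicate(1)",
  "g5(m).g11(l).g6(h).replicate(9)",
  "g10(m).g6(h).g8(m).replicate(200)"
)
targets <- c("h", "m", "l")

# Hypothetical helper: first match of pattern in each string, NA where none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)  # start of first match, -1 if none
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)       # regmatches() drops non-matches
  out
}

patterns <- paste0("\\w+(?=\\(", targets, "\\))")

# One column per pattern, one row per string
matches <- vapply(patterns, function(p) extract_first(strings, p),
                  character(length(strings)), USE.NAMES = FALSE)
colnames(matches) <- targets
matches
```

The helper is needed because regmatches() silently drops strings with no match, so the NA slots have to be filled in by hand; stri_extract_first_regex() already returns NA for those, which is part of why the stringi version is shorter.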