I'm looking for a Lua-based solution for splitting a string into two or more components.

This is my first posting on this site, so please bear with me.

Consider the following representative string:

fld u.a. ldfjal \verb*u.a.* dlf \lstinline$u.a.$ u.a. dfla \url{u.a.}rrr 

For background: \verb*...* and \lstinline$...$ are LaTeX macros whose arguments aren't delimited by matching curly braces but, instead, by a common character: * in the case of \verb, and $ in the case of \lstinline. An important point is that the delimiter can be any printable ASCII character except { and }; one should not assume that * or $ are the delimiters in any given case. Separately, \url{...} is a LaTeX macro whose argument is delimited by curly braces. The full string should be assumed to contain UTF-8-encoded characters; for simplicity, though, let's assume they're pure ASCII characters.
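To make the two delimiter conventions concrete, here is a minimal sketch of how each could be recognised with Lua patterns. The helper name `match_verbatim` is hypothetical, and the sketch assumes that macro names consist of ASCII letters only and that a character delimiter is neither a letter nor whitespace.

```lua
-- Hypothetical helper (not from any library): recognise one verbatim-like
-- macro call at the very start of `s` and return its name and raw argument.
-- Assumption: the macro name consists of ASCII letters only.
local function match_verbatim(s)
   -- \url-style: the argument is a balanced {...} group
   local name, arg = s:match("^\\(%a+)(%b{})")
   if name then return name, arg end
   -- \verb-style: the first character after the name is the delimiter.
   -- Assumption: it is neither a letter, whitespace, nor a curly brace.
   -- %2 is a back-reference to whatever delimiter was captured.
   local delim, body
   name, delim, body = s:match("^\\(%a+)([^%a%s{}])(.-)%2")
   if name then return name, delim .. body .. delim end
   return nil
end

print(match_verbatim([[\verb*u.a.* dlf]]))
print(match_verbatim([[\lstinline$u.a.$ u.a.]]))
print(match_verbatim([[\verb|u.a.| dlf]]))
print(match_verbatim([[\url{u.a.}rrr]]))
```

The back-reference is what removes the need to know the delimiter in advance: whatever character follows the macro name must appear again to close the argument.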

I'm looking to create a (hopefully reasonably efficient...) Lua-based way to split the full string into two sets of substrings: (a) the parts that consist of LaTeX macros and their associated arguments, and (b) the other parts. The eventual goal is to feed the "other parts" to a string.gsub function call.

Turning to the preceding example, how might one separate the string into "Y" (inside a verbatim-like macro) and "N" (not inside a verbatim-like macro) components?


Oh, each full string is guaranteed to have "N" components, but there may be no "Y" components. The string may, in principle, start and end with either "N" or "Y" components.

I've been trying to come up with a solution that uses Lua's string library functions, but haven't gotten far at all. :-(
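One possible shape for the split described above, as a minimal sketch rather than a definitive implementation. The table `verbatim_like` and the functions `split_yn` and `process_n_parts` are hypothetical names; the sketch assumes pure ASCII input, letter-only macro names, and no nesting of verbatim-like macros.

```lua
-- Hypothetical table of verbatim-like macros and how their argument is delimited.
local verbatim_like = { verb = "delim", lstinline = "delim", url = "braces" }

-- Split `s` into a list of { tag, text } pairs, where tag is "Y" or "N".
local function split_yn(s)
   local parts = {}
   local pending = 1   -- start of the text not yet assigned to any part
   local scan = 1      -- where to look for the next macro
   while true do
      local b, e, name = s:find("\\(%a+)", scan)
      if not b then break end
      local kind = verbatim_like[name]
      local arg_end
      if kind == "braces" then
         -- ^ anchors the match at position e + 1
         local _, z = s:find("^%b{}", e + 1)
         arg_end = z
      elseif kind == "delim" then
         -- %1 is a back-reference to the captured delimiter
         local _, z = s:find("^([^%a%s{}]).-%1", e + 1)
         arg_end = z
      end
      if arg_end then
         if b > pending then
            parts[#parts + 1] = { "N", s:sub(pending, b - 1) }
         end
         parts[#parts + 1] = { "Y", s:sub(b, arg_end) }
         pending, scan = arg_end + 1, arg_end + 1
      else
         scan = e + 1   -- some other macro (or no argument): keep scanning
      end
   end
   if pending <= #s then
      parts[#parts + 1] = { "N", s:sub(pending) }
   end
   return parts
end

-- Apply `fn` to every "N" part and leave the "Y" parts untouched.
local function process_n_parts(s, fn)
   local out = {}
   for i, part in ipairs(split_yn(s)) do
      out[i] = part[1] == "N" and fn(part[2]) or part[2]
   end
   return table.concat(out)
end

local sample = [[fld u.a. ldfjal \verb*u.a.* dlf \lstinline$u.a.$ u.a. dfla \url{u.a.}rrr]]
for _, part in ipairs(split_yn(sample)) do
   print(part[1], "[" .. part[2] .. "]")
end
print(process_n_parts(sample, function(n) return (n:gsub("u%.a%.", "X")) end))
```

The extra parentheses around the `gsub` call drop its second return value (the substitution count), so only the rewritten string is passed on.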

Let's assume that:

  • macro names consist of letters and @
  • a delimiter may be a digit or a punctuation character, except @ and \
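These two assumptions translate fairly directly into Lua patterns. The following is a small sketch; the variable names `macro_name` and `delimiter_class` are illustrative only.

```lua
-- Macro names: a backslash followed by letters and/or @
local macro_name = "\\([%a@]+)"

-- Delimiters: digits and punctuation, with @ and \ left out
local all_delimiters = [[!"#$%&'*+,-./:;<=>?^_`|~()[]{}0123456789]]

-- Escape every punctuation character with % so that the list can be used
-- safely inside a character class.
local delimiter_class = "[" .. all_delimiters:gsub("%p", "%%%0") .. "]"

print(("\\@some@macros~u.a.~"):match(macro_name))
print(("~"):match(delimiter_class), ("@"):match(delimiter_class))
```

Escaping with `%` matters for characters such as `]`, `^`, `-` and `%`, which would otherwise have a special meaning inside `[...]`.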

The code:

-- Specify the number of parameters for every macro;
-- use negative numbers for macros supporting a matching pair of curly braces {}
local all_macros = {
   verb = 1,
   url = -1,
   lstinline = -1,
   ["@some@macros"] = -2,
   makeatletter = 0
}

-- List of all delimiters (only punctuation and digits)
local all_delimiters = [[!"#$%&'*+,-./:;<=>?^_`|~()[]{}0123456789]]

-- Specify a function for processing the N-parts of the string
local function convert(n_substring)
   return n_substring:upper()
end

-- Processing
local s = [[
fld u.a. ldfjal \verb{u.a.{ dlf \lstinline{u.a.} u.a. dfla \url{u.a.}rrr \@some@macros~u.a.~{u.a.}{u.a.}qq\verb|\lstinline+nested use+qqq|q
]]

-- Step 1: put \1 in front of every known macro and append one marker per
-- expected parameter plus one (\2 if braces are allowed, \3 otherwise)
s = s:gsub("\\([%a@]+)",
   function(macro_name)
      if all_macros[macro_name] then
         return
            "\1\\"..macro_name
            ..(all_macros[macro_name] < 0 and "\2" or "\3")
            :rep(math.abs(all_macros[macro_name]) + 1)
      end
   end
)

-- Step 2: move the markers past each argument, dropping one marker per argument
repeat
   local old_length = #s
   repeat
      local old_length = #s
      -- arguments in balanced curly braces
      s = s:gsub("\2(\2+)(%b{})", "%2%1")
   until old_length == #s
   -- arguments enclosed in a pair of identical delimiter characters
   s = s:gsub("[\2\3]([\2\3]+)((["..all_delimiters:gsub("%p", "%%%0").."])(.-)%3)", "%2%1")
until old_length == #s

-- Step 3: convert the text between an end marker and the next \1, then strip all markers
s = ("\2"..s.."\1"):gsub("[\2\3]+([^\2\3]-)\1", convert):gsub("[\1\2\3]", "")

-- Print the result
print(s)


The output:

FLD U.A. LDFJAL \verb{u.a.{ DLF \lstinline{u.a.} U.A. DFLA \url{u.a.}RRR \@some@macros~u.a.~{u.a.}{U.A.}QQ\verb|\lstinline+nested use+qqq|Q
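The marker-moving step is probably the least obvious part of the code, so here is a small sketch that isolates it on two tiny inputs. The helper `show` is hypothetical and only makes the control characters visible; the delimiter class is shortened to `[|*$]` for the sake of the example.

```lua
-- Make the invisible marker bytes readable: \1 -> <1>, \2 -> <2>, \3 -> <3>
local function show(s)
   return (s:gsub("[\1\2\3]", function(c) return "<" .. c:byte() .. ">" end))
end

-- Brace-delimited argument: one \2 marker is dropped, the rest hop over {...}
local a = "\1\\url\2\2{u.a.}rrr"
a = a:gsub("\2(\2+)(%b{})", "%2%1")
print(show(a))   --> <1>\url{u.a.}<2>rrr

-- Character-delimited argument: %3 is a back-reference to the delimiter
local b = "\1\\verb\3\3|u.a.|q"
b = b:gsub("[\2\3]([\2\3]+)(([|*$])(.-)%3)", "%2%1")
print(show(b))   --> <1>\verb|u.a.|<3>q
```

Once a single marker remains, the macro has consumed all of its parameters, and everything from that marker up to the next \1 is an N-part.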


