I'm looking for a Lua-based solution for splitting a string into two or more components. This is my first posting to this site, so please bear with me.
Consider the following representative string:
fld u.a. ldfjal \verb*u.a.* dlf \lstinline$u.a.$ u.a. dfla \url{u.a.}rrr
For background: \verb*...* and \lstinline$...$ are LaTeX macros whose arguments aren't delimited by matching curly braces but, instead, by a common character: * in the case of \verb, and $ in the case of \lstinline. An important point is that the delimiter can be any printable ASCII character except { and }; one should not assume that * or $ are used as delimiters in all (or any) cases. Separately, \url{...} is a LaTeX macro whose argument is delimited by curly braces. The full string should in general be assumed to contain UTF-8-encoded characters; for simplicity, let's assume they're pure ASCII characters.
I'm looking to create a (hopefully reasonably efficient...) Lua-based way to split the full string into two sets of substrings: (a) the parts that consist of the LaTeX macros and their associated arguments, and (b) all other parts. The eventual goal is to feed the "other parts" to a string.gsub function call.
Returning to the preceding example, how might one separate the string
fld u.a. ldfjal \verb*u.a.* dlf \lstinline$u.a.$ u.a. dfla \url{u.a.}rrr
into "y" (inside a verbatim-like macro) and "n" (not inside a verbatim-like macro) components, i.e.,
nnnnnnnnnnnnnnnnyyyyyyyyyyynnnnnyyyyyyyyyyyyyyyynnnnnnnnnnnyyyyyyyyyynnn
Oh, each full string is guaranteed to have "n" components, but there may be no "y" components. The string may, in principle, start and end with either "n" or "y" components.
I've been trying to come up with a solution that uses Lua's string library functions, but haven't gotten far at all. :-(
Let's assume that:
- macro names consist of letters and @
- a delimiter may be any digit or punctuation character except @ and \
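These two assumptions map fairly directly onto Lua patterns. The following is only a minimal sketch of the building blocks, with sample strings made up for illustration. Note that the simplified class `[%d%p]` used here also admits @ and \, which the assumptions above exclude.

```lua
-- Macro name: a backslash followed by letters and/or @.
local name = ("\\@some@macros~x~"):match("\\([%a@]+)")

-- Same-character delimiters: capture one digit or punctuation character,
-- then lazily match up to the next occurrence of that same character (%1).
local delim, arg = ("\\verb|a.b|rest"):match("\\verb([%d%p])(.-)%1")

-- Brace-delimited argument: %b{} matches a balanced {...} group.
local braced = ("\\url{a{b}c}rest"):match("\\url(%b{})")

print(name)        -- @some@macros
print(delim, arg)  -- the delimiter "|" and the argument "a.b"
print(braced)      -- {a{b}c}
```

The back-reference `%1` is what lets one pattern cover every possible delimiter without listing them one by one.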
The code:

    -- Specify the number of parameters for every macro;
    -- use negative numbers for macros that also accept a matching pair of curly braces {}
    local all_macros = {
       verb = 1,
       url = -1,
       lstinline = -1,
       ["@some@macros"] = -2,
       makeatletter = 0
    }
    -- List of delimiters (only punctuation and digits)
    local all_delimiters = [[!"#$%&'*+,-./:;<=>?^_`|~()[]{}0123456789]]

    -- Specify the function used for processing each "n" part of the string
    local function convert(n_substring)
       return n_substring:upper()
    end

    -- Processing
    local s = [[fld u.a. ldfjal \verb{u.a.{ dlf \lstinline{u.a.} u.a. dfla \url{u.a.}rrr \@some@macros~u.a.~{u.a.}{u.a.}qq\verb|\lstinline+nested use+qqq|q]]

    -- Tag every known macro: \1 before it, then (number of arguments + 1) markers after it
    -- (\2 if braces are allowed as well, \3 if only same-character delimiters are)
    s = s:gsub("\\([%a@]+)",
       function(macro_name)
          if all_macros[macro_name] then
             return
                "\1\\"..macro_name
                ..(all_macros[macro_name] < 0 and "\2" or "\3")
                :rep(math.abs(all_macros[macro_name]) + 1)
          end
       end
    )
    -- Move the markers past each argument, consuming one marker per argument,
    -- until nothing changes any more
    repeat
       local old_length = #s
       repeat
          local old_length = #s
          -- arguments in balanced curly braces
          s = s:gsub("\2(\2+)(%b{})", "%2%1")
       until old_length == #s
       -- arguments enclosed in a pair of identical delimiter characters
       s = s:gsub("[\2\3]([\2\3]+)((["..all_delimiters:gsub("%p", "%%%0").."])(.-)%3)", "%2%1")
    until old_length == #s
    -- Everything between a closing marker and the next \1 is an "n" part: convert it,
    -- then strip all markers
    s = ("\2"..s.."\1"):gsub("[\2\3]+([^\2\3]-)\1", convert):gsub("[\1\2\3]", "")

    -- Print the result
    print(s)
Output:

    FLD U.A. LDFJAL \verb{u.a.{ DLF \lstinline{u.a.} U.A. DFLA \url{u.a.}RRR \@some@macros~u.a.~{u.a.}{U.A.}QQ\verb|\lstinline+nested use+qqq|Q
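For comparison with the y/n line in the question, here is a rough sketch of a different, simpler technique: a left-to-right scanner instead of the marker-based gsub passes used above. It is only an illustration under stated assumptions: `yn_mask` and `verbatim_macros` are hypothetical names, every listed macro is assumed to take exactly one argument, the delimiter class `[%d%p]` is broader than the list in the answer, and nested or multi-argument macros are not handled.

```lua
-- Hypothetical table of verbatim-like macros (assumed: exactly one argument each).
local verbatim_macros = { verb = true, lstinline = true, url = true }

-- Hypothetical helper: returns one flag per character of s,
-- "y" inside a verbatim-like macro (name plus argument), "n" elsewhere.
local function yn_mask(s)
  local out, pos = {}, 1
  while pos <= #s do
    -- Find the next macro name: backslash followed by letters and/or @.
    local start, name_end, name = s:find("\\([%a@]+)", pos)
    if not start then break end
    local stop
    if verbatim_macros[name] then
      local d = s:sub(name_end + 1, name_end + 1)
      if d == "{" then
        -- Argument in balanced braces, anchored right after the macro name.
        local _, e = s:find("^%b{}", name_end + 1)
        stop = e
      elseif d:match("^[%d%p]$") then
        -- Argument closed by the same character; plain find avoids pattern escaping.
        stop = s:find(d, name_end + 2, true)
      end
    end
    if stop then
      out[#out + 1] = ("n"):rep(start - pos)
      out[#out + 1] = ("y"):rep(stop - start + 1)
      pos = stop + 1
    else
      -- Unknown macro or unterminated argument: treat the name as ordinary text.
      out[#out + 1] = ("n"):rep(name_end - pos + 1)
      pos = name_end + 1
    end
  end
  out[#out + 1] = ("n"):rep(#s - pos + 1)
  return table.concat(out)
end

local example = [[fld u.a. ldfjal \verb*u.a.* dlf \lstinline$u.a.$ u.a. dfla \url{u.a.}rrr]]
print(example)
print(yn_mask(example))
```

Printing the mask directly under the string makes it easy to compare against the y/n line given in the question. To feed only the "n" parts to string.gsub, the same loop could collect substrings instead of flags.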