I wanted to make a REWORD variation that would look for escaped parts of strings and extract them as words. So:
Input: "abc$(def)ghi"
Output: ["abc" def "ghi"]
It's a common-seeming and not entirely trivial task. The first thing I came up with is a bit convoluted...perhaps because I tried to not repeat the "$(" and ")" strings in the rule:
uparse text [
    return collect [
        any [
            not end
            (capturing: false)
            keep opt between here ["$(" (capturing: true) | end]
            :(if capturing '[
                inner: between here ")"
                keep @(as word! inner)
            ])
        ]
    ]
]
It basically alternates between a capturing mode and a non-capturing mode. A variable (capturing) records whether the capture step needs to run on a given pass.
It has to throw in a NOT END, for reasons I explain in another post: it's running alternating rules that may both opt out, so a pass at the end of the input can succeed without consuming anything.
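For anyone who doesn't read UPARSE fluently, here is a rough Python sketch of the same alternation. It is only an analogy, not how UPARSE works internally: the function name reword_split and the ("text", ...) / ("word", ...) tags are made up for illustration, and raising an error on an unterminated "$(" is my assumption about what a failed parse should look like here. The loop condition plays the role of NOT END, and the capturing flag decides whether the second step runs.

```python
from typing import List, Tuple

# Hypothetical tags for this sketch only:
# ("text", s) stands for a kept string, ("word", s) for a kept WORD!.
Part = Tuple[str, str]


def reword_split(text: str) -> List[Part]:
    parts: List[Part] = []
    pos = 0
    while pos < len(text):  # plays the role of NOT END
        capturing = False

        # Non-capturing step: keep everything up to "$(" or the end of input.
        start = text.find("$(", pos)
        if start == -1:
            parts.append(("text", text[pos:]))
            pos = len(text)
        else:
            parts.append(("text", text[pos:start]))
            pos = start + 2
            capturing = True

        # Capturing step: only runs if the step above actually saw "$(".
        if capturing:
            close = text.find(")", pos)
            if close == -1:
                # Assumption: treat an unterminated escape as a failed parse.
                raise ValueError("unterminated $( escape")
            parts.append(("word", text[pos:close]))
            pos = close + 1
    return parts
```

Without the loop condition, a pass starting at the end of the input would find no opener, keep an empty string, skip the capture step, and make no progress, which is the situation the NOT END guards against.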
I use a GET-GROUP! spliced conditional rule, as UPARSE doesn't have any loop-interrupting constructs yet. So you can't say "Stop running this rule, but consider it to have matched." There's only LOGIC! of #[false], which means what FAIL used to mean...i.e. the overall rule did not match (so any collected material would be forgotten).
Since it can't break out of the rule and report success, it has to have a way to skip over a rule. So the rule for capturing inside the parentheses conditions itself out with an IF statement and a generated rule. I could have instead written that as an alternate rule, where if NOT CAPTURING was true it would bypass the normal code:
uparse text [
    return collect [
        any [
            not end
            (capturing: false)
            keep opt between here ["$(" (capturing: true) | end]
            [:(not capturing) |
                inner: between here ")"
                keep @(as word! inner)
            ]
        ]
    ]
]
That feels more convoluted to me because of the inverse logic of the NOT, though.
It produces more empty strings than I would like:
Input: "$(abc)$(def)$(ghi)"
Output: ["" abc "" def "" ghi]
It would technically be possible for a rule like BETWEEN to succeed and give a NULL result if there were no content, instead of an empty string:
>> did parse "()" [x: between "(" ")"]
== #[true]
>> x
; null
But this then means you can't get a good distinction of what happened in the case of an optional rule.
>> did parse "" [x: opt between "(" ")"]
== #[true]
>> x
; null...so were there parentheses or not?
So I guess it's another situation where if you want to filter out the empty strings, you have to capture into a variable and filter it.
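As a rough illustration of "capture into a variable and filter", here is a Python sketch. It swaps in a different technique from the rules above: a regular expression with re.split, which returns literal pieces at even indexes and the captured group at odd indexes. The function name and the ("text", ...) / ("word", ...) tags are invented for this example. One difference to be aware of: an unterminated "$(" is simply left as literal text here rather than failing.

```python
import re
from typing import List, Tuple


def reword_split_nonempty(text: str) -> List[Tuple[str, str]]:
    # With one capturing group, re.split alternates:
    # literal, captured, literal, captured, ..., literal
    pieces = re.split(r"\$\((.*?)\)", text)

    parts: List[Tuple[str, str]] = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd positions are the contents of a $(...) escape.
            parts.append(("word", piece))
        elif piece:
            # Even positions are literals; this is the filter that
            # drops the empty ones.
            parts.append(("text", piece))
    return parts
```

The filtering happens after the capture, so the splitting step never has to decide whether "no content" and "no match" are the same thing.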
I think UPARSE helps out here...but it's not quite the slam dunk I'd hope for.
Because it has two rules that may both opt themselves out, it's a thought piece for asking if the NOT END makes sense with ANY. Or is it better off baking that into the ANY rule and having another construct? Intuitively I feel like the tax of having two slightly different versions and explaining the use of one vs. the other is worse than just having the more general construct.
If there were a loop-ending construct that indicated the overall rule was a success (e.g. didn't discard the KEEPs), then we might avoid the capturing flag:
uparse text [
    return collect [
        any [
            keep opt between here ["$(" | end]
            [end break |
                inner: between here ")"
                keep @(as word! inner)
            ]
        ]
    ]
]
But I don't know if BREAK is the right name for a loop-accepting operation, since in plain DO code a BREAK typically causes most loop operations (like WHILE) to return NULL. So I'd think a BREAK here would perhaps discard anything kept. Perhaps STOP would be more consistent, and it could be value-bearing as well (stop @(...)).
Loop breaking operations would be another potential argument for renaming ANY to WHILE, as it would hint more clearly at the existence of a BREAK. Though WHILE has no STOP in plain DO currently (only CYCLE does) due to the desire to make loops easier to abstract.
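To make the flag-free shape concrete, here is a Python sketch of the loop from the hypothetical BREAK/STOP example. Python's break happens to have the "accepting" behavior being asked for: leaving the loop does not throw away what has been appended so far. The function name and tags are invented for illustration, and the ValueError for an unterminated "$(" is an assumption.

```python
from typing import List, Tuple


def reword_split_with_break(text: str) -> List[Tuple[str, str]]:
    parts: List[Tuple[str, str]] = []
    pos = 0
    while True:
        start = text.find("$(", pos)
        if start == -1:
            # No more escapes: keep the rest, then leave the loop.
            # Everything collected so far survives the break.
            parts.append(("text", text[pos:]))
            break
        parts.append(("text", text[pos:start]))

        close = text.find(")", start + 2)
        if close == -1:
            # Assumption: treat an unterminated escape as a failed parse.
            raise ValueError("unterminated $( escape")
        parts.append(("word", text[start + 2:close]))
        pos = close + 1
    return parts
```

Written this way, the literal step runs once more at the end of the input before the loop exits, so an input that ends with an escape picks up a trailing empty string. If I'm reading the sketched rule right, it would do the same, since the KEEP runs before the END BREAK is reached.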