So there was a proposal on the table for R3-Alpha's INTO to become arity-2 and take a datatype. This simplifies a common pattern of wanting to say what you're parsing into:
ahead text! into [some "a"] ; arity-1 form
=>
into text! [some "a"] ; arity-2 form
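To see it in context, here's a minimal sketch of how the arity-2 form might read inside a block parse (this is just the proposal's behavior as I understand it, with illustrative data--not something any of these implementations necessarily shipped):

data: [<x> "aaa" <y>]
parse data [tag! into text! [some "a"] tag!]  ; descend into the TEXT! element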
Neither R3-Alpha nor Red went with this...but Topaz did. (Perhaps it was Gabriele who made the proposal in the first place?)
On the surface it seems to have some pros and some cons, but to be mostly equivalent. However, since UPARSE has value-bearing rules, we can take this one step further... instead of limiting the first argument to a DATATYPE!, we can make it any rule that bears a series as its result!
Datatype counts for that, but there's more...
Parsing INTO a Generated Series
This means you're not just restricted to going INTO a series that existed at the start of the parse. You can parse into products.
For example:
uparse "((aaaa)))" [into [between some "(" some ")"] [some "a"]]
The first rule is a BETWEEN, which is somewhat like traditional COPY in that it generates a new series...in this case a series that doesn't have the leading or trailing parentheses.
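To make the intermediate product concrete, here's a sketch that captures the BETWEEN result into a variable instead of handing it straight to INTO--assuming BETWEEN's capture works the same way for strings as it does for the block example below, and that UPARSE returns its input on success as in the later examples:

>> uparse "((aaaa)))" [x: between some "(" some ")"]
== "((aaaa)))"

>> x
== "aaaa"

The captured "aaaa" has no leading or trailing parentheses--exactly what the [some "a"] subrule gets applied to.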
If your first rule is complex, this can introduce a somewhat long separation between the parameters. But a general tool now exists for addressing that: the SYM-XXX! substitutions. Here's why that's important:
uparse [| | any any any | | |] [
    content: between some '| some '|
    into @content [some 'any]
]
The BETWEEN captured a block of [any any any]. You don't want the INTO to be using that as a rule; you want to use it as-is, as the series to parse into.
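For contrast, a sketch of the difference as I read it--the plain word fetch treats the capture as a rule, while the SYM-WORD! hands over the value itself:

into content [some 'any]   ; WORD! fetch: [any any any] would be treated as a rule
into @content [some 'any]  ; SYM-WORD!: [any any any] used as-is, the series to descend into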
The Current State of SYM-XXX! In Parse
When I first framed the idea of SYM-WORD! I was thinking of it for matching against a literal item:
data: [not a rule]
parse [[not a rule]] [@data] ; the previous idea: match input against literal
But what it's acting like now is not for matching input. Instead, it is a value-bearing rule that consumes no input.
So it's like this:
>> uparse "aaa" [x: @("not for match"), some "a"]
== "aaa"
>> x
== "not for match"
I threw in a twist, which is that the @ rule will fail if it gets NULL:
>> x: <before>
>> uparse "aaa" [x: @(if false ["not for match"]), some "a"]
; null
>> x
== <before>
This still gives you the power to take the null and keep going. Just combine it with the OPT rule!
>> x: <before>
>> uparse "aaa" [x: opt @(if false ["not for match"]), some "a"]
== "aaa"
>> x
; null
That makes for good cooperation with KEEP, for when you find you want to keep some calculated material, but want to frame your rule so it can opt out. So check this out, @rgchris:
>> uparse "aaa" [x: collect [some [
keep opt @(if false [<not kept>])
keep skip ; or whatever we wind up calling "consume next series item"
keep @(if true [<kept>])
]]]
>> x
== [#a <kept> #a <kept> #a <kept>]
Contrast of @(...) and (...)
The (...) form and the @(...) form are similar in the sense that they do not advance the input...or look at the input at all. But (...) is not "value-bearing":
>> uparse "" [x: (1 + 2)]
** Error: UPARSE can't use SET-WORD! with non-value bearing rule (1 + 2)
We don't technically need to distinguish the @(...) and (...) forms, but I think there are good reasons to do so. The "fail the rule if null" behavior is a rather neat twist--and you wouldn't want that for a rule that wasn't intended to have its result used (it might incidentally return null, e.g. be an IF whose branch didn't run). But the most obvious justification for differentiation is to help guide the user reading the rules, so they know what they are looking at.
A value-bearing group risks contaminating aggregate captures:
x: [integer! (...) | text! (...)]
It would be awkward if you had to explicitly disavow those groups with ELIDE (which may take over the name SKIP):
x: [integer! elide (...) | text! elide (...)]
It's also nice to get a heads-up when you've gotten things wrong--when you think you're producing a value that will be used, but it's actually getting discarded:
>> uparse "aaa" [some "a", @(if true [<what?>])]
** Error: Result of @(...) rule not consumed, use (...) for non-value-bearing
Plus, there's a natural correlation with the @word and @pa/th forms. Seeing all of these as a family, instead of having (...) be the odd duck, is helpful.
I'm on the fence as to whether it's worth "wasting" @[bl oc k] as a synonym for @([bl oc k]). It has the nice property of not looking at the input--like the other members of the family--and might help KEEP in particular be lighter when adding material to its collection.
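If that synonym were adopted, a KEEP of literal material could drop the group--purely hypothetical, shown next to the form described above:

keep @([<some> <literal> <stuff>])   ; today: the group evaluates to the block
keep @[<some> <literal> <stuff>]     ; hypothetical synonym, a bit lighter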
Contrast with :(...)
The :(...) form means "use the product of this group as a match rule". That goes along with :word which I am suggesting means "fetch word and use as a rule, just like with ordinary words, but with the exception that it means override any keyword with the same name."
Because this is something like COMPOSE-on-the-fly, the case of returning NULL doesn't fail here; it just splices in no rule. You still have the option of making a failing rule by evaluating to #[false]...and that's a popular pattern, e.g. :(mode = 'some-state)
This obviates the need for IF or similar constructs. It also leaves GET-BLOCK! up for grabs for whatever it might come to mean, since a BLOCK! is already interpreted as a rule...so we can keep thinking about that.
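Here's a sketch of that guard pattern as I understand it--assuming a #[true] product lets the match continue, #[false] fails it, and NULL splices in no rule at all:

mode: 'strict

uparse "aaa" [
    :(mode = 'strict) some "a"    ; this alternate only applies when MODE is 'strict
    | some "A"
]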
How To Achieve Meaning of "Match Literally"?
We have what appears to be a bit of a hole here, on how we are supposed to match an item of input literally. Let's go back to the example:
data: [not a rule]
parse [[not a rule]] [??? data] ; how to do this match?
Right now we have at least one option: generate a rule that adds a quote level.
data: [not a rule]
parse [[not a rule]] [:(quote data)]
That acts as if we said:
parse [[not a rule]] ['[not a rule]]
Using :(quote ...) is not super ideal, so the pattern might need a keyword. But the keyword would need to cooperate with the @ form: with literally data, the DATA would be turned into a block combinator by the parse engine before LITERALLY ever saw it. You'd have to say literally @data to get the block itself passed.
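Putting that together, the hypothetical keyword usage would look something like this (sketch only--LITERALLY isn't a defined combinator, it just names the idea above):

data: [not a rule]
parse [[not a rule]] [literally @data]  ; @data hands over the block itself to match literally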