While NOT-AHEAD and NOT AHEAD are close enough to each other, it looks pretty ugly to have to write not-ahead <end> instead of not <end>, which is pretty common.
Hence I think the conservative choice should include a new TAG! combinator: <not end>
I may even prefer it to not <end>.
Punting The Implementation Code
My experiment just sniffed for a refinement on the combinator:
    negatable-parser?: func [
        return: [logic?]
        frame [<unrun> frame!]
    ][
        return did find words of frame 'negated
    ]
(There should be better ways of doing that, but at least there is a way of doing it.)
NOT Combinator
    ; Historical Redbol PARSE considered NOT a synonym for NOT AHEAD. But
    ; the concept behind this NOT is that some parsers can be "negated",
    ; which allows them to actually advance and consume input:
    ;
    ;     >> parse ["a"] [not ahead integer!, text!]
    ;     == "a"
    ;
    ;     >> parse ["a"] [not integer!]
    ;     == "a"
    ;
    ;     >> parse ["a"] [not some integer!]
    ;     ** Error: SOME combinator cannot be negated with NOT
    ;
    ; The approach has some weaknesses, e.g. the BLOCK! combinator isn't
    ; negatable so `not [integer!]` isn't legal but `not integer!` is. But
    ; the usefulness is high enough that it's believed worth it.

    'not combinator [
        {If the parser argument is negatable, invoke it in the negated sense}
        return: [any-value? pack?]
        parser [action?]
        /negated
    ][
        if negated [  ; NOT NOT, i.e. call parser without negating it
            return [@ remainder]: parser input except e -> [
                return raise e
            ]
        ]
        if not negatable-parser? :parser [
            fail "NOT called on non-negatable combinator"
        ]
        return [@ remainder]: parser/negated input except e -> [
            return raise e
        ]
    ]
Being able to do NOT NOT might seem useless... but... consider generated code being composed together. There may be value in having it work for somebody, somewhere.
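To make the negation protocol concrete outside of Ren-C, here is a minimal Python model of it. This is only a sketch under assumptions: the names `ParseError`, `is_negatable`, `integer`, `some`, and `not_` are hypothetical stand-ins, parsers are modeled as functions taking `(items, pos)`, and sniffing for a `negated` keyword parameter plays the role of looking for the /NEGATED refinement.

```python
import inspect


class ParseError(Exception):
    """Models a parser mismatch (the `raise` in the combinators above)."""


def is_negatable(parser):
    # Hypothetical analogue of NEGATABLE-PARSER?: sniff for a `negated` parameter.
    return "negated" in inspect.signature(parser).parameters


def integer(items, pos, negated=False):
    """Match one int at items[pos]; when negated, match one non-int instead."""
    if pos >= len(items):
        raise ParseError("at end of input")
    matched = isinstance(items[pos], int)
    if matched == negated:
        raise ParseError("item did not satisfy (possibly negated) integer")
    return items[pos], pos + 1  # the negated form still consumes the item


def some(parser):
    """One-or-more repetition; deliberately has no `negated` parameter."""
    def run(items, pos):
        value, pos = parser(items, pos)
        while True:
            try:
                value, pos = parser(items, pos)
            except ParseError:
                return value, pos
    return run


def not_(parser):
    """Invoke `parser` in the negated sense; a negated NOT just runs it plainly."""
    def run(items, pos, negated=False):
        if negated:  # NOT NOT: call the parser without negating it
            return parser(items, pos)
        if not is_negatable(parser):
            raise TypeError("NOT called on non-negatable combinator")
        return parser(items, pos, negated=True)
    return run
```

In this model `not_(integer)(["a"], 0)` consumes the string and advances, `not_(not_(integer))` behaves like plain `integer`, and `not_(some(integer))` is rejected because `some` exposes no `negated` parameter.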
AHEAD Combinator
This one is very straightforward.
    'ahead combinator [
        {Leave the parse position at the same location, but fail if no match}
        return: "parser result if success, NULL if failure"
            [any-value? pack?]
        parser [action?]
        /negated
    ][
        remainder: input  ; never advances
        if negated [
            parser input except e -> [
                return ~not~
            ]
            return raise "Negated parser passed to AHEAD succeeded"
        ]
        return parser input  ; don't care about what parser's remainder is
    ]
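The same lookahead logic can be modeled in Python to show why the position never moves in either sense. This is a sketch under assumptions, not the Ren-C implementation: `ParseError`, `word`, and `ahead` are hypothetical names, and parsers are modeled as functions taking `(items, pos)` and returning `(value, new_pos)`.

```python
class ParseError(Exception):
    """Models a parser mismatch."""


def word(expected):
    """Hypothetical leaf parser: match one specific item and advance past it."""
    def run(items, pos):
        if pos < len(items) and items[pos] == expected:
            return items[pos], pos + 1
        raise ParseError(f"expected {expected!r}")
    return run


def ahead(parser):
    """Run `parser` as a lookahead: the returned position is always `pos`."""
    def run(items, pos, negated=False):
        if negated:
            try:
                parser(items, pos)
            except ParseError:
                return None, pos  # the inner parser failed, so NOT AHEAD succeeds
            raise ParseError("Negated parser passed to AHEAD succeeded")
        value, _ = parser(items, pos)  # discard the inner parser's remainder
        return value, pos
    return run
```

Both branches hand back the original `pos`, so a following rule sees the same input regardless of whether the lookahead was negated.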
TYPE-BLOCK! Combinator
One of the benefits of UTF-8 Everywhere is that Ren-C's internal string representation can feed right into the source scanner. So if your input is a string or binary and you try to parse a datatype out of it, UPARSE will just run it:
    >> parse "10 [20 <thirty>]" [x: integer! y: block!]
    == [20 <thirty>]

    >> x
    == 10

    >> y
    == [20 <thirty>]
Very cool, but of course that's not negatable...only matching an element of a datatype in an array can be negated. So here we see a "sometimes-negatable" combinator:
    type-block! combinator [
        return: "Matched or synthesized value"
            [element?]
        value [type-block!]
        /negated
        <local> item error
    ][
        either any-array? input [
            if value <> type of maybe input.1 [
                if negated [
                    remainder: next input
                    return input.1
                ]
                return raise "Value at parse position did not match TYPE-BLOCK!"
            ]
            if negated [
                return raise "Value at parse position matched TYPE-BLOCK!"
            ]
            remainder: next input
            return input.1
        ][
            if negated [
                fail "TYPE-BLOCK! only supported negated for array input"
            ]
            [item remainder]: transcode/one input except e -> [return raise e]

            ; If TRANSCODE knew what kind of item we were looking for, it could
            ; has some type sniffing in their fast lexer, review relevance.
            ;
            match value item else [
                return raise "Could not TRANSCODE the TYPE-BLOCK! from input"
            ]
            return item
        ]
    ]
<end> Combinator
This looks simple, but wait until the next one:
    <end> combinator [
        {Only match if the input is at the end}
        return: "Invisible"
            [nihil?]
        /negated
    ][
        remainder: input  ; never advances
        if tail? input [
            if negated [
                return raise "PARSE position at <end> (but parser negated)"
            ]
            return nihil
        ]
        if negated [
            return nihil
        ]
        return raise "PARSE position not at <end>"
    ]
TAG! Combinator
Here's where a big question came up.
Each parse can use its own choice of a combinator MAP!. And you can put instances of a datatype into the map (such as a combinator for <end>).
But you can also put in a combinator for the datatype itself, e.g. &[tag].
So who gets the first shot? The specific instance or the general combinator?
I had thought it would be more interesting if the datatype got the first chance, so that you could define a meaning for all the combinators of that type. But trying to implement this gives me doubts. Maybe if your instance of the combinator wants to be part of a family controlled by the datatype, that's something you do by making a TAG-COMBINATOR generator that handles the generality.
Certainly having the datatype get first chance creates a bit of an annoyance because the TAG! combinator now becomes one of these "am I negatable? uh, have to ask who I'm delegating to..." situations.
So if we don't dispatch directly to <end> we have to tunnel this /NEGATED switch.
    tag! combinator [
        {Special noun-like keyword subdispatcher for TAG!s}
        return: "What the delegated-to tag returned"
            [any-value? pack?]
        @pending [blank! block!]
        value [tag!]
        /negated
        <local> comb
    ][
        if not comb: state.combinators.(value) [
            fail ["No TAG! Combinator registered for" value]
        ]
        if negated [
            if not negatable-parser? comb [
                fail "NOT called on non-negatable combinator"
            ]
            comb: runs comb
            return [@ remainder pending]: comb/negated state input
        ]
        return [@ remainder pending]: run comb state input
    ]
So this kind of spoiled my "hey, only the combinators that care have to worry about it!" idea: it winds up being a tax on everything.
I Think Maybe My Datatype-Gets-First-Choice Idea Is Wrong
I've seen it going wrong in other places. It's annoying that I can't put in a behavior like:
    ~true~ => combinator [...] [<<always succeed>>]
    ~false~ => combinator [...] [<<always fail>>]
And instead I have to reach the quasiform combinator via an extra step.
(Sidenote: Because MAP! can't store antiforms, I think the idea is that quasiforms literally appearing in PARSE must do the same thing that their antiforms would do if looked up via a WORD!. There are ways to juggle things so you don't have to have that rule, but it seems pretty reasonable.)
If you have an idea about a combinator that takes control of everything, and are annoyed by there being combinators for instances that override your plan... why did you put them in the combinator map that UPARSE sees instead of some other dispatch table?
Anyway, switching to the instance getting first choice would help here. I'll come back to this, and just do the NOT-AHEAD and <not end> for now.
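
The difference between the two lookup orders can be sketched in Python. This is a model under assumptions rather than UPARSE's actual dispatch: `Tag`, `end_combinator`, `generic_tag_combinator`, `lookup`, and `is_negatable` are hypothetical names, and a plain dict keyed by both instances and types stands in for the combinator MAP!.

```python
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Stand-in for a TAG! value such as <end>."""
    name: str


def end_combinator(items, pos, negated=False):
    """Succeed at the tail of input (or anywhere else when negated)."""
    at_end = pos >= len(items)
    if at_end == negated:
        raise ValueError("position did not satisfy (possibly negated) <end>")
    return None, pos  # never advances


def generic_tag_combinator(items, pos):
    """Datatype-level combinator; it has no `negated` parameter of its own."""
    raise ValueError("no behavior registered for this tag")


def lookup(combinators, value, instance_first=True):
    """Find a combinator by exact value and by type, in the chosen order."""
    candidates = [combinators.get(value), combinators.get(type(value))]
    if not instance_first:
        candidates.reverse()
    return next((c for c in candidates if c is not None), None)


def is_negatable(combinator):
    # Same sniffing idea as NEGATABLE-PARSER?: look for a `negated` parameter.
    return "negated" in inspect.signature(combinator).parameters


# One map holding both an instance entry and a datatype entry.
combinators = {Tag("end"): end_combinator, Tag: generic_tag_combinator}
```

With instance-first lookup, `lookup(combinators, Tag("end"))` returns `end_combinator` directly, so its negatability is visible without any tunneling. With datatype-first lookup the generic combinator is what gets sniffed, and it would have to forward the negation itself.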