Double Negation in PARSE is NOT NOT NOT Cool

hostilefork · March 3, 2021, 9:31pm

So there's a bit of an issue with the NOT combinator in UPARSE. It fails when the rule you give it as a parameter succeeds:

>> uparse [1] [not integer!]  ; fails... because it *was* an integer
; null

The NOT combinator received information about how much of the input the INTEGER! matched. But it threw that information away.

So what if you tried to negate it again...with NOT NOT?

>> uparse [1] [not not integer!]
; ???

Here the first NOT is negating the second rule that failed. So it's a success in the traditional sense you'd usually expect with a NOT. But it doesn't know in that success how much to skip. So the 1 is still pending.

Hence to get this to work you'd have to say:

>> uparse [1] [not not integer!, integer!]
== [1]

In other words, every NOT is effectively a NOT AHEAD.

But Red actually tries to cancel out the NOTs:

red>> parse [1] [set x not not integer!]
== true

red>> x
== 1

This doesn't make a lot of sense. It doesn't work with other failing rules... 1 is not a "not integer!" and it's also not a "tag!", so why would this not capture it?

red>> parse [1] [set x not tag!]
== false

Here are the options for dealing with this in the combinator model:

Change the name of NOT to NOT-AHEAD. This is how Haskell does it (they call it notFollowedBy... that combinator list is good to look at, by the way).
Leave it as-is and assume that it's the kind of thing you can just learn.
Rig up a hack to make the specific case of NOT immediately followed by NOT a no-op.
Systematically build into the protocol the idea that a rule can separately signal that it failed as well as "how much input it would have consumed if it had succeeded"

I think (3) as in Red is a bad idea. I'm not even sure what (4) would mean...?

red>> parse [1 2 3 4] [copy numbers not some not integer!]
== false

I think my inclination would be (2).

UPDATE: Looking a little further, I think the NOT NOT behavior in Red is likely a bug and not an attempt to implement a nonsensical feature...being caused by the same thing that made #1246 happen in R3-Alpha, e.g. controlling negation via a single flag.

UPDATE: Looking still further, Haskell Parsec has the same problem, e.g. (notFollowedBy . notFollowedBy != lookAhead)

`notFollowedBy` and parsers which don't consume input. · Issue #8 · haskell/parsec · GitHub

hostilefork · March 24, 2024, 1:09am

I was reading some code using NOT that was like some [not space, one].

Hence effectively this is some [not ahead space, one] e.g. some number of times look ahead to see a thing isn't a space, then consume one item (whatever the not space thing was).

This made me a bit disgruntled that I couldn't write some not space, which on the surface seems like a coherent thought... vs. something that acts like an some not ahead space e.g. an infinite loop.

In practice any input-consuming combinator has a bit of a coherence problem... because while the letter "A" is not a space, neither is reaching the end of input. So to stop the infinite loop problem you'd have to at the very least omit a test against end.

I'm not even sure what (4) would mean...?

Maybe it means we're missing a concept like negatable combinators, which you can call as parser/not instead of just parser. Then NOT would call the parser negatedly (maybe it's the only combinator that would do so, or maybe there would be others?)

e.g. AHEAD could be a negatable combinator, producing a negatable parser. And maybe a single character, single byte, or single value combinator would be willing to be negated--with the negated rule being any single value that's not that. We could also negate character sets in the same way.

So a string combinator when matching in a BLOCK! could be negated, and matching the end would not count:

>> parse ["alpha"] [not "beta"]
== "alpha"

>> parse [] [not "beta"]  ; behave same as e.g. `parse [] ["alpha"]`
** Error: PARSE BLOCK! combinator did not match input

But when matching in a string, it couldn't

>> parse "alpha" [not "beta"]
** Error: TEXT! combinator not negatable when parsing TEXT!

So there'd be a narrow interpretation of cases where it was clear how much a combinator intended to test and consume. If you could say how much it intended to consume... and make a reasoned argument it could consume an equal amount in the negated case. (Any combinator that consumed no input could thus be negatable.)

This strikes me as valuable, as it would give meaning to things like some not space that people would want, and dismiss with the confusion over NOT being implicitly NOT AHEAD.

It may mean that some not space would work, and some not " " would be different in string parsing, e.g. characters would be treated differently than strings of length 1. Or perhaps a NOT of a string that didn't match would always advance one character? :-/

Observation: NOT could itself be a negatable combinator, making NOT NOT work. It just would notice when it was called with /NOT and then use that as a signal to call its combinated parser parameter plainly, vs. using /NOT.

bradrn · March 24, 2024, 3:52am

I feel there’s some confusion between two completely different notions of negation here:

Complement of a character set: that is, ‘match any character which is not in this set’
Inversion of parser success: that is, ‘make a parser which fails when another succeeds’

In Haskell, the latter is (as you note) notFollowedBy, and the former needs to be created ad-hoc using combinators like satisfy. Ren-C could do a bit better than Haskell there, since it reifies character sets as values.

But especially the last quoted paragraph seems to be conflating them. It sounds like you want NOT to sometimes invert the failure state, and sometimes change the characters that are matched. That strikes me as possibly being quite difficult to reason about. Not only that, it would create difficulties in cases where both are plausible — e.g. if not space matches everything except spaces, then how do I create a parser which fails if a space is present?

hostilefork · March 24, 2024, 4:08am

You're more or less giving some parsers a refinement. So AHEAD has AHEAD/NOT. But it's letting people write that out as not ahead. And when combinators are triggered by datatype, you can get the feature of slipstreaming that refinement to the combinator when there's nowhere to put the /NOT.

A little touchy-feely perhaps that each parser can decide if there's a meaning to its negation (or if negation has no meaning at all) as opposed to something that has to be black-box decided from the parameterized parser's result. But it seems learnable to me.

The question of if this little backdoor communication breaks some other more interesting composability feature in UPARSE is the main one to answer. (It isn't the only such suggestion... I've wondered about a similar backchannel to suppress an error from the "null combinator", which a MAYBE combinator could uniquely suppress an error from.)

bradrn · March 24, 2024, 5:46am

But then there’s the other problem I mentioned: what if you do want to invert the parser’s failure state? For instance, in this parser I used notFollowedBy (symbol "//") to match only the case with a single slash. But if not "/" instead matches anything except for "/", then that leaves me no easy way to get that logic.

I think the cleanest design here would be to recognise that there’s two very different cases here, and therefore treat them differently:

The ‘fail on success’ logic can apply to any other combinator, so give it its own dedicated name, like NOT-FOLLOWED-BY or something. (That name’s verbose, but this combinator is uncommon, so it wouldn’t matter a huge amount.)
Other meanings of ‘negation’ are essentially parser-specific, so treat them as refinements for the individual parsers, like AHEAD/NOT. Characters can’t be refined, so we could make a new combinator for them (call it something like EXCEPT-FOR). I see no particular reason to merge all these things under a single NOT combinator.

hostilefork · March 24, 2024, 5:50am

I think that's covered well enough by NOT AHEAD.

AHEAD is a nice generic combinator for "take your combinated parser argument, apply it, but don't advance the input (or apply any 'pending' effects). just stop the match if it didn't succeed, or continue if it did."

So having NOT AHEAD be the implementation of NOT-FOLLOWED-BY feels good.

I like the brevity, and it feels "English-like" to me to be context-sensitive.

>> parse ["a"] [not ahead integer!, text!]
== "a"

>> parse ["a"] [not integer!]
== "a"

>> parse ["a"] [not some integer!]
** Error: SOME combinator cannot be negated with NOT

So my main concern is just the mechanics: does it break something interesting. (And as usual I'm suggesting it before actually trying it, which means I do not know if I will hit a contradiction while implementing it.)

hostilefork · March 26, 2024, 2:13am

One observation is that it probably breaks the ability to take any isolated rule and put it in a BLOCK!, since the block combinator isn't generically negatable.

So not [integer!] wouldn't work, when not integer! would.

I don't know that trying to special case it works. e.g. hacking it up to do not [integer! | word!]. That case may seem to work, but what if you mixed concepts of negation... like not [integer! | ahead word!] ?

It's sort of like I said: that NOT becomes a way to slipstream a refinement into a single combinator for its meaning of negation. You can't do that with a composite rule.

bradrn · March 26, 2024, 2:32am

I’m still not sure I see why NOT should exist as its own combinator. The disadvantages seem to outweigh the disadvantages: it adds a lot of complexity, and as you note there’s some cases where it doesn’t even seem to make sense.

Brett · March 31, 2024, 4:16am

In case it's of relevance or interest..

In Rebol 2 parse does not have a not, so parsing-unless is the function I use to implement the guard behaviour of not ahead. I use it a lot, like in this code to parse the adhoc syntax of my bank account transactions:

fxamt: [opt sp some digit opt [#"." some digit] sp cur-code] ; Yen is in whole numbers.
not-fxamt: parsing-unless [fxamt]
not-br: parsing-unless [{<BR>}]
nsp: complement charset { }

loc-word: [not-fxamt some [not-br nsp]]

So the nsp here is indeed a some not space in the positive sense. I use that a lot too but for my complex rules for me in Rebol 2 it will be some [not-thing skip]

I do use the two senses a lot in practice:

the negative/guard sense - I'm guarding the rest of the rule with the negated result of the match - the "unless".
the positive sense - I want the positive match of something that isn't the thing.

The wrinkle I've had with guard rules is when I'm using some routine that collects the input matched by rules that succeed - the guard rules succeed but I don't want their matching input (even if empty). For example, if I specified that I want to collect FXAMT then my routine will match when the fxamt inside not-fxamt succeeds. My simple workaround is to create another rule with the same structure but different name. I guess UPARSE wouldn't have that problem if an UNLESS took some of the role of NOT.

hostilefork · April 3, 2024, 7:56am

You keep your money in Yen?

I don't know that people would comprehend the difference between UNLESS vs. NOT.

If we're looking for uniform operators, it seems there are NOT-AHEAD and ONE-IF-NOT-AHEAD, where ONE-IF-NOT-AHEAD simply takes the next single item if the rule doesn't match:

>> parse ["a"] [one-if-not-ahead [integer! integer!]]
== "a"

This ONE-IF-NOT-AHEAD operator is kind of clunky, though similar to how SET works in Rebol2/Red, to take the first item of a match (vs UPARSE's last synthesized product of "a block combinator").

rebol2>> parse ["a" "b"] [set x [string! string!]]
== true

rebol2>> x
== "a"

one-if-not-ahead rule is simply [not-ahead rule, one] (in the current ONE-is-historical-SKIP model)

It would work with anything... but I'd personally rather have a NOT that wasn't compatible with everything, but covered common cases. And if someone actually wanted ONE-IF-NOT-AHEAD they could write [not ahead rule, one] for themselves.

It wouldn't really be a combinator, but a "combinator modifier".

Let me reiterate it's simply about being able to pass a refinement to combinators, even if they are triggered by a datatype and hence nowhere to put the refinement.

e.g. you can write ahead/not but you can't write #a/not

I think not *combinator* being able to do this modification is a cool idea.

Though of course allowing general "combinator modifiers" creates a lot of obfuscations, due to the backchannel. If I'm trying to make a debugger, then these refinements slipping in have to be presented somehow... otherwise once you've stepped into the not and all you see is #a it will be confusing when #b looks to be a match of #a.

Maybe negation is the only combinator modifier, so the system can account for it.

Conservative Choice For Now

A conservative thing to do may just be to rename NOT as NOT-AHEAD and work on higher priority tasks.

hostilefork · August 26, 2024, 8:59am

While NOT-AHEAD and NOT AHEAD are close-enough to each other, it looks pretty ugly to have to write not-ahead <end> instead of not <end>, which is pretty common.

Hence I think the conservative choice should include a new TAG! combinator: <not end>

I may even prefer it to not <end>

Punting The Implementation Code

My experiment just sniffed for a refinement on the combinator:

negatable-parser?: func [
    return: [logic?]
    frame [<unrun> frame!]
][
    return did find words of frame 'negated
]

(There should be better ways of doing that, but at least there is a way of doing it.)

NOT Combinator

; Historical Redbol PARSE considered NOT a synonym for NOT AHEAD.  But
; the concept behind this NOT is that some parsers can be "negated",
; which allows them to actually advance and consume input:
;
;     >> parse ["a"] [not ahead integer!, text!]
;     == "a"
;
;     >> parse ["a"] [not integer!]
;     == "a"
;
;     >> parse ["a"] [not some integer!]
;     ** Error: SOME combinator cannot be negated with NOT
;
; The approach has some weaknesses, e.g. the BLOCK! combinator isn't
; negatable so `not [integer!]` isn't legal but `not integer!` is.  But
; the usefulness is high enough that it's believed worth it.

'not combinator [
    {If the parser argument is negatable, invoke it in the negated sense}
    return: [any-value? pack?]
    parser [action?]
    /negated
][
    if negated [  ; NOT NOT, e.g. call parser without negating it
        return [@ remainder]: parser input except e -> [
            return raise e
        ]
    ]
    if not negatable-parser? :parser [
        fail "NOT called on non-negatable combinator"
    ]
    return [@ remainder]: parser/negated input except e -> [
        return raise e
    ]
]

Being able to do NOT NOT might seem useless... but... consider generated code being composed together. There may be value in having it work for somebody, somewhere.

AHEAD Combinator

This one is very straightforward.

'ahead combinator [
    {Leave the parse position at the same location, but fail if no match}
    return: "parser result if success, NULL if failure"
        [any-value? pack?]
    parser [action?]
    /negated
][
    remainder: input  ; never advances
    if negated [
        parser input except e -> [
            return ~not~
        ]
        return raise "Negated parser passed to AHEAD succeded"
    ]
    return parser input  ; don't care about what parser's remainder is
]

TYPE-BLOCK! Combinator

One of the benefits of UTF-8 Everywhere is that Ren-C's internal string representation can feed right into the source scanner. So if your input is a string or binary and you try to parse a datatype out of it, UPARSE will just run it:

>> parse "10 [20 <thirty>]" [x: integer! y: block!]
== [20 <thirty>]

>> x
== 10

>> y
== [20 <thirty>]

Very cool, but of course that's not negatable...only matching an element of a datatype in an array can be negated. So here we see a "sometimes-negatable" combinator:

type-block! combinator [
    return: "Matched or synthesized value"
        [element?]
    value [type-block!]
    /negated
    <local> item error
][
    either any-array? input [
        if value <> type of maybe input.1 [
            if negated [
                remainder: next input
                return input.1
            ]
            return raise "Value at parse position did not match TYPE-BLOCK!"
        ]
        if negated [
            return raise "Value at parse position matched TYPE-BLOCK!"
        ]
        remainder: next input
        return input.1
    ][
        if negated [
            fail "TYPE-BLOCK! only supported negated for array input"
        ]
        [item remainder]: transcode/one input except e -> [return raise e]

        ; If TRANSCODE knew what kind of item we were looking for, it could

        ; has some type sniffing in their fast lexer, review relevance.
        ;
        match value item else [
            return raise "Could not TRANSCODE the TYPE-GROUP! from input"
        ]
        return item
    ]
]

`<end>` Combinator

This looks simple, but wait until the next one:

<end> combinator [
    {Only match if the input is at the end}
    return: "Invisible"
        [nihil?]
    /negated
][
    remainder: input  ; never advances
    if tail? input [
        if negated [
            return raise "PARSE position at <end> (but parser negated)"
        ]
        return nihil
    ]
    if negated [
        return nihil
    ]
    return raise "PARSE position not at <end>"
]

TAG! Combinator

Here's where a big question came up.

Each parse can use its own choice of a combinator MAP!. And you can put instances of a datatype into the map (such as a combinator for <end>).

But you can also put in a combinator for the datatype itself, e.g. &[tag].

So who gets the first shot? The specific instance or the general combinator?

I had thought it would be more interesting if the datatype got the first chance, so that you could put a meaning of all the combinators of that type. But trying to implement this gives me doubts... and maybe if your instance of the combinator wants to be part of a family controlled by the datatype, that's something you do by making a TAG-COMBINATOR generator that does the generality.

Certainly having the datatype get first chance creates a bit of an annoyance because the TAG! combinator now becomes one of these "am I negatable? uh, have to ask who I'm delegating to..." situations.

So if we don't dispatch directly to <end> we have to tunnel this /NEGATED switch.

tag! combinator [
    {Special noun-like keyword subdispatcher for TAG!s}
    return: "What the delegated-to tag returned"
        [any-value? pack?]
    @pending [blank! block!]
    value [tag!]
    /negated
    <local> comb
][
    if not comb: state.combinators.(value) [
        fail ["No TAG! Combinator registered for" value]
    ]
    if negated [
        if not negatable-parser? comb [
            fail "NOT called on non-negatable combinator"
        ]
        comb: runs comb
        return [@ remainder pending]: comb/negated state input
    ]

    return [@ remainder pending]: run comb state input
]

So this kind of spoiled my "hey, only the combinators that care have to worry about it!" idea, it winds up being a tax on everything.

I Think Maybe My Datatype-Gets-First-Choice Idea Is Wrong

I've seen it going wrong other places. It's annoying that I can't put in a behavior like:

~true~ => combinator [...] [<<always succeed>>]
~false~ => combinator [...] [<<always fail>>]

And instead I have to go through the quasiform combinator through an extra step.

(Sidenote: Because MAP! can't store antiforms, I think the idea that quasiforms literally appearing in PARSE must do the same thing that their antiforms would do if looked up via a WORD!. There are ways to juggle things so you don't have to have that rule, but it seems pretty reasonable.)

If you have an idea about a combinator that takes control of everything, and are annoyed by there being combinators for instances that override your plan... why did you put them in the combinator map that UPARSE sees instead of some other dispatch table?

Anyway, switching to the instance getting first choice would help here. I'll come back to this, and just do the NOT-AHEAD and <not end> for now.