Double Negation in PARSE is NOT NOT NOT Cool

So there's a bit of an issue with the NOT combinator in UPARSE. It fails when the rule you give it as a parameter succeeds:

>> uparse [1] [not integer!]  ; fails... because it *was* an integer
; null

The NOT combinator received information about how much of the input the INTEGER! matched. But it threw that information away.

So what if you tried to negate it again...with NOT NOT?

>> uparse [1] [not not integer!]
; ???

Here the first NOT is negating the second rule that failed. So it's a success in the traditional sense you'd usually expect with a NOT. But it doesn't know in that success how much to skip. So the 1 is still pending.

Hence to get this to work you'd have to say:

>> uparse [1] [not not integer!, integer!]
== [1]

In other words, every NOT is effectively a NOT AHEAD.

But Red actually tries to cancel out the NOTs:

red>> parse [1] [set x not not integer!]
== true

red>> x
== 1

This doesn't make a lot of sense. It doesn't work with other failing rules... 1 is not a "not integer!" and it's also not a "tag!", so why would this not capture it?

red>> parse [1] [set x not tag!]
== false

Here are the options for dealing with this in the combinator model:

  1. Change the name of NOT to NOT-AHEAD. This is how Haskell does it (they call it notFollowedBy... that combinator list is good to look at, by the way).

  2. Leave it as-is and assume that it's the kind of thing you can just learn.

  3. Rig up a hack to make the specific case of NOT immediately followed by NOT a no-op.

  4. Systematically build into the protocol the idea that a rule can separately signal that it failed as well as "how much input it would have consumed if it had succeeded"

I think (3) as in Red is a bad idea. I'm not even sure what (4) would mean...?

red>> parse [1 2 3 4] [copy numbers not some not integer!]
== false

I think my inclination would be (2).


UPDATE: Looking a little further, I think the NOT NOT behavior in Red is likely a bug and not an attempt to implement a nonsensical feature...being caused by the same thing that made #1246 happen in R3-Alpha, e.g. controlling negation via a single flag.

I was reading some code using NOT that was like some [not space, one].

Hence effectively this is some [not ahead space, one] e.g. some number of times look ahead to see a thing isn't a space, then consume one item (whatever the not space thing was).

This made me a bit disgruntled that I couldn't write some not space, which on the surface seems like a coherent thought... vs. something that acts like an some not ahead space e.g. an infinite loop.

In practice any input-consuming combinator has a bit of a coherence problem... because while the letter "A" is not a space, neither is reaching the end of input. So to stop the infinite loop problem you'd have to at the very least omit a test against end.

I'm not even sure what (4) would mean...?

Maybe it means we're missing a concept like negatable combinators, which you can call as parser/not instead of just parser. Then NOT would call the parser negatedly (maybe it's the only combinator that would do so, or maybe there would be others?)

e.g. AHEAD could be a negatable combinator, producing a negatable parser. And maybe a single character, single byte, or single value combinator would be willing to be negated--with the negated rule being any single value that's not that. We could also negate character sets in the same way.

So a string combinator when matching in a BLOCK! could be negated, and matching the end would not count:

>> parse ["alpha"] [not "beta"]
== "alpha"

>> parse [] [not "beta"]  ; behave same as e.g. `parse [] ["alpha"]`
** Error: PARSE BLOCK! combinator did not match input

But when matching in a string, it couldn't

>> parse "alpha" [not "beta"]
** Error: TEXT! combinator not negatable when parsing TEXT!

So there'd be a narrow interpretation of cases where it was clear how much a combinator intended to test and consume. If you could say how much it intended to consume... and make a reasoned argument it could consume an equal amount in the negated case. (Any combinator that consumed no input could thus be negatable.)

This strikes me as valuable, as it would give meaning to things like some not space that people would want, and dismiss with the confusion over NOT being implicitly NOT AHEAD.

It may mean that some not space would work, and some not " " would be different in string parsing, e.g. characters would be treated differently than strings of length 1. Or perhaps a NOT of a string that didn't match would always advance one character? :-/


Observation: NOT could itself be a negatable combinator, making NOT NOT work. It just would notice when it was called with /NOT and then use that as a signal to call its combinated parser parameter plainly, vs. using /NOT.

I feel there’s some confusion between two completely different notions of negation here:

  • Complement of a character set: that is, ‘match any character which is not in this set’
  • Inversion of parser success: that is, ‘make a parser which fails when another succeeds’

In Haskell, the latter is (as you note) notFollowedBy, and the former needs to be created ad-hoc using combinators like satisfy. Ren-C could do a bit better than Haskell there, since it reifies character sets as values.

But especially the last quoted paragraph seems to be conflating them. It sounds like you want NOT to sometimes invert the failure state, and sometimes change the characters that are matched. That strikes me as possibly being quite difficult to reason about. Not only that, it would create difficulties in cases where both are plausible — e.g. if not space matches everything except spaces, then how do I create a parser which fails if a space is present?

You're more or less giving some parsers a refinement. So AHEAD has AHEAD/NOT. But it's letting people write that out as not ahead. And when combinators are triggered by datatype, you can get the feature of slipstreaming that refinement to the combinator when there's nowhere to put the /NOT.

A little touchy-feely perhaps that each parser can decide if there's a meaning to its negation (or if negation has no meaning at all) as opposed to something that has to be black-box decided from the parameterized parser's result. But it seems learnable to me.

The question of if this little backdoor communication breaks some other more interesting composability feature in UPARSE is the main one to answer. (It isn't the only such suggestion... I've wondered about a similar backchannel to suppress an error from the "null combinator", which a MAYBE combinator could uniquely suppress an error from.)

But then there’s the other problem I mentioned: what if you do want to invert the parser’s failure state? For instance, in this parser I used notFollowedBy (symbol "//") to match only the case with a single slash. But if not "/" instead matches anything except for "/", then that leaves me no easy way to get that logic.

I think the cleanest design here would be to recognise that there’s two very different cases here, and therefore treat them differently:

  • The ‘fail on success’ logic can apply to any other combinator, so give it its own dedicated name, like NOT-FOLLOWED-BY or something. (That name’s verbose, but this combinator is uncommon, so it wouldn’t matter a huge amount.)
  • Other meanings of ‘negation’ are essentially parser-specific, so treat them as refinements for the individual parsers, like AHEAD/NOT. Characters can’t be refined, so we could make a new combinator for them (call it something like EXCEPT-FOR). I see no particular reason to merge all these things under a single NOT combinator.

I think that's covered well enough by NOT AHEAD.

AHEAD is a nice generic combinator for "take your combinated parser argument, apply it, but don't advance the input (or apply any 'pending' effects). just stop the match if it didn't succeed, or continue if it did."

So having NOT AHEAD be the implementation of NOT-FOLLOWED-BY feels good.

I like the brevity, and it feels "English-like" to me to be context-sensitive.

>> parse ["a"] [not ahead integer!, text!]
== "a"

>> parse ["a"] [not integer!]
== "a"

>> parse ["a"] [not some integer!]
** Error: SOME combinator cannot be negated with NOT

So my main concern is just the mechanics: does it break something interesting. (And as usual I'm suggesting it before actually trying it, which means I do not know if I will hit a contradiction while implementing it.)

One observation is that it probably breaks the ability to take any isolated rule and put it in a BLOCK!, since the block combinator isn't generically negatable.

So not [integer!] wouldn't work, when not integer! would.

I don't know that trying to special case it works. e.g. hacking it up to do not [integer! | word!]. That case may seem to work, but what if you mixed concepts of negation... like not [integer! | ahead word!] ?

It's sort of like I said: that NOT becomes a way to slipstream a refinement into a single combinator for its meaning of negation. You can't do that with a composite rule.

I’m still not sure I see why NOT should exist as its own combinator. The disadvantages seem to outweigh the disadvantages: it adds a lot of complexity, and as you note there’s some cases where it doesn’t even seem to make sense.

In case it's of relevance or interest..

In Rebol 2 parse does not have a not, so parsing-unless is the function I use to implement the guard behaviour of not ahead. I use it a lot, like in this code to parse the adhoc syntax of my bank account transactions:

fxamt: [opt sp some digit opt [#"." some digit] sp cur-code] ; Yen is in whole numbers.
not-fxamt: parsing-unless [fxamt]
not-br: parsing-unless [{<BR>}]
nsp: complement charset { }

loc-word: [not-fxamt some [not-br nsp]]

So the nsp here is indeed a some not space in the positive sense. I use that a lot too but for my complex rules for me in Rebol 2 it will be some [not-thing skip]

I do use the two senses a lot in practice:

  • the negative/guard sense - I'm guarding the rest of the rule with the negated result of the match - the "unless".
  • the positive sense - I want the positive match of something that isn't the thing.

The wrinkle I've had with guard rules is when I'm using some routine that collects the input matched by rules that succeed - the guard rules succeed but I don't want their matching input (even if empty). For example, if I specified that I want to collect FXAMT then my routine will match when the fxamt inside not-fxamt succeeds. My simple workaround is to create another rule with the same structure but different name. I guess UPARSE wouldn't have that problem if an UNLESS took some of the role of NOT.

2 Likes

You keep your money in Yen? :slight_smile:

I don't know that people would comprehend the difference between UNLESS vs. NOT.

If we're looking for uniform operators, it seems there are NOT-AHEAD and ONE-IF-NOT-AHEAD, where ONE-IF-NOT-AHEAD simply takes the next single item if the rule doesn't match:

>> parse ["a"] [one-if-not-ahead [integer! integer!]]
== "a"

This ONE-IF-NOT-AHEAD operator is kind of clunky, though similar to how SET works in Rebol2/Red, to take the first item of a match (vs UPARSE's last synthesized product of "a block combinator").

rebol2>> parse ["a" "b"] [set x [string! string!]]
== true

rebol2>> x
== "a"

one-if-not-ahead rule is simply [not-ahead rule, one] (in the current ONE-is-historical-SKIP model)

It would work with anything... but I'd personally rather have a NOT that wasn't compatible with everything, but covered common cases. And if someone actually wanted ONE-IF-NOT-AHEAD they could write [not ahead rule, one] for themselves.

It wouldn't really be a combinator, but a "combinator modifier".

Let me reiterate it's simply about being able to pass a refinement to combinators, even if they are triggered by a datatype and hence nowhere to put the refinement.

e.g. you can write ahead/not but you can't write #a/not

I think not *combinator* being able to do this modification is a cool idea.

Though of course allowing general "combinator modifiers" creates a lot of obfuscations, due to the backchannel. If I'm trying to make a debugger, then these refinements slipping in have to be presented somehow... otherwise once you've stepped into the not and all you see is #a it will be confusing when #b looks to be a match of #a.

Maybe negation is the only combinator modifier, so the system can account for it.

Conservative Choice For Now

A conservative thing to do may just be to rename NOT as NOT-AHEAD and work on higher priority tasks.