Separating Parse rules across contexts

parse
context

#1

I'm going to flesh out an example here to illustrate my problem.

I have a mini-JSON string parser that offloads escape parsing to a separate context.

escapes: use [chars][
    chars: make map! [
        newline "^/"
        tab "^-"
    ]

    [
        #"\" [
              #"n" (chars/newline)
            | #"t" (chars/tab)
        ]
    ]
]

I don’t want to copy any strings here or change any of the source content, just acknowledge the switch and pass the results back to the calling rule. Which would look like this:

string-rule: use [chars][
    chars: complement charset "\"

    [
        any [
              copy part some chars (emit part)
            | escapes (emit escapes-product)
            | skip (emit "\")
        ]
    ]
]

(ignore the mechanics of emit for the moment)

The only way I can see to pass escapes-product is to explicitly create a variable in a shared context outside the two rule contexts. This seems unwieldy—perhaps there are other ways?
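To make the data flow of that shared-variable workaround explicit, here is a rough model in Python rather than Rebol. Everything in it is an assumption made for illustration: rules are modeled as plain functions taking `(text, pos)`, the `shared` dict stands in for the outer context, and `make_escapes` / `make_string_rule` are hypothetical names. It is a sketch of the shape of the problem, not a proposal for how PARSE should work.

```python
# Hypothetical model: a "rule" is a function (text, pos) -> new position, or None on failure.
shared = {"escapes_product": None}  # lives outside both rule contexts

def make_escapes():
    chars = {"n": "\n", "t": "\t"}  # private to this rule, like USE [chars]

    def escapes(text, pos):
        # Match a backslash followed by a known escape letter.
        if text.startswith("\\", pos) and pos + 1 < len(text) and text[pos + 1] in chars:
            shared["escapes_product"] = chars[text[pos + 1]]  # result leaves via the shared slot
            return pos + 2
        return None

    return escapes

def make_string_rule(escapes, emit):
    def string_rule(text):
        pos = 0
        while pos < len(text):
            start = pos
            while pos < len(text) and text[pos] != "\\":
                pos += 1
            if pos > start:                      # copy part some chars (emit part)
                emit(text[start:pos])
                continue
            after = escapes(text, pos)
            if after is not None:                # escapes (emit escapes-product)
                emit(shared["escapes_product"])
                pos = after
            else:                                # skip (emit "\")
                emit("\\")
                pos += 1
        return True

    return string_rule

out = []
rule = make_string_rule(make_escapes(), out.append)
rule("a\\tb\\qc")
result = "".join(out)
```

The two rules never see each other's private `chars`, but both have to know about `shared`, which is the coupling that feels unwieldy.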

Catch/Throw

One possibility is a CATCH/THROW-like setup:

escapes: use [chars][
    chars: make map! [
        newline "^/"
        tab "^-"
    ]

    [
        #"\" [
              #"n" throw (chars/newline)
            | #"t" throw (chars/tab)
        ]
    ]
]

string-rule: use [chars][
    chars: complement charset "\"

    [
        any [
              copy part some chars (emit part)
            | set escapes-product catch escapes (emit escapes-product)
            | skip (emit "\")
        ]
    ]
]
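For comparison, the catch/throw idea can be modeled the same way in Python. This is only a sketch under assumptions: PARSE has no such `throw`/`catch` keywords in the rules above, and here they are imitated with an exception class (`Thrown`) and a helper (`catch`), both hypothetical names. The point is that the product travels with the control flow instead of through a shared variable.

```python
class Thrown(Exception):
    """Carries a rule's product, and the position after the match, up to the catcher."""
    def __init__(self, value, pos):
        super().__init__(value)
        self.value = value
        self.pos = pos

def make_escapes():
    chars = {"n": "\n", "t": "\t"}  # private to this rule

    def escapes(text, pos):
        key = text[pos + 1 : pos + 2]  # empty string at end of input
        if text.startswith("\\", pos) and key in chars:
            raise Thrown(chars[key], pos + 2)  # "throw (chars/newline)"
        return None  # no match, nothing thrown

    return escapes

def catch(rule, text, pos):
    """Run a rule; return (value, new_pos) if it threw, otherwise None."""
    try:
        rule(text, pos)
    except Thrown as thrown:
        return thrown.value, thrown.pos
    return None

def string_rule(text, escapes, emit):
    pos = 0
    while pos < len(text):
        if text[pos] != "\\":
            end = text.find("\\", pos)
            end = len(text) if end == -1 else end
            emit(text[pos:end])
            pos = end
            continue
        caught = catch(escapes, text, pos)  # "set escapes-product catch escapes"
        if caught is not None:
            escapes_product, pos = caught
            emit(escapes_product)
        else:
            emit("\\")
            pos += 1

out = []
string_rule("x\\ny\\", make_escapes(), out.append)
```

Here `escapes` needs no knowledge of who called it, and `string_rule` needs no knowledge of where the product is stored.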

Other Suggestions

<fill this space>


#2

There’s been an awkwardness around reuse and integration of parse rules from the beginning. Related trello card

“Other Suggestions”

I haven’t got a great suggestion, nor will I comment on catch/throw due to lack of understanding, but I wonder if parse needs some sort of built-in output channel (a stack) that collects declaratively marked emit events and fires them according to some user-defined policy, so that backtracking does not cause the same event to fire more than once.
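A rough Python model of what such an output channel might look like, purely as an assumption about the idea and not anything PARSE provides: events are buffered, a failed alternative rolls the buffer back to a mark, and nothing fires until the caller flushes with a handler of its choosing. `EmitChannel`, `alternative` and `seq_ab_then` are hypothetical names.

```python
class EmitChannel:
    """Buffers emit events so that backtracking can discard them."""
    def __init__(self):
        self._pending = []

    def emit(self, event):
        self._pending.append(event)

    def mark(self):
        return len(self._pending)

    def rollback(self, mark):
        del self._pending[mark:]  # forget events from a failed branch

    def flush(self, handler):
        # The "user-defined policy": here, fire everything in order, once.
        for event in self._pending:
            handler(event)
        self._pending.clear()

def alternative(channel, text, pos, branches):
    """Try branches in order, undoing the events of any branch that fails."""
    for branch in branches:
        mark = channel.mark()
        new_pos = branch(channel, text, pos)
        if new_pos is not None:
            return new_pos
        channel.rollback(mark)
    return None

def seq_ab_then(last):
    """Rule: "ab" (emit) followed by `last` (emit). May fail after the first emit."""
    def branch(channel, text, pos):
        if not text.startswith("ab", pos):
            return None
        channel.emit("saw-ab")
        if not text.startswith(last, pos + 2):
            return None  # fails after having emitted
        channel.emit("saw-" + last)
        return pos + 2 + len(last)
    return branch

channel = EmitChannel()
end = alternative(channel, "abc", 0, [seq_ab_then("X"), seq_ab_then("c")])
fired = []
channel.flush(fired.append)
```

The first branch emits `saw-ab` and then fails, yet the event fires only once overall because the rollback removes it before the second branch runs.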


#3

I too have been disappointed in the lack of reusability of PARSE rules. If you want to see something pretty bad, note how the desire to use the same BINARY! parsing rules for writing encapped data and reading it came out. :frowning:

If we put efficiency aside for a moment, might PARSE be built out of composable parts that speak a protocol a bit better? DO is a way of connecting FUNCTION!s together…some of which are native and some user-mode…but generally a user function can mimic a native. What if ANY and THRU were FUNCTION!s that were bound from a specific context, and you could throw some more in there?

If it helps the conversation about what the possibilities are, read this comment in %u-parse.c:

As a major operational difference from R3-Alpha, each recursion in Ren-C’s PARSE runs using a “Rebol Stack Frame”–similar to how the DO evaluator works. So [print "abc"] and [thru "abc"] are both seen as “code” and iterated using the same mechanic. (The rules are also locked from modification during the course of the PARSE, as code is in Ren-C.)

This leverages common services like reporting the start of the last “expression” that caused an error. So merely calling fail() will use the call stack to properly indicate the start of the parse rule that caused a problem. But most importantly, debuggers can break in and see the state at every step in the parse rule recursions.

The function users see on the stack for each recursion is a native called SUBPARSE. Although it is shaped similarly to typical DO code, there are differences. The subparse advances the “current evaluation position” in the frame as it operates, so it is a variadic function…with the rules as the variadic parameter. Calling it directly looks a bit unusual:

>> flags: 0
>> subparse "aabb" flags some "a" some "b"
== 4

Think of PARSE as leaning on DO a bit more, like this little “COMPOSER” sketch:

composer-rules: make object! [
    any: function [data [string!] state [object!] rules [string! <...>]] [
        str: take rules
        forever [
            if not pos: find/match data str [return data]
            data: pos
        ]
    ]
]

composer: function [
    data [string!] rules [block!]
    /with context [object!]
][
    bind rules composer-rules ;-- imagine this is a virtual bind
    if with [bind rules context] ;-- add extra definitions, if any were given

    v: make varargs! rules ;-- variadic feed over the rules being run
    s: make object! [] ;-- imagine expandable
    while [not tail? v] [
        unless function? f: get take v [fail "demo needs functions"]
        if not data: (f data s v) [ ;-- can update s and advance v forward
            return false
        ]
    ]
    return tail? data
]

That’s just a late night sketch after smoking a little bit of crack–so don’t take it too seriously. But the premise is that maybe what PARSE needs to do is lean on that more powerful notion of contextual binding, and let you expand that state object with your own custom operations.
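The same premise can be mocked up in Python to show how little machinery the "operations come from a context you can extend" idea needs. This is a sketch under stated assumptions, not a model of how PARSE or SUBPARSE is implemented: rule keywords are plain strings looked up in a dict, the rules block is an iterator that each operation may advance, and `op_any`, `op_some`, `BASE_OPS` and `composer` are hypothetical names.

```python
from collections import ChainMap

def op_any(data, state, rules):
    """Consume zero or more copies of the next rule item (a literal string)."""
    literal = next(rules)  # the operation pulls its own argument from the rules
    while literal and data.startswith(literal):
        data = data[len(literal):]
    return data

BASE_OPS = {"any": op_any}  # plays the role of composer-rules

def composer(data, rules, extra=None):
    """Run rules against data; `extra` adds user-defined operations."""
    ops = ChainMap(extra or {}, BASE_OPS)  # extra definitions shadow the base ones
    state = {}                             # expandable state shared by operations
    feed = iter(rules)
    for word in feed:
        op = ops.get(word)
        if not callable(op):
            raise ValueError("demo needs functions")
        data = op(data, state, feed)       # may update state and advance the feed
        if data is None:
            return False
    return data == ""

# A user-mode operation with the same shape as the built-in one.
def op_some(data, state, rules):
    """Consume one or more copies of the next rule item, else fail."""
    literal = next(rules)
    if not literal or not data.startswith(literal):
        return None
    return op_any(data, state, iter([literal]))

ok_builtin = composer("aabb", ["any", "a", "any", "b"])
ok_custom = composer("aabb", ["some", "a", "some", "b"], extra={"some": op_some})
fails = composer("aab", ["any", "a", "some", "c"], extra={"some": op_some})
```

Nothing distinguishes `op_some` from `op_any` once it is in the lookup chain, which is the "user function can mimic a native" property being asked for.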

Binding remains very much a core problem for all this. And I think it’s important to think about the difference between contextual-operation [… do block …] and contextual-operation compose [… (block) …], because the semantics may wind up being quite different in terms of whether the outer operation imposes its binding notions on that block.


#4

(I talked about this in chat, but thought perhaps here would be a better place to record the thoughts.)


In thinking about this question about reuse of parse rules, we confront Rebol’s practice where BLOCK! is used as a unit of currency…in places where other composition-based languages would demand a function with explicit arguments. The trick in Rebol’s playbook is that each level of recursion doesn’t force conventional “parameterization”; your intent can be conveyed with a “rich” data structure which gets interpreted by the evaluator…either fully (as with the body of a loop) or partially (as with a GROUP! in a parse rule).

This gets you unstuck from a rigid model of information exchange (as f(x(y,z(a,b,c))) indefinitely). But if you use a BLOCK!, the absence of parameterization via the “call stack” (let’s lump in parse rule embedding as its own kind of a “call stack”), puts a lot of pressure on binding to accomplish composability. And binding, even ignoring its costs to do correctly, is a bit of a Rube-Goldberg black art.

While contemplating the benefits of the loose and unusual idea of “part code, part data” composition-via-data-structure, it’s worth noticing that composition via ordinary functions has a long history and can do pretty much anything. We might note that in the extremes of no access to mutable global state–a la Haskell–you can (and probably should) write IF in userspace too:

if' :: Bool -> a -> a -> a
if' True  x _ = x
if' False _ y = y

Their IF is handcuffed far more than a shared parse rule is, yet they have ways of escaping constraints. The Monad jumps in as a boundary-breaking concept, exempting one from certain rules in order to balance an otherwise imbalanced equation…all in the service of composability. Interestingly, if not coincidentally, one of the two involved operators is called bind (>>=).

(Recommended Viewing: Don’t Fear The Monad…not that you need to understand monads to understand what I’m trying to get at here, but everyone with an interest in software paradigms in this day and age would probably enjoy knowing at least a little about them.)

It seems to me that for Rebol to go further, it needs its own cross-cutting boundary-breaking concept…something that’s equally applicable to PARSE as it is to DO or your own dialects. And in an imperative language, the only place I can see missing leverage coming from is the “stack” (again using my extended definition of “stack”, where recursion in PARSE rules is effectively a call stack).

@rgchris’s proposed direction for the parse reuse problem tries to mine that “call-stack” sensitivity with THROW and CATCH, where a rule achieves generality by deferring to the context of invocation. That would be a very PARSE-specific mechanism. But I’ve been talking for a while about something called virtual binding, which is a system-wide way that parameters might follow a block around.

The good news is: Ren-C has a fairly solid technical basis to attack scenarios in this vein. Where Red’s BLOCK! cell has a “reserved for future use” slot, Ren-C has leveraged that pointer-sized slot to build up a good model for beaming contextual information down through blocks. When a cell is pulled out of an array, it is considered “relative until it is combined with a specifier”…and there’s a whole tricky type-checked bookkeeping making sure every relative cell undergoes that specification process before any words can be looked up. (A function FRAME! is one kind of specifier, and it’s the reason why recursions in Ren-C have unique lookups for the same word based on the recursion, while Red’s hands are tied here and it can only resolve words by looking at the C call stack.)

The bad news is: I’m not sure exactly how to use this to practical effect. The best idea I had so far was to say that REFINEMENT! would be a way of accessing contexts which had been augmented via the call stack… so one would be able to have one block that without a BIND could actually search linked attribute space based on who did the DO CODE.

>> code: [print /foo]

>> do in [foo: 10] code
10

>> do in [foo: 20] code
20

It’s an idea that can be accomplished with no binding or modifications involved, all because that out-of-band “specifier” threads through the execution path. For this example it’s IN that tweaks that specifier to add some information (in this case an OBJECT! created from the block contents).
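To show the shape of that idea outside Rebol, here is a small Python model. It is an assumption-laden sketch, not a description of Ren-C's internals: the "block" is an immutable tuple, the specifier is a `ChainMap` passed alongside it, and `in_`, `resolve` and `do` are hypothetical stand-ins for IN, word lookup and DO.

```python
from collections import ChainMap

CODE = ("print", "/foo")  # immutable "block": never rebound, never modified

def in_(definitions, specifier):
    """Return a specifier extended with extra definitions; the code is untouched."""
    return specifier.new_child(dict(definitions))

def resolve(cell, specifier):
    """A /refinement-style cell is looked up through whatever specifier reached us."""
    if isinstance(cell, str) and cell.startswith("/"):
        return specifier[cell[1:]]
    return cell

def do(code, specifier, out):
    """Tiny evaluator that understands only `print <value>`."""
    cells = iter(code)
    for cell in cells:
        if cell != "print":
            raise ValueError("demo only understands print")
        out.append(resolve(next(cells), specifier))

base = ChainMap()  # empty outermost specifier
printed = []
do(CODE, in_({"foo": 10}, base), printed)
do(CODE, in_({"foo": 20}, base), printed)
```

The same `CODE` value yields 10 and then 20, and neither call leaves any trace on it or on `base`; the only thing that differs between the two runs is what was threaded in next to the code.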

Just to reiterate the good news: this mechanic is already solved…when a “cell” is picked out of an array, it is considered to be “unspecified”. The only way it becomes a “specified” value (one which can resolve to a variable in a context) is if something threaded through the call stack gets recombined with it.

The underlying mechanic is something that has been working for well over a year now, and is pretty well understood, and checked to the point that I’m confident it works. The question is how to use this in practice for the practical problems we are seeing. I feel like if we get too theoretical about how parameters get synthesized or inherited, it starts seeming like attribute grammars, and that gives me headaches. So perhaps we can focus on real examples and see if there are any good-enough tricks to pull.

