The Pathing and Picking Predicament Pans Out

TL;DR

I'm changing path dispatch to be fundamentally recursive, and to run through a unified action dispatcher called PICK-POKE*. As fate would have it, this takes a longstanding annoying edge case in how R3-Alpha methodized path dispatch for GOB!...and makes that "the answer" for how all path dispatch is done. Additionally:

  • It employs FRAME! reuse for native dispatch, which allows a memory use profile similar to the "PVS" (Path Value State?) structure from R3-Alpha...even though it's making recursive evaluator calls.

  • This makes path dispatch just ordinary function dispatch, which means pathing doesn't need special accommodation in stackless.

    • Sidenote: Path dispatch and PARSE were two extremely troublesome areas in the stackless conversion. Both are being addressed by pushing more and more "custom" recursive C code to do their recursions by means of the interpreter's evaluator loop.
  • It should mean that user-defined data types--or even in the near term perhaps your own OBJECT!s--could customize pathing if they wanted to (though we may limit customization to / access in order to provide terra firma for accessing the object)

Preface: No One Has Done This Right (Until Now)

Before we get carried away and give GOB! too much credit... 🙂

Redbols try to be a bit fiddly and pack things into immediate cell values. But if you're going to be that fiddly, you have to remember that "updating a value" really means "updating the container the value lives in".

Random example from Red and Rebol2:

red>> b: [x 12-Dec-2021/10:00 y]
== [x 12-Dec-2021/10:00:00 y]

red>> b/2/time/hour
== 10

red>> b/2/time/hour: 20
== 20

red>> b
== [x 12-Dec-2021/10:00:00 y]  ; still 10:00, not 20:00

(R3-Alpha is worse, giving an error and corrupting the time.)

This happens because DATE! fits in a cell with the TIME! packed into it. If you ask for a date's /TIME then you get a synthesized new cell to hold it. But poking back into that synthesized cell won't change the original date.

Hence every SET-PATH! or POKE has to offer a kind of backflow in the chain of poking, in case any of the forward writes require bit updates backwards. Frame reuse allows that to be made somewhat efficient with one cell's worth of stack, while PICK can use a "sibling tail call".

The Ren-C I'm working on takes care of the above example, and should generalize to others!
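To make the backflow idea concrete, here is a rough model in Python (not Ren-C, and not a description of the actual C implementation). The names `Date`, `Time`, and `poke` are made up for illustration. The assumption is that a packed immediate value behaves like an immutable record: poking into it can't change it in place, so it hands back a new image of its bits, and whoever holds it writes that image back.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Time:            # hypothetical stand-in for a TIME! packed in a cell
    hour: int
    minute: int

@dataclass(frozen=True)
class Date:            # hypothetical stand-in for a DATE! with a packed time
    year: int
    month: int
    day: int
    time: Time

def poke(container, steps, value):
    """Write `value` at `steps` inside `container`.

    Returns None if nothing needs to flow back to the caller, or a new
    immutable value that the caller must store where `container` lived.
    """
    key, rest = steps[0], steps[1:]
    if isinstance(container, list):          # a real container: update in place
        index = key - 1                      # 1-based, as in Rebol
        if not rest:
            container[index] = value
        else:
            updated = poke(container[index], rest, value)
            if updated is not None:          # backflow: store the new bits
                container[index] = updated
        return None
    # Immutable packed value: can't update in place, so build a new image
    if not rest:
        return replace(container, **{key: value})
    updated = poke(getattr(container, key), rest, value)
    if updated is None:
        return None
    return replace(container, **{key: updated})

b = ["x", Date(2021, 12, 12, Time(10, 0)), "y"]
result = poke(b, [2, "time", "hour"], 20)    # models b/2/time/hour: 20
```

Here the TIME! step returns a new time, the DATE! step folds that into a new date, and only the block actually stores anything, so `b` ends up holding a date with hour 20.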

Now, Explanation. Background:

Path Dispatch--or "PD" as R3-Alpha called it--was a concept fraught with issues.

On the surface it seems like a simple chain...it's broken into steps where each one produces a value that is picked by the next step:

>> outer: make object! [inner: make object! [block: [a b c]]]

>> outer/inner/block/2  ; expressed as a path
== b

>> pick (pick (pick outer 'inner) 'block) 2  ; expanded as picks
== b

; Note: Historical Rebol required a mix of SELECT and PICK, Ren-C unifies it

How "hard" is that? Well, it's not that hard, though it could be very wasteful.

Imagine the FFI with some-struct.million-int-array.1. If that's an FFI interface to a struct with a million C int in it, do you have to generate a BLOCK! of a million INTEGER! just to pick the first one? That's what the naive translation of pathing to step-by-step PICK calls would do.
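As a sketch of that cost difference, modeled in Python rather than the FFI itself: `IntArrayField`, `Struct`, `pick_naive`, and `pick_steps` are hypothetical names, and the raw C memory is approximated with the standard `array` module.

```python
import array

class IntArrayField:
    """Hypothetical stand-in for a C int array inside a struct's raw memory."""
    def __init__(self, count):
        self.raw = array.array("i", [0]) * count   # compact storage, no boxed ints

    def materialize(self):
        # What a standalone PICK of the field would have to hand back:
        # a full block with one boxed integer per element
        return list(self.raw)

class Struct:
    def __init__(self, **fields):
        self.fields = fields

def pick_naive(struct, field, index):
    # Step-by-step translation: pick the field, then pick from the result
    block = struct.fields[field].materialize()     # builds a million-item list
    return block[index - 1]

def pick_steps(struct, steps):
    # The struct sees both steps at once and reads one int from raw memory
    field, index = steps
    return struct.fields[field].raw[index - 1]

some_struct = Struct(**{"million-int-array": IntArrayField(1_000_000)})
some_struct.fields["million-int-array"].raw[0] = 7

naive = pick_naive(some_struct, "million-int-array", 1)
direct = pick_steps(some_struct, ["million-int-array", 1])
```

Both calls give the same answer; the difference is that the first one allocates a million-element list just to throw it away.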

Even without talking about efficiency, we can talk about semantics. PICK at least works out semantically for pathing, but POKE does not. Try this:

>> outer/inner/block: [i am a new block]
== [i am a new block]

>> poke (pick (pick outer 'inner) 'block) [i am a new block]
** Error, wait a second...

That second formulation is not equivalent...because the PICK gave back a plain old block. So it saw:

>> poke [a b c] [i am a new block]

This faces the problem that Rebol lacks "Reference" types. POKE wanted a place to put the new block...effectively the address of the block value in the inner object. But it just got back the value of the block in the inner object.

Could Rebol Have A Reference Type?

Hypothetical code:

>> obj: make object! [field: "I am a field"]

>> ref: &obj.field
== &"I am a field"

>> ref: "Field is replaced!"

>> obj
== make object! [field: "Field is replaced!"]

In such a world, changing REF didn't change the string... it changed a field in the object the string lived in.
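For comparison, here is roughly what such a reference would amount to, modeled in Python with a made-up `Ref` class. The assumption is that a reference is just "the container plus the key", which is the piece of information a plain PICK throws away.

```python
class Ref:
    """Hypothetical reference: remembers where a value lives, not just the value."""
    def __init__(self, container, key):
        self.container = container
        self.key = key

    def get(self):
        return self.container[self.key]

    def set(self, value):
        self.container[self.key] = value

obj = {"field": "I am a field"}

value = obj["field"]              # plain pick: only the value comes back
value = "Field is replaced!"      # rebinding the local name; obj is untouched
after_plain_assignment = obj["field"]

ref = Ref(obj, "field")           # models the hypothetical &obj.field
ref.set("Field is replaced!")     # writes through to the object
after_ref_assignment = obj["field"]
```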

This is the kind of mechanic that pathing would need if it were to be extensible and truly generic. What each step in the path offered up to the next would have to be a means of writing back to the field if it wanted to.

That sounds like a nightmare...but it wouldn't even solve the problem if it could be done, because...

...Subaddressing Makes It Worse!

Some of the more confusing parts of path dispatch dealt with the fact that path steps might be producing something that didn't reference a full value at all...but some optimized bit pattern.

>> obj: make object! [gob: make gob! [x: 10 y: 20]]

>> obj.gob.size.x: 304
== 304

What's so weird about that? Well, GOB! stores its bits compactly, so there is no INTEGER! cell for the X, and no PAIR! cell for the size.

This is all easy enough on the PICK side... you ask the GOB! what its size is, and it tells you 10x20 as a new PAIR! it makes out of thin air. Then you ask that pair what its X is and it tells you 10.

But on the POKE side, even if you had the address mechanic, there's no address of a value that the GOB! can give for that PAIR! to let you write back to it.

GOB! in R3-Alpha Actually Had The Right Idea...Sort Of

With path dispatchers like R3-Alpha's PD_Block(), you can see that it has a "picker" (pvs->select) which it applies to the value that's a BLOCK! or GROUP! (pvs->out). It trusts that the "path engine" has pre-evaluated any code in parentheses if necessary to get pvs->select.

Then there's the detail that the dispatcher needs to know when it's handling a SET-PATH!, which is signaled by pvs->setval being non-null. This is also something the path engine works out...based on whether the end of the path is reached. Really this looks like it moves one step at a time.

So in my early dealings with path dispatch, I'd try to formalize this a bit better...putting horse-blinders on the PD_Xxx() function by giving it narrow parameterization, and removing the PVS as a parameter. Yet I tripped over "bad" path dispatchers like PD_Gob(), which called Next_Path() in their implementations.

But the "bad" handling of GOB! was closer to the right general answer:

  • Some portion of the path is consumed by each step in the SET-PATH!

  • It then hands the remainder it can't handle off via a recursive call...

    • The return value of this recursive call is either NULL or an updated image of the cell bits that must be updated in the container to reflect an immediate type.

Walking Through The Process With GOB!

Imagine you write:

>> obj/gob/size/x: 304

It might happen like this:

  • POKE asks OBJECT! "Hey, I want to write gob/size/x. How much of that can you do?"

  • OBJECT! says "I will update myself if GOB! can tell me the answer to what it wants to be if size/x is written."

  • GOB! says "I consumed the entirety of size/x: 304 and there were no changes to my bit pattern that my caller needs to be aware of." (because gobs are allocated in handles, REBGOB*, so the modification of the size bits is not the concern of the reference in object as it still points to that same REBGOB*)

  • OBJECT! says "Okay fine then."
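The exchange above can be modeled roughly like this in Python. `Gob` and `Obj` are hypothetical stand-ins, not the R3-Alpha or Ren-C structures, and the "None means nothing to write back" convention is my shorthand for the NULL return described earlier.

```python
class Gob:
    """Hypothetical GOB!: stores its size compactly as two ints, no PAIR! inside."""
    def __init__(self, width, height):
        self._w, self._h = width, height

    def pick(self, key):
        if key == "size":
            return (self._w, self._h)     # a pair synthesized out of thin air
        raise KeyError(key)

    def poke(self, steps, value):
        # Consume BOTH remaining steps (size, then x or y) in one go
        if len(steps) == 2 and steps[0] == "size" and steps[1] in ("x", "y"):
            if steps[1] == "x":
                self._w = value
            else:
                self._h = value
            return None                   # same handle; caller has nothing to update
        raise KeyError(tuple(steps))

class Obj:
    """Hypothetical OBJECT!: handles one step, delegates the remainder."""
    def __init__(self, **fields):
        self.fields = fields

    def poke(self, steps, value):
        key, rest = steps[0], steps[1:]
        if not rest:
            self.fields[key] = value
            return None
        updated = self.fields[key].poke(rest, value)   # "how much can you do?"
        if updated is not None:                        # child gave back new bits
            self.fields[key] = updated
        return None                                    # "Okay fine then."

obj = Obj(gob=Gob(10, 20))
gob_before = obj.fields["gob"]
result = obj.poke(["gob", "size", "x"], 304)           # models obj/gob/size/x: 304
size_after = obj.fields["gob"].pick("size")
```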

Notice that we never got PAIR! involved in the dispatch, even though the answer to gob/size is a PAIR!.

That's not the only way to do it. There are actually three ways this could work:

  1. (the above way) Don't just consume one of the steps, but go ahead and do two--e.g. take control of what size.x means and don't synthesize a PAIR! at all.

  2. Synthesize a PAIR! and allow it to do whatever modification it wishes, but ignore its nullptr return status and pack the full pair value down to the low-level bits in the GOB!

  3. Drop this micro-optimization and store a PAIR! cell in the GOB! structure.

I actually think #3 is the best answer, but the point here is to study being general.

The New Formulation Is About As Good As This Can Get

It's interesting to be able to do this kind of optimization, and things like the FFI need it. I mentioned some-struct.million-int-array.1. Naive approaches will be too inefficient to handle this.

So path processing needs this nuance. And we'd like an answer that doesn't make the author of STRUCT! have to worry about some-struct.million-int-array.(1 + 2), so the processing of GROUP!s has to be done by the pathing.
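One way to picture that division of labor, as a Python sketch under my own assumptions: GROUP! steps are modeled as zero-argument callables, and `resolve_steps`, `path_pick`, and `IntArrayStruct` are hypothetical names. The pathing layer evaluates the groups, so the datatype's handler only ever sees plain pickers.

```python
def resolve_steps(steps):
    """Evaluate GROUP!-like steps (modeled here as zero-argument callables)."""
    return [step() if callable(step) else step for step in steps]

class IntArrayStruct:
    """Hypothetical STRUCT!-like type that handles several steps at once."""
    def __init__(self, values):
        self.values = list(values)
        self.seen = None                  # records what the handler was given

    def pick(self, steps):
        self.seen = steps
        field, index = steps
        if field != "million-int-array":
            raise KeyError(field)
        return self.values[index - 1]     # 1-based, as in Rebol

def path_pick(root, steps):
    # The pathing layer does the evaluation; the type never sees a group
    return root.pick(resolve_steps(steps))

some_struct = IntArrayStruct([10, 20, 30, 40])
result = path_pick(some_struct, ["million-int-array", lambda: 1 + 2])
```

The struct author writes code for `["million-int-array", 3]` and never has to think about where the 3 came from.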

Long story long: this is a PITA and I'm making some headway on framing the problem. Things seem in better shape, as the oddly-shaped PD_Xxx are eliminated.


Translating Pathing Into POKE

I mentioned that there's more than meets the eye to path expressions.

So if you write something like:

a/(expr b)/c: expr d

You might think that the first step is to get whatever A is, and POKE into that. Maybe it would translate to something like:

apply :poke [
    /location :a
    /steps compose [(expr b) c]
    /value (expr d)
]

But that doesn't generalize. Let's say a is a DATE!, for instance:

>> a: 12-Dec-2012/10:20
>> a.time.hour: 11

If we think of this as poke :a [time hour] 11 we get poke 12-Dec-2012/10:20 [time hour] 11.

The problem there is that you've lost the knowledge of where the bits that store the date-time live. In the case of DATE!, they live in whatever object a was bound into.

If a lived in the user context, what you really wanted here was:

poke #<<user-context>> [a time hour] 11

Generalizing This Is Non-Trivial, Especially Errors

I mentioned what I think is a fairly workable recursive formulation:

  • offer opportunities to each object along the chain to process as many of the "steps" as it wants to.

  • potentially return bits back at each level for the container to update

But if the POKE* method makes these recursive calls directly, there's no central service to map any errors back to a location in a path where they came from. You'd not be implicating a particular part of the path that caused a problem.

FWIW, Red doesn't offer that granularity:

red>> var: 'z
== z

red>> obj/(var)/(var)
*** Script Error: cannot access z in path obj/(var)/(var) 

You don't know which step it's complaining about; you just know what the error is, and what path was being processed while it happened. That's not much, but even that's a bit hard to define when your path mechanism is fully generic. What if there's a division happening incidental to the path poking but just part of the implementation, and it fails? Wouldn't it help to know at least that it was the OBJECT! POKE* method that was having trouble, instead of a generic error attached to the PATH!?

I don't know, but I do know that errors are getting worse...sometimes directly as a result of mechanics becoming generalized and improving. Generalized code often gives very poor errors, while narrower code that handles fewer cases can give better ones. That's an unfortunate tradeoff. 🙁
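For what step-level attribution could look like, here is a small Python sketch. It only covers the PICK direction and uses a central loop, which is exactly the "central service" that direct recursive POKE* calls would lack. `PathStepError` and `pick_path` are hypothetical names, not anything in Ren-C or Red.

```python
class PathStepError(Exception):
    """Hypothetical error that records which step failed and who was handling it."""
    def __init__(self, path, index, handler, cause):
        self.path = path
        self.index = index        # 0-based position of the failing step
        self.handler = handler    # name of the type whose handler raised
        self.cause = cause
        super().__init__(
            f"step {index + 1} ({path[index]!r}) of path {path!r} "
            f"failed in {handler}: {cause!r}"
        )

def pick_path(root, path):
    value = root
    for index, step in enumerate(path):
        handler = type(value).__name__
        try:
            value = value[step]
        except Exception as cause:
            # Any failure inside the handler, even an incidental one, gets
            # tagged with the step and the handler it happened in
            raise PathStepError(path, index, handler, cause) from cause
    return value

obj = {"z": {"q": 1}}
var = "z"
try:
    pick_path(obj, [var, var])    # models obj/(var)/(var)
    error = None
except PathStepError as e:
    error = e
```

With recursion done directly between handlers, each level would have to do this wrapping itself, or the recursion would have to be routed back through something like `pick_path`.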
