Series Switching in PARSE

hostilefork · September 20, 2022, 2:03am

Rebol2 Prohibited Series Switching During a PARSE

>> series1: [a a a]
== [a a a]

>> series2: [b b b]
== [b b b]

>> parse series1 ['a :series2 some 'b]
** Script Error: Invalid argument: b b b

The error wasn't particularly informative, but it was trying to tell you that you can't switch to a different series in mid-parse.

Red Chose to Follow Suit, and Prohibits Series Switching During a PARSE

red>> series1: [a a a]
== [a a a]

red>> series2: [b b b]
== [b b b]

red>> parse series1 ['a :series2 some 'b]
*** Script Error: PARSE - get-word refers to a different series! :series2

R3-Alpha Decided To Make It Legal

r3-alpha>> series1: [a a a]
== [a a a]

r3-alpha>> series2: [b b b]
== [b b b]

r3-alpha>> parse series1 ['a :series2 some 'b]
== true

I wasn't aware the feature was used, but @rgchris used it in the Rebol3 version of altjson:

Scripts/r3-alpha/altjson.r3 at 6fa69eabe11fe78b9fd0a7bd6bb17a923cee0b2b · rgchris/Scripts · GitHub

The Feature Was Added to R3-Alpha Circa 2009

Carl's blog entry:

http://www.rebol.net/r3blogs/0265.html

He points out one fairly clear reason why this is sketchy:

The problem is this: if you change the series but the rule fails, forcing a recovery to a prior index, it's still the new series. That is, we do not recover to the old series.

If advanced users are willing to live with that restriction, then this change can be made.
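To make Carl's restriction concrete, here is a toy model in Python. It is not Rebol's actual implementation, and the class and method names are hypothetical. It assumes the parse position is a (series, index) pair, and that an alternative saves and restores only the index before trying the next branch, which is the situation the quote describes:

```python
class ToyParser:
    """Toy model, not Rebol's implementation: the parse position is a
    (series, index) pair; alternatives restore it when a branch fails."""

    def __init__(self, series):
        self.series = series
        self.index = 0

    def match_word(self, word):
        # Match one element and advance, like 'a in a PARSE rule.
        if self.index < len(self.series) and self.series[self.index] == word:
            self.index += 1
            return True
        return False

    def switch_series(self, new_series):
        # Like R3-Alpha's :series2 in a rule: jump to the head of another series.
        self.series = new_series
        self.index = 0
        return True

    def alternatives(self, *rules, restore_series=False):
        # Try each rule in turn, like rule1 | rule2 in PARSE.
        for rule in rules:
            saved_series, saved_index = self.series, self.index
            if rule(self):
                return True
            self.index = saved_index        # the index is always restored...
            if restore_series:
                self.series = saved_series  # ...the series only if asked to


        return False


def run(restore_series):
    series1 = ["a", "a", "a"]
    series2 = ["b", "b", "b"]
    p = ToyParser(series1)
    p.match_word("a")
    ok = p.alternatives(
        # First branch switches to series2, then fails.
        lambda q: q.switch_series(series2) and q.match_word("x"),
        # Second branch expects to keep going in series1.
        lambda q: q.match_word("a"),
        restore_series=restore_series,
    )
    return ok, p.series is series1, p.index


index_only = run(restore_series=False)   # (False, False, 1)
full_restore = run(restore_series=True)  # (True, True, 2)
```

With index-only recovery, the second branch runs against series2 at index 1 and fails, even though series1 has an a at that position. Saving the series alongside the index avoids that in this toy, at the cost of tracking more state per backtrack point.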

A comment on that blog entry argues the opposite of what I would expect:

Input switching would make parsing of big (or streaming) files more easy, as we wouldn't have to keep the whole data in memory, and could read it as needed, without losing the current parse state.

Doing streaming parsing correctly requires tighter control over the process... not less.

Can The Desire Be Met Other Ways?

Since you're basically destroying the ability to meaningfully backtrack, I don't know how this is that different from starting a new parse.

I'd like it to be easy to return results out of a parse (see the RETURN/ACCEPT post).

So why wouldn't you have some kind of driving loop on the outside of your parse that looks for a continuation signal, and then starts a new parse with what it's given?
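A minimal sketch of such a driving loop, written in Python rather than Rebol. Everything here is hypothetical: the names drive, a_then_continue and some_b, and the convention that a "parse" returns a success flag plus an optional continuation (the next series and the rule to run on it). It splits the earlier 'a :series2 some 'b example into two separate parses:

```python
def drive(series, rule):
    """Outer driving loop (hypothetical): run one parse at a time, and when a
    parse signals a continuation, start a fresh parse on what it hands back."""
    while True:
        ok, continuation = rule(series)
        if not ok:
            return False
        if continuation is None:
            return True
        series, rule = continuation  # new parse; no backtracking across series


def some_b(series):
    # Like some 'b reaching the end of input: one or more "b" and nothing else.
    return len(series) > 0 and all(item == "b" for item in series), None


def a_then_continue(next_series):
    # Like 'a :series2 some 'b, split in two: match "a" at the head, then
    # signal a continuation instead of switching series in mid-parse.
    def rule(series):
        if series and series[0] == "a":
            return True, (next_series, some_b)
        return False, None
    return rule


good = drive(["a", "a", "a"], a_then_continue(["b", "b", "b"]))      # True
bad_tail = drive(["a", "a", "a"], a_then_continue(["b", "x", "b"]))  # False
bad_head = drive(["x"], a_then_continue(["b", "b", "b"]))            # False
```

Because each parse only ever sees one series, any backtracking stays within that series, and the "switch" happens at a point where no rule is waiting to recover to an old position.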

I want to take a look at the cases and see if they could be done some other way. So maybe @rgchris can explain the rationale behind the choice in altjson, and whether there's some feature that would be a better fit.

rgchris · September 20, 2022, 6:09pm

I don't believe that's the case: 'here is the point of insertion for the output value. As far as I'm aware, I haven't used input switching.

hostilefork · September 20, 2022, 6:11pm

Well, perhaps I caused a bug, but I had these series being created and then seeked to with :here.

https://github.com/rgchris/Scripts/blob/6fa69eabe11fe78b9fd0a7bd6bb17a923cee0b2b/r3-alpha/altjson.r3#L306

Ren-C raised an error due to the different series.

rgchris · September 20, 2022, 6:26pm

Ah, I don't know if that strictly counts as input switching; rather, it's modifying the input series ahead of the point where it's currently parsing, e.g.:

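parse reduce [
    make object! [a: 1]
][
    and object!  ; look ahead: next value must be an OBJECT! (don't advance)
    mark:  ; save the current position
    (change/only mark body-of mark/1)  ; swap the object for its body block, in place
    into [
        set-word! integer!  ; now matches the body: a: 1
    ]
]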

I likely wouldn't do it this way again; it was just using INTO as a shorter way to recurse into a tree structure. A bit too hackish in retrospect.
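The distinction between the two can be sketched with a toy model in Python. The helper names are hypothetical, dicts stand in for OBJECT! values, and a "position" is assumed to be a (series, index) pair. Changing the element under the cursor keeps the same input series, while series switching replaces the series itself:

```python
# Toy model, not Rebol: dicts stand in for OBJECT! values, and a parse
# position is a (series, index) pair. Helper names are hypothetical.

def change_ahead(series, index):
    """Like (change/only mark body-of mark/1): replace the object at the
    position, in place, with a flat block of its key/value pairs."""
    obj = series[index]
    series[index] = [item for pair in obj.items() for item in pair]
    return series, index  # same series object, same index


def switch_to(new_series):
    """Like R3-Alpha's :series2 in a rule: the position moves into another series."""
    return new_series, 0


data = [{"a": 1}]

pos_series, pos_index = change_ahead(data, 0)
still_same_input = pos_series is data   # True: input modified, not switched
body = pos_series[pos_index]            # ["a", 1], what INTO would descend into

sw_series, sw_index = switch_to(["b", "b", "b"])
switched_input = sw_series is not data  # True: a genuinely different series
```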

hostilefork · September 20, 2022, 6:33pm

What I encountered was actually input switching, i.e. it was PARSE-ing one series and then being seeked to a new series that had been created.

For example, there's code that says here: make block! 10, and by definition, if that series ever reaches :here in mid-parse, it can't be the series that was already being parsed.

If that wasn't supposed to be possible, I may well have just gotten things into a bad state. Or it was a Ren-C bug, or a bug in the Ren-C port of the script. It happened when I gave it some bad input in Graham's PatientDB app... and when I fixed the input, it stopped happening.

Given other priorities, I'm happy to accept that you weren't actually trying to create new series and switch to them in mid-parse... my main concern was whether the feature was being used by design.

rgchris · September 20, 2022, 6:40pm

If that's what was happening, then yikes! I'd need to check, but I don't think this code is radically different from the Rebol 2 version.

This shouldn't happen with this code, though. The here: make block! 10 is in LOAD-JSON, where HERE only refers to the insertion point in the output value. In TO-JSON, :HERE is only used in place of AND, where HERE tracks the current position while traversing the block to be serialized. To reuse the same example:

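parse reduce [
    make object! [a: 1]
][
    mark: object! :mark  ; save position, match the OBJECT!, then seek back to it
    (change/only mark body-of mark/1)  ; swap the object for its body block, in place
    into [
        set-word! integer!  ; now matches the body: a: 1
    ]
]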

(works in Rebol 2)

hostilefork · September 20, 2022, 6:45pm

It was hacked up because I was trying to bridge past the BODY-OF bit, where objects were being converted to blocks, but Ren-C's NULL can't be converted into block elements.

So instead I changed it to a MAP-EACH that created SET-WORD!s and reified elements (e.g. the word null).

But some intermediate stage of that had the problem I described, where I forgot to splice the blocks or something. I thought I was exercising a code path for serializing some type that depended on series switching, by introducing new things into the structure. I may have just thrown it all out of whack.