LET added to PARSE... but what about COPY?

hostilefork · February 26, 2021, 4:33am

I've added LET into PARSE. So you can dynamically make variables to capture:

>> var: <untouched>

>> parse "a" [let var: skip, (print ["inside PARSE, var is" mold var])]
inside PARSE, var is #a
== "a"

>> var
== <untouched>  ; pretty cool...

So it's kind of a synonym for SET, except it declares the variable such that it's only in the "wave" of rule evaluation. Right now that means the variable won't be visible to subrules that are not embedded in rule blocks. So if you have rule: [some var] and try [let var: skip, rule] it won't work...you have to say [let var: skip, [some var]].

(We have some flexibility in this behavior...as PARSE/INSIDE has demonstrated...but I'm trying just to get the basics to work for now.)

If you don't use a SET-WORD!, it will create the new variable...but leave it unset.

But What About COPY?

If LET acts like SET, how do you get the new-variable-declaration of LET...but with COPY semantics?

I've pointed out previously that I think we should probably go to a model more like Topaz PARSE. This is to say that we mark positions by saying things like pos: here (and seek them with seek pos), reclaiming SET-WORD! and GET-WORD!. Then SET-WORD! can mean "set".

To follow that idea exactly, then let x: copy ... is the syntax for getting copy-like semantics, where let x: ... (and plain x: ... to reuse an existing declaration) would assume you were only going to be capturing one parse item.

One wacky-but-cool thought I had was that let [x]: ... and [x]: ... in PARSE could, instead of multi-return, mean multi-capture, i.e. COPY.

What Is The Philosophy Behind SET and COPY, Anyway?

It's a little weird that you can write:

>> parse "aaaaa" [set x some "a"]

It was an error in Rebol2. But Red and R3-Alpha just give you the first A. I've reverted Ren-C to Rebol2's idea of saying that's an error...you matched more than one item, so having the result be just #a seems wrong.
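To make the two behaviors concrete, here is a toy model in Python (not Ren-C). The names `match_some`, `parse_set` and the `strict` flag are hypothetical, invented only for this illustration; it is a sketch of the policy choice, not how any of the interpreters implement it.

```python
def match_some(data, pos, literal):
    """Match one or more copies of `literal` starting at `pos`.

    Returns the list of matched items, or None if nothing matched.
    """
    if not literal:  # guard: an empty literal would never advance
        return None
    matched = []
    while data[pos:pos + len(literal)] == literal:
        matched.append(literal)
        pos += len(literal)
    return matched or None


def parse_set(data, literal, strict):
    """Toy stand-in for [set x some literal].

    strict=True models the Rebol2 / Ren-C choice (error when more than one
    item matched); strict=False models Red / R3-Alpha (keep the first item).
    """
    matched = match_some(data, 0, literal)
    if matched is None:
        return None
    if strict and len(matched) > 1:
        raise ValueError("SET matched more than one item")
    return matched[0]


print(parse_set("aaaaa", "a", strict=False))  # a
```

With `strict=True` the same call raises instead of quietly handing back the first A.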

When we look at that with a SET-WORD!-based instruction, it seems counterintuitive that a plain SET-WORD! wouldn't react to SOME with "copy semantics":

>> parse "aaaaa" [x: some "a"]

Doesn't it seem like you're asking for X to be set to "aaaaa"? Why should you have to say [x: copy some "a"] or [[x]: some "a"]?

But while it feels like rules that match multiple things should give series results, you wind up wondering about things that can be either-or:

>> parse data [x: [integer! | some text!]]

Because the rules are matched in order, you don't have to notice that a SOME comes later if you match an integer!. The composite rule may be a single thing, or multiple things.

But maybe that's just life. If you match a rule that's a single kind of rule, you get the single thing...and if it's a multiple kind of rule, you get the multiple thing.

Maybe we could make a decoration that turns single matches into multiple matches if that's what you meant? For instance, 1 integer! gives you a BLOCK! of integers, while integer! gives you just the one integer.

>> parse ["a"] [x: [integer! | some text!]]
>> x
== ["a"]

>> parse [304] [x: [integer! | some text!]]
>> x
== 304

>> parse [304] [x: [1 integer! | some text!]]
>> x
== [304]

The reason I suggest making the 1 case do a block capture is that no one would write a literal 1, and if you are using a number N then you likely want a block regardless...otherwise you'd get a different type based on how many matched.
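The proposed rule (bare type gives the item, a counted or SOME rule gives a block) can be sketched as a toy model in Python. This is not Ren-C and not an existing PARSE implementation; the tuple encoding of rules and the names `capture` and `FAIL` are assumptions made up for the illustration.

```python
FAIL = object()  # sentinel: the rule did not match


def capture(rule, data, pos=0):
    """Return (captured_value, new_pos), or FAIL if the rule does not match."""
    kind = rule[0]
    if kind == "one":    # like integer!  -> the bare item
        _, typ = rule
        if pos < len(data) and isinstance(data[pos], typ):
            return data[pos], pos + 1
        return FAIL
    if kind == "count":  # like 1 integer!  -> always a list
        _, n, typ = rule
        items = data[pos:pos + n]
        if len(items) == n and all(isinstance(i, typ) for i in items):
            return items, pos + n
        return FAIL
    if kind == "some":   # like some text!  -> always a list
        _, typ = rule
        end = pos
        while end < len(data) and isinstance(data[end], typ):
            end += 1
        return (data[pos:end], end) if end > pos else FAIL
    if kind == "alt":    # like [a | b]: first matching alternative wins
        for sub in rule[1]:
            result = capture(sub, data, pos)
            if result is not FAIL:
                return result
        return FAIL
    raise ValueError(f"unknown rule kind: {kind}")


plain = ("alt", [("one", int), ("some", str)])       # [integer! | some text!]
boxed = ("alt", [("count", 1, int), ("some", str)])  # [1 integer! | some text!]

print(capture(plain, ["a"])[0])  # ['a']
print(capture(plain, [304])[0])  # 304
print(capture(boxed, [304])[0])  # [304]
```

Note that the type of the capture depends on which alternative happened to match, which is exactly the either-or wrinkle described above.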

If this seems reasonable, we could eliminate the SET and COPY distinction...as it always seemed odd to me.

Inconvenient Truths: PARSE Rules Looking More Like Regular Code

So PARSE's SET is rather different from regular SET. It doesn't take a variable which holds the word to set, it effectively quotes the word it wants to set.

 >> x: 'word
 >> set x "a"
 >> word
 == "a"

 >> parse "a" [set x skip]
 >> x
 == "a"

The "Topaz way" would bring about a transition that would make this look different.

 >> parse "a" [x: skip]
 >> x
 == "a"

This small stylistic note is going to make PARSE rules look a little bit more like regular code.

For those who are fans of syntax highlighters, the nature of the language is such as to defeat them. So knowing whether you're in a parse rule or not is just going to be a bit hard to do.

I think this is "just life". It's a little disappointing, and makes me long for the "code as graph database" where you have unlimited rendering options. Different project.

BlackATTR · February 26, 2021, 1:18pm

First up: LET in Parse is a big

Next, there are some things I like about Topaz PARSE, but there are definitely some changes that folks will want to chime in about. I personally don't mind parse looking like regular code, as described here.

(Sidenote: Topaz uses * as an alias for 'skip, but I think that's confusing and would prefer to see more widely accepted aliases like:
* = Zero or more items
? = Any single text!
# = Any single digit -- although this might be confusing with char!)

I'm not sure about this. I'm used to the convention of a digit specifying the # of times a rule should fire, and while few would write 1, it's not uncommon to say 2+. If I'm reading this right, this means we'd have a convention where 1 means this thing and 2+ means another.

hostilefork · March 2, 2021, 5:06pm

No. The suggestion here is that you'd get a series of elements either way, as opposed to a single element. This way you'd have a syntax for one element (no number) vs. a collection containing one element (number 1).

However, this turns out not to be the big issue with trying to have a single "SET" operator. The big issue is:

 x: [integer! | between "(" ")" | integer! integer! | some text!]

It isn't clear what a block of alternates like this should produce when it's the thing being set.

COPY currently implies "see however far across the input the rules spanned, and copy the input series across that range."

However, some rules are now being established to actually produce results...in addition to moving the input. It's not fully clear how the two different meanings should operate.
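The tension between the two meanings can be sketched with a toy model in Python (not Ren-C). It assumes a BETWEEN-like rule whose own result is the content inside the delimiters; the names `between`, `copy_span` and `set_result` are hypothetical and only serve the illustration.

```python
def between(data, pos, opener, closer):
    """Toy BETWEEN: match opener ... closer starting at pos.

    Returns (result, new_pos), where result is the text strictly inside the
    delimiters, or None if the rule does not match.
    """
    if not data.startswith(opener, pos):
        return None
    start = pos + len(opener)
    end = data.find(closer, start)
    if end == -1:
        return None
    return data[start:end], end + len(closer)


def copy_span(data, pos, rule):
    """COPY-style capture: ignore the rule's result, copy the input it spanned."""
    matched = rule(data, pos)
    if matched is None:
        return None
    _, new_pos = matched
    return data[pos:new_pos], new_pos


def set_result(data, pos, rule):
    """Result-style capture: keep whatever value the rule itself produced."""
    return rule(data, pos)


def parens(data, pos):
    """The rule under test, standing in for: between "(" ")"."""
    return between(data, pos, "(", ")")


print(copy_span("(abc)", 0, parens)[0])   # (abc)
print(set_result("(abc)", 0, parens)[0])  # abc
```

Both captures advance the input by the same amount, but they disagree on the value, so a single "SET" operator has to pick one of them.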

This is heavy issue #1 with UPARSE, but fortunately it's a good substrate for hacking up alternatives.