Is INTEGER! in PARSE Too Obfuscating?

There's a bit of a problem in the combinator-based UPARSE with the historical convention of an INTEGER! rule, where if you have two integers in a row it indicates a range of how many things to accept.

rebol2> parse "**" [1 2 "*"]  ; between 1 and 2 stars
== true

rebol2> parse "**" [3 4 "*"]  ; between 3 and 4 stars
== false

It doesn't fit into the "natural" pattern of combinators, where 1 2 RULE would mean the same thing as 1 [2 RULE].

That doesn't mean it can't be done...it just means the BLOCK! combinator would have to notice integer sequences itself (vs. dispatching to some generic INTEGER! combinator).
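A toy model may make the mismatch concrete. The Python below is not UPARSE's implementation: a rule is just a function from an input position to a new position (or `None` on failure), and `times`, `between`, and `parse` are hypothetical names. Reading `1 2 "*"` generically as `1 [2 "*"]` demands exactly two stars, while the Rebol2 reading is a range:

```python
def lit(s):
    """Match the literal string s at pos."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def times(n, rule):
    """Exactly n matches in a row: what a lone INTEGER! prefix means."""
    def run(text, pos):
        for _ in range(n):
            pos = rule(text, pos)
            if pos is None:
                return None
        return pos
    return run

def between(lo, hi, rule):
    """Greedy range: at least lo, at most hi matches (Rebol2's `lo hi rule`)."""
    def run(text, pos):
        count = 0
        while count < hi:
            nxt = rule(text, pos)
            if nxt is None:
                break
            pos, count = nxt, count + 1
        return pos if count >= lo else None
    return run

def parse(text, rule):
    """Succeed only if the rule consumes the whole input."""
    return rule(text, 0) == len(text)

star = lit("*")
range_reading = between(1, 2, star)         # [1 2 "*"] as Rebol2 reads it
nested_reading = times(1, times(2, star))   # 1 [2 "*"] under generic composition

assert parse("*", range_reading)        # one star is within 1..2
assert not parse("*", nested_reading)   # the nested form needs exactly two
```

So a BLOCK! combinator wanting Rebol2 behavior would have to spot the adjacent integers and build something like `between` itself, instead of letting each integer compose independently.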

I've only used the range-of-times check fairly rarely, myself, so I wonder how much worse it would be to use TALLY. Instead of min max rule you could say:

let n: tally rule, :(did all [n >= min, n <= max])

This gives you more freedom in expressing the bounds, though it isn't optimized for the specific case. It also means that if N turns out to be much greater than max, you'll waste time running RULE many more times than you needed to.
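A toy Python model (hypothetical names, not UPARSE's actual TALLY) shows the cost. A greedy range stops trying after MAX matches, while counting first and checking afterward runs RULE until it fails. In this model it can also change the outcome: the range leaves surplus input for the rules that follow, whereas the count-then-check version has already consumed it.

```python
def lit(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def counted(rule, counter):
    """Wrap a rule so that each attempt increments counter[0]."""
    def run(text, pos):
        counter[0] += 1
        return rule(text, pos)
    return run

def between(lo, hi, rule):
    """Greedy range: stop trying as soon as hi matches are reached."""
    def run(text, pos):
        count = 0
        while count < hi:
            nxt = rule(text, pos)
            if nxt is None:
                break
            pos, count = nxt, count + 1
        return pos if count >= lo else None
    return run

def tally_then_check(lo, hi, rule):
    """Like `let n: tally rule, :(did all [n >= lo, n <= hi])`: count, then check."""
    def run(text, pos):
        n = 0
        while True:
            nxt = rule(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or a non-advancing match
                break
            pos, n = nxt, n + 1
        return pos if lo <= n <= hi else None
    return run

def seq(*rules):
    def run(text, pos):
        for rule in rules:
            pos = rule(text, pos)
            if pos is None:
                return None
        return pos
    return run

def parse(text, rule):
    return rule(text, 0) == len(text)

attempts = [0]
a = counted(lit("a"), attempts)

between(2, 3, a)("a" * 1000, 0)
range_attempts = attempts[0]    # 3: stops at the maximum

attempts[0] = 0
tally_then_check(2, 3, a)("a" * 1000, 0)
tally_attempts = attempts[0]    # 1001: every "a", plus the final failed try
```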

But, the Bigger Question...

How good an idea is it to make something that acts as a keyword abstractable? I've often been confused when I see something like:

 parse ... [... something rule ...]

And SOMETHING is actually an integer. It isn't a rule in its own right; it's picking up the next rule...like a keyword.

As a reader, I feel that a keyword like REPEAT is missing here. I don't know how strictly we want to stick to the rule that every word resolves to a parse rule...if we did, then you'd have to escape out of that with the likes of:

 parse ... [... repeat (2) rule ...]
 parse ... [... repeat (something + 1) rule ...]

But perhaps within PARSE, literal integers could be exempted...assumed never to match integers in the input, but to be passed as a parameter instead.

This doesn't answer what to do about ranges of integers (and BETWEEN is taken). I'm just pointing out that I'm not sure the current super-succinct integer syntax pays off all that well. It might be better to be a little more verbose, to make it clear that a repetition is happening.

I've never had a problem with plain numbers for repeat counts in my own code, but when trying to understand someone else's code I find them a lot harder to follow, especially if words are used.

There's a basic question of whether abstractions-via-WORD! should be legal or not. Rebol2 allows you to abstract keywords themselves:

rebol2> keyword: 'some
== some

rebol2> parse "aaa" [keyword "a"]
== true

R3-Alpha did not support this:

r3-alpha> keyword: 'some
== some

r3-alpha> parse "aaa" [keyword "a"]
** Script error: PARSE - invalid rule or usage of rule: keyword

Nor does Red:

red> keyword: 'some
== some

red> parse "aaa" [keyword "a"]
*** Script Error: PARSE - invalid rule or usage of rule: some

I don't think it's a good idea to permit WORD!-abstraction of PARSE keywords. There does need to be a mechanism by which you can adjust the set of combinators...but I think that is better hooked in other ways. That's the angle being pursued with the Redbol compatibility initiative (UPARSE2).

There is currently a workaround for this:

>> keyword: 'some

>> uparse "aaa" [:(keyword) "a"]
== "aaa"

GET-GROUP! lets you do a sort of "live COMPOSE" to use evaluated material as a rule. While it's mostly used to splice in calculated BLOCK!s, or LOGIC!s that decide whether the parse continues, it currently lets you put in lone keywords too.

I think limiting the WORD!-abstraction of keywords probably carries a lesson for INTEGER!s being abstracted as well. Perhaps you can use them literally in rules, but you use something like REPEAT when they are abstracted.


I have implemented the REPEAT combinator. It's now in traditional native PARSE as well as UPARSE.

It's a bit "unfortunate" that you have to put integer variables in a GROUP!:

>> var: 3
>> uparse? "aaa" [repeat (var) "a"]
== #[true]

We could consider having non-literal integers act as rules that don't advance the input and just synthesize the integer (in contrast with literal integers, which indicate a repeat count). That would make repeat var "a" behave differently from repeat 3 "a", which creates irregularities.

But on the plus side of the current behavior, you can write rules that get the repeat count from the input and apply it immediately. The following example is contrived, but maybe not so contrived that you can't imagine something like it happening:

>> uparse? ["b" 3 "b" "b" "b"] [rule: <any>, repeat integer! rule]
== #[true]
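Here is roughly what that rule does, modeled in Python over a list. This is a sketch with hypothetical helpers (`set_var`, `get_var`, `repeat`), where a rule returns `(new_pos, value)` or `None`: `rule: <any>` captures `"b"`, `integer!` synthesizes `3` from the input, and REPEAT uses that synthesized value as its count.

```python
def any_value(data, pos):
    """Like <any>: match one item and synthesize it."""
    return (pos + 1, data[pos]) if pos < len(data) else None

def of_type(t):
    """Like a datatype rule such as integer!: match one item of exactly type t."""
    def run(data, pos):
        if pos < len(data) and type(data[pos]) is t:
            return pos + 1, data[pos]
        return None
    return run

def set_var(env, name, rule):
    """Like `name: rule`: store the rule's synthesized value in env."""
    def run(data, pos):
        got = rule(data, pos)
        if got is not None:
            env[name] = got[1]
        return got
    return run

def get_var(env, name):
    """Use a variable's current value as a rule that matches it literally."""
    def run(data, pos):
        if pos < len(data) and data[pos] == env[name]:
            return pos + 1, data[pos]
        return None
    return run

def repeat(count_rule, body):
    """Run count_rule to get an integer, then run body exactly that many times."""
    def run(data, pos):
        got = count_rule(data, pos)
        if got is None:
            return None
        pos, count = got
        value = None
        for _ in range(count):
            got = body(data, pos)
            if got is None:
                return None
            pos, value = got
        return pos, value
    return run

def parse_ok(data, *rules):
    """Run rules in sequence; succeed only if all input is consumed."""
    pos = 0
    for rule in rules:
        got = rule(data, pos)
        if got is None:
            return False
        pos = got[0]
    return pos == len(data)

env = {}
# [rule: <any>, repeat integer! rule]
rules = (set_var(env, "rule", any_value), repeat(of_type(int), get_var(env, "rule")))
```

The count is just another rule's result, so anything that synthesizes an integer (a datatype match, a GROUP!, a variable) can feed REPEAT.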

I think ranges of integers are a sufficiently rare pattern to warrant a combinator specific to the purpose, rather than breaking the coherence of the model to the point where 2 [3 rule] and 2 3 rule behave differently.

But I do notice that the pattern 0 n rule is really just repeat (n) opt rule.

More generally, repeat m n rule can be written as repeat (m) rule, repeat (n - m) opt rule.
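A brute-force check in a toy model (Python, hypothetical names) agrees with this rewriting. Rules here are pure functions of position, so once OPT RULE stops matching, every later OPT attempt at the same spot succeeds without advancing, and the total matches the greedy range:

```python
from itertools import product

def lit(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def seq(*rules):
    def run(text, pos):
        for rule in rules:
            pos = rule(text, pos)
            if pos is None:
                return None
        return pos
    return run

def times(n, rule):
    """repeat (n) rule: exactly n matches (n = 0 always succeeds)."""
    return seq(*([rule] * n))

def opt(rule):
    """opt rule: try rule; if it fails, succeed without advancing."""
    def run(text, pos):
        nxt = rule(text, pos)
        return pos if nxt is None else nxt
    return run

def between(lo, hi, rule):
    """The ranged form `lo hi rule`: greedy, at least lo and at most hi matches."""
    def run(text, pos):
        count = 0
        while count < hi:
            nxt = rule(text, pos)
            if nxt is None:
                break
            pos, count = nxt, count + 1
        return pos if count >= lo else None
    return run

def rewritten(m, n, rule):
    """repeat (m) rule, repeat (n - m) opt rule"""
    return seq(times(m, rule), times(n - m, opt(rule)))

def equivalent(max_n=5, max_len=7):
    """Compare both forms on every "a"*k + "b"*j input and every 0 <= m <= n."""
    a = lit("a")
    for m, n in product(range(max_n + 1), repeat=2):
        if m > n:
            continue
        for k, j in product(range(max_len + 1), repeat=2):
            text = "a" * k + "b" * j
            if between(m, n, a)(text, 0) != rewritten(m, n, a)(text, 0):
                return False
    return True
```

With m = 0 this reduces to the repeat (n) opt rule form of 0 n rule.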

So I'm going to be killing off ranged repetition in native PARSE. We can discuss what the real answer should be, but that pattern can be used in the meantime.


Er...duh...why can't REPEAT just take a BLOCK! or somesuch?

uparse "aaa" [repeat ([2 3]) "a"]

Got variables? GET-BLOCK! is your friend for a lighter notation than REDUCE:

uparse "..." [repeat (:[min max]) "a"]
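A sketch of that design in the same toy style (Python; this `repeat` is a hypothetical stand-in, not the real combinator's code): one argument slot that accepts either a single count or a `(min, max)` pair, so the range lives inside one argument instead of depending on how the BLOCK! combinator groups adjacent integers.

```python
def lit(s):
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def repeat(spec, rule):
    """spec is an int (exact count) or a (min, max) pair, like repeat ([2 3])."""
    if isinstance(spec, int):
        lo = hi = spec
    else:
        lo, hi = spec
    if not 0 <= lo <= hi:
        raise ValueError(f"bad repeat spec: {spec!r}")

    def run(text, pos):
        count = 0
        while count < hi:  # greedy, but never tries more than hi times
            nxt = rule(text, pos)
            if nxt is None:
                break
            pos, count = nxt, count + 1
        return pos if count >= lo else None
    return run

def parse(text, rule):
    return rule(text, 0) == len(text)

a = lit("a")
lo, hi = 2, 3  # variables, as with repeat (:[min max])
```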

Not entirely sure why this didn't occur to me sooner. But now it has, so...we get it. :slight_smile:

But we've learned some things in the meantime. The OPT rules I had put in as substitutes were what exposed things like the weaknesses above.


It's easy to get bitten by using literal integers with REPEAT. You mean to write repeat (2) rule, but you write repeat 2 rule, which the generic architecture takes to mean you want the result of [2 rule] as the repeat count for whatever comes afterward.

Most of the time you'll get an error, but the error isn't easy to sort out.

If we had skippable arguments for combinators, REPEAT might have a hard literal INTEGER! argument...and if it found one there, it could give an error. That might be better than nothing, though an error in the other arguments might pre-empt reacting to it (since you could only raise an error about the argument once the combinator was running).

This leads me to believe that REPEAT might need a different parameter convention, one that would give a coherent error if you put a literal integer in that slot.
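To illustrate both the confusing grouping and the alternative, here's a toy grouper for a flat rule block in Python. Everything in it is hypothetical: a literal integer is treated as a combinator taking one rule argument (the generic reading described above), REPEAT takes two, and `Group` stands in for a GROUP!. With `strict=False`, `repeat 2 rule` silently becomes REPEAT of `[2 rule]`, and the eventual error blames REPEAT's missing argument; with `strict=True`, REPEAT's count slot refuses a literal integer up front and says why.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Group:
    """Stands in for a GROUP! like (2): evaluates to a value, takes no arguments."""
    value: object

ARITY = {"repeat": 2, "opt": 1}  # hypothetical keyword arities

def arity(token):
    if isinstance(token, int):
        return 1  # generic reading: a literal integer takes the next rule
    return ARITY.get(token, 0) if isinstance(token, str) else 0

def take(tokens, i, strict):
    """Group one rule starting at tokens[i]; return (tree, next_index)."""
    tok = tokens[i]
    i += 1
    if strict and tok == "repeat" and i < len(tokens) and isinstance(tokens[i], int):
        raise ValueError(f"REPEAT's count must be in a GROUP!, e.g. repeat ({tokens[i]}) ...")
    args = []
    for _ in range(arity(tok)):
        if i >= len(tokens):
            raise ValueError(f"{tok!r} is missing an argument")
        arg, i = take(tokens, i, strict)
        args.append(arg)
    return ((tok, *args) if args else tok), i

def group_all(tokens, strict=False):
    """Group a whole flat rule block into a list of rule trees."""
    out, i = [], 0
    while i < len(tokens):
        tree, i = take(tokens, i, strict)
        out.append(tree)
    return out

def error_of(tokens, strict):
    """Return the grouping error message, or None if grouping succeeded."""
    try:
        group_all(tokens, strict)
    except ValueError as e:
        return str(e)
    return None
```

In the non-strict case the message mentions REPEAT's argument count, not the literal `2` that actually caused the problem, which is the "not easy to sort out" error described above.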


Maybe it's just my use cases, but I don't often find myself needing a literal number of times to run a rule.

I almost never use something like:

parse data [... 5 rule ...]

The very few times such things come up, I think it would be clearer as:

parse data [... repeat 5 rule ...]

If integers in PARSE were literal, that would make it easier to pass them to functions like, say, SKIP.

parse data [... skip 3 ...]
parse data [... skip -1 ...]

Since I prefer NEXT as a variable name to NEXT as an operation (and want people to use NEXT OF instead of NEXT on series), I'm probably not going to push for NEXT as an arity-0 parse keyword. But skip 1 is a bit smoother to type than <any>, and it's literate/generic, opening the door to other skips.

(I like skip 3 better than 3 <any> for what is intended.)

I've Demonstrated You Can Define INTEGER! as a Repeat Count

The existence of the INTEGER! combinator shows you can do it if you want to. And Redbol PARSE wants to.

But is it really worth it...compared to keeping INTEGER!, our beloved mathematical abstraction, abstract: a value to be passed to any combinator to interpret?

PARSE is getting more power, with the ability to define off-the-cuff combinators and use them inline in a single parse call. So that's quite a lot of combinators that could generically interpret integers. REPEAT feels like a drop in the bucket of things you might want to do.

So I say make INTEGER! a combinator that doesn't move the parse position and just evaluates to the integer. How about that?


Amen. I think it would be clearer to write:

repeat 5 rule

Having those integers floating around in rules can be a speed-bump when reading, and they're probably a hurdle for newbies.


With INTEGER! being a combinator that just evaluates to itself, it permits us to have a fairly literate SKIP combinator:

>> parse [a b "Much more literate!"] [skip 2, text!]
== "Much more literate!"

Actually makes sense...as opposed to the historical 2 skip. That required you to absorb that SKIP is an arity-0 function which you sometimes used to match ANY-VALUE! for SET rules...as well as that integers prefix a repeat count for the rule after them. Much clearer to say SKIP is a combinator that takes how much to skip!

Furthermore, per a thought from @IngoHohmann, it can have ELIDE-like behavior...which looks like a reasonable choice:

>> parse ["SKIP can return void!!!" a b] [text!, skip 2]
== "SKIP can return void!!!"

:sunglasses:
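Here's a small Python model of those two behaviors (hypothetical names; a rule returns `(new_pos, value)` or `None`). INTEGER! becomes a rule that doesn't advance and synthesizes itself, SKIP takes its count from the rule that follows it, and SKIP's result is a `VOID` marker that the sequence ignores, so the last non-void result wins:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    """Stands in for a WORD! such as a or b in the input."""
    name: str

VOID = object()  # marker for a result the enclosing sequence should ignore

def integer(n):
    """INTEGER! as a combinator: doesn't advance, synthesizes the integer."""
    return lambda data, pos: (pos, n)

def skip(count_rule):
    """SKIP takes its count from another rule; negative counts move backward."""
    def run(data, pos):
        got = count_rule(data, pos)
        if got is None:
            return None
        pos, n = got
        new_pos = pos + n
        return (new_pos, VOID) if 0 <= new_pos <= len(data) else None
    return run

def of_type(t):
    """Like text!: match one item of exactly type t and synthesize it."""
    def run(data, pos):
        if pos < len(data) and type(data[pos]) is t:
            return pos + 1, data[pos]
        return None
    return run

def parse(data, *rules):
    """Run rules in order; return the last non-void result, or None on mismatch."""
    pos, result = 0, None
    for rule in rules:
        got = rule(data, pos)
        if got is None:
            return None
        pos, value = got
        if value is not VOID:
            result = value
    return result if pos == len(data) else None

# parse [a b "Much more literate!"] [skip 2, text!]
example1 = parse([Word("a"), Word("b"), "Much more literate!"], skip(integer(2)), of_type(str))
# parse ["SKIP can return void!!!" a b] [text!, skip 2]
example2 = parse(["SKIP can return void!!!", Word("a"), Word("b")], of_type(str), skip(integer(2)))
```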

So the exception I've found here is byte counts when parsing binary. Code that knows how many bytes it wants puts those byte counts down literally.

But where I once wrote copy 2 skip in these cases, that had become across 2 <any>. Rather than robotically turning that into across repeat 2 <any>, it just needs to go back to across skip 2.
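In a toy Python model (hypothetical names; a rule returns `(new_pos, value)` or `None`), `across skip 2` over binary input is just "advance two bytes and hand back the bytes crossed". The length-prefixed format and the `int.from_bytes` decoding are assumptions, only there to show the kind of job these literal byte counts usually serve:

```python
def skip(n):
    """skip n with a literal count: advance n bytes, synthesize nothing."""
    def run(data, pos):
        new_pos = pos + n
        return (new_pos, None) if 0 <= new_pos <= len(data) else None
    return run

def across(rule):
    """across rule: synthesize the slice of input that rule moved over."""
    def run(data, pos):
        got = rule(data, pos)
        if got is None:
            return None
        end = got[0]
        return end, data[pos:end]
    return run

def read_length_prefixed(data):
    """Hypothetical format: a 2-byte big-endian length, then that many bytes."""
    got = across(skip(2))(data, 0)
    if got is None:
        return None
    pos, header = got
    length = int.from_bytes(header, "big")
    got = across(skip(length))(data, pos)
    return None if got is None else got[1]
```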

Yet in truth...these cases all tend to suck...and what they actually want is something like @rgchris's BINCODE, or ENBIN/DEBIN combinators. So I wouldn't have worried about it anyway.
