Is INTEGER! in PARSE Too Obfuscating?

There's a bit of a problem in the combinator-based UPARSE with the historical convention of an INTEGER! rule, where if you have two integers in a row it indicates a range of how many things to accept.

rebol2> parse "**" [1 2 "*"]  ; between 1 and 2 stars
== true

rebol2> parse "**" [3 4 "*"]  ; between 3 and 4 stars
== false

It doesn't fit into the "natural" pattern of combinators, e.g. where 1 2 RULE is the same as 1 [2 RULE].

That doesn't mean it can't be done; it just means it would have to be part of the BLOCK! combinator specifically to notice integer sequences (vs. dispatching to some generic INTEGER! combinator).
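To make the two readings concrete, here is a toy model in Python (not Ren-C, and not how UPARSE is actually implemented). The names `lit`, `times`, `between`, and `parse` are made up for the illustration: a "rule" is just a function from a position to a new position, or None on failure.

```python
# Toy model: a rule is a function (text, pos) -> new pos, or None on failure.

def lit(s):
    """Rule matching the literal string s."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def times(n, inner):
    """Generic INTEGER!-style combinator: run `inner` exactly n times."""
    def rule(text, pos):
        for _ in range(n):
            pos = inner(text, pos)
            if pos is None:
                return None
        return pos
    return rule

def between(lo, hi, inner):
    """Rebol2-style range: match `inner` at least lo and at most hi times."""
    def rule(text, pos):
        count = 0
        while count < hi:
            nxt = inner(text, pos)
            if nxt is None:
                break
            pos, count = nxt, count + 1
        return pos if count >= lo else None
    return rule

def parse(text, rule):
    """True if the rule matches and consumes the whole input."""
    return rule(text, 0) == len(text)

star = lit("*")

# "Natural" combinator reading: 1 2 RULE is 1 [2 RULE], i.e. exactly two stars.
nested = times(1, times(2, star))
# Historical reading: 1 2 RULE means between one and two stars.
ranged = between(1, 2, star)

print(parse("*", nested), parse("*", ranged))    # False True
print(parse("**", nested), parse("**", ranged))  # True True
```

The range reading can't be built by composing `times` with itself, which is why it would need special handling at the block level in this model.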

But it brought me to a bigger question of how good an idea it is to have something that acts as a keyword be abstractable. I've often been confused when I see something like:

 parse ... [... something rule ...]

And SOMETHING is actually an integer. It isn't a rule in its own right; like a keyword, it picks up the next rule.

As a reader, I feel like a keyword like REPEAT is missing here. I don't know how much we want to stick to the rule that every word resolve to a parse rule...if we did, then you'd have to escape out of that with the likes of:

 parse ... [... repeat (2) rule ...]
 parse ... [... repeat (something + 1) rule ...]

But perhaps within PARSE, literal integers could be exempted...assumed to never match integers, but be passed as a parameter.

This doesn't answer what to do about ranges of integers (and BETWEEN is taken). But just pointing out that I don't know that the current super-succinct integer syntax pays off all that well. It might be better to be a little more verbose to make it clear that a repetition is happening.

I had a bit of a thought about the naming gap between COUNT-UP and REPEAT, which made me think about what REPEAT might mean that was different.

If REPEAT appeared in PARSE to mean "try and do this N times, otherwise fail" then it could mean that in ordinary code too.

 data: [a b]
 repeat 3 [not null? take data] then [
     print "Successfully took 3 elements"
 ] else [
     print "Failed to take 3 elements"
 ]

It's a thought on how the word might be used consistently. Not necessarily a great thought, just a thought.
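For anyone who wants to poke at the semantics, a rough Python analogue of "try this N times, otherwise fail" might look like the following. This is only a sketch of the idea; `repeat` and `take_one` are hypothetical names, and `take_one` stands in for `not null? take data`.

```python
def repeat(n, body):
    """Call body up to n times; report False as soon as one call is falsy."""
    for _ in range(n):
        if not body():
            return False
    return True

data = ["a", "b"]

def take_one():
    """Hypothetical stand-in for `not null? take data`."""
    if not data:
        return False
    data.pop(0)
    return True

if repeat(3, take_one):
    print("Successfully took 3 elements")
else:
    print("Failed to take 3 elements")  # this branch runs: only 2 elements
```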


Or... if LOOP is the arity-2 form of looping which takes a number, what if it was just the generic arity-2 form...and if you gave it a block it assumed that was the condition?

x: 10
loop [x > 0] [
    print "Counting down"
    x: x - 1
]

loop 10 [
    print "Counting down"
]

This would let WHILE be arity-1, like the truthy-based UNTIL...and it would be the same in PARSE and DO.

data: [a b]
while [take data]

Though we've discussed LOOP being a more powerful dialect, due to the short word. Maybe REPEAT should be the arity-1 while-like thing?

repeat [take data]

And then it would make more sense in PARSE if it were arity-1 there too. Anyway, I feel like the inconsistency of WHILE is a bit jarring at the moment.

Consistency on WHILE and UNTIL was an old-ish question which would be resolved if they were both arity-1. I don't know that making LOOP polymorphic w.r.t. BLOCK! vs. INTEGER! was suggested before, though maybe it was.
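As a sanity check on how a polymorphic arity-2 LOOP would behave, here is a small Python sketch. It is an analogy only: `loop` is a hypothetical helper, an int plays the role of INTEGER!, and a callable plays the role of a condition BLOCK!.

```python
def loop(spec, body):
    """Run body a fixed number of times (int) or while spec() is truthy (callable)."""
    if isinstance(spec, int) and not isinstance(spec, bool):
        for _ in range(spec):
            body()
    elif callable(spec):
        while spec():
            body()
    else:
        raise TypeError("loop expects an integer count or a condition")

state = {"x": 3}
log = []

def count_down():
    log.append(state["x"])
    state["x"] -= 1

loop(lambda: state["x"] > 0, count_down)  # condition form
loop(2, lambda: log.append("tick"))       # count form

print(log)  # [3, 2, 1, 'tick', 'tick']
```

The dispatch is on the type of the first argument, so both forms keep the same arity.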

Replying to the original question:
I've never had a problem with a plain number for repetition in my own code, but when trying to understand someone else's code I find it a lot harder to follow, especially if words are used.

There's a basic question of whether abstractions-via-WORD! should be legal or not. Rebol2 allows you to abstract keywords themselves:

rebol2> keyword: 'some
== some

rebol2> parse "aaa" [keyword "a"]
== true

R3-Alpha did not support this:

r3-alpha> keyword: 'some
== some

r3-alpha> parse "aaa" [keyword "a"]
** Script error: PARSE - invalid rule or usage of rule: keyword

Nor does Red:

red> keyword: 'some
== some

red> parse "aaa" [keyword "a"]
*** Script Error: PARSE - invalid rule or usage of rule: some

I don't think it's a good idea to permit WORD!-abstraction of PARSE keywords. There does need to be a mechanism by which you can adjust the set of combinators, but I think that is better hooked in other ways. That's the angle being pursued with the Redbol compatibility initiative (UPARSE2).

There is currently a workaround for this:

>> keyword: 'some

>> uparse "aaa" [:(keyword) "a"]
== "aaa"

GET-GROUP! lets you do a sort of "live-COMPOSE" to use evaluated material as a rule. While it's mostly used to splice in calculated BLOCK!s or LOGIC!s to continue the parse or not, it currently lets you put in lone keywords too.

I think limiting the WORD!-abstraction of keywords probably carries a lesson for INTEGER!s being abstracted as well. Perhaps you can use them literally in rules, but you use something like REPEAT when they are abstracted.

Still Pondering Arity-2 LOOP and Arity-1 WHILE

I've long wondered if WHILE and UNTIL should be paired. The idea of making them both arity-1 came up but was often dismissed due to how much people like having an arity-2 WHILE.

But if you could write loop 2 [...] or loop [x > 0] [...], that seems to make up for it. And it could help arity-1 WHILE make more sense in PARSE.

Then LOOP could step in to address this integer case. It wouldn't help with the idea of matching a range of times, which doesn't have any parallels in DO. (Range implies you are heeding the count and testing the truthiness of the loop... no such construct exists in DO.)

I've used the range-of-times check only rarely, myself. I wonder how much worse it would be to use TALLY. So think of:

 min max rule

...represented as:

 let n: tally rule, :(did all [n >= min, n <= max])

This gives you more freedom in expressing the bounds, while not being optimized for the specific case. It also means that if N is much greater than max, you'll waste time running the rule more times than you would have to.
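The cost difference is easy to see in a toy Python model (again not Ren-C; `counting`, `tally_then_check`, and `bounded` are names invented for the sketch). One version counts every match and checks the bounds afterward, the other stops as soon as it reaches max.

```python
def lit(s):
    """Rule matching the literal string s: (text, pos) -> new pos or None."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def counting(rule):
    """Wrap a rule so the number of attempts can be inspected."""
    def wrapped(text, pos):
        wrapped.calls += 1
        return rule(text, pos)
    wrapped.calls = 0
    return wrapped

def tally_then_check(lo, hi, inner):
    """Count all consecutive matches first, then test lo <= n <= hi."""
    def rule(text, pos):
        n = 0
        while True:
            nxt = inner(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or no progress
                break
            pos, n = nxt, n + 1
        return pos if lo <= n <= hi else None
    return rule

def bounded(lo, hi, inner):
    """Stop trying once hi matches have been seen."""
    def rule(text, pos):
        n = 0
        while n < hi:
            nxt = inner(text, pos)
            if nxt is None:
                break
            pos, n = nxt, n + 1
        return pos if n >= lo else None
    return rule

text = "a" * 1000
a1 = counting(lit("a"))
a2 = counting(lit("a"))

r1 = tally_then_check(1, 3, a1)(text, 0)
r2 = bounded(1, 3, a2)(text, 0)

print(r1, a1.calls)  # None 1001
print(r2, a2.calls)  # 3 3
```

In this model the two are also not quite the same rule: with more than max matches available, the tally version fails outright, while the bounded version matches max of them and leaves the rest for whatever comes next.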

I have implemented the REPEAT combinator. It's now in traditional native PARSE as well as UPARSE.

It's a bit "unfortunate" that you have to put integer variables in a GROUP!:

>> var: 3
>> uparse? "aaa" [repeat (var) "a"]
== #[true]

We could consider whether non-literal integers should act as rules that do not advance the input and synthesize an integer (in contrast with literal integers, which indicate a repeat count). That would make repeat var "a" behave differently from repeat 3 "a", which creates an irregularity.

But on the plus side of the current behavior, you can write rules that get the repeat count from the input and apply it immediately. The following example is contrived, but maybe not so much so that you couldn't see something like it ever happening:

>> uparse? ["b" 3 "b" "b" "b"] [rule: <any>, repeat integer! rule]
== #[true]

I think a range of integers is a sufficiently rare pattern to warrant a combinator specific to the purpose, rather than breaking the coherence of the model so that 2 [3 rule] and 2 3 rule have distinct behavior.

But I do notice that the pattern 0 n rule is really just repeat (n) opt rule.

More generally, the ranged form m n rule can be written as repeat (m) rule, repeat (n - m) opt rule.
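That rewrite can be checked mechanically in a toy Python model (a sketch, not Ren-C; `times`, `opt`, `seq`, `between`, and `rewritten` are illustrative names). It only exercises a simple literal rule, so treat it as evidence for the identity rather than a proof for every possible rule.

```python
def lit(s):
    """Rule matching the literal string s: (text, pos) -> new pos or None."""
    def rule(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return rule

def times(n, inner):
    """Run inner exactly n times, failing if any attempt fails."""
    def rule(text, pos):
        for _ in range(n):
            pos = inner(text, pos)
            if pos is None:
                return None
        return pos
    return rule

def opt(inner):
    """Try inner; on failure succeed without advancing."""
    def rule(text, pos):
        nxt = inner(text, pos)
        return pos if nxt is None else nxt
    return rule

def seq(*rules):
    """Run rules one after another, failing if any fails."""
    def rule(text, pos):
        for r in rules:
            pos = r(text, pos)
            if pos is None:
                return None
        return pos
    return rule

def between(lo, hi, inner):
    """Ranged repetition: at least lo, at most hi matches."""
    def rule(text, pos):
        n = 0
        while n < hi:
            nxt = inner(text, pos)
            if nxt is None:
                break
            pos, n = nxt, n + 1
        return pos if n >= lo else None
    return rule

def rewritten(lo, hi, inner):
    """repeat (lo) rule, repeat (hi - lo) opt rule"""
    return seq(times(lo, inner), times(hi - lo, opt(inner)))

a = lit("a")
mismatches = [
    (k, lo, hi)
    for k in range(7)              # inputs "b", "ab", "aab", ...
    for lo in range(5)
    for hi in range(lo, 5)
    if between(lo, hi, a)("a" * k + "b", 0)
    != rewritten(lo, hi, a)("a" * k + "b", 0)
]
print(mismatches)  # []
```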

So I'm going to be killing off ranged repetition in native PARSE. We can discuss what the real answer should be, but that pattern can be used in the meantime.

Er...duh...why can't REPEAT just take a BLOCK! or somesuch?

uparse "aaa" [repeat ([2 3]) "a"]

Got variables? GET-BLOCK! is your friend for a lighter notation than REDUCE:

uparse "..." [repeat (:[min max]) "a"]

Not entirely sure why this didn't occur to me sooner, but now it has, so...we get it. 🙂

But we've learned some things in the meantime. The OPT rules I had put in as substitutes were what exposed things like the weaknesses above.


It's easy to get bitten by using literal integers with REPEAT. You mean to write repeat (2) rule but you write repeat 2 rule, which in the generic architecture assumes that you want to use the result of [2 rule] as the repeat count for whatever comes afterward.

Most of the time you'll get an error, but the error isn't easy to sort out.

If we had skippable arguments for combinators, REPEAT might have a hard literal INTEGER! argument...and if it found there was one there, it could give an error. That might be better than nothing, though any error in the other arguments might pre-empt being able to react to it (since you could only raise an error on the argument once the combinator was running).

This leads me to believe that REPEAT might be a case where it needs to use a different parameter convention which would give a coherent error if you put a literal integer in that slot.