Calling Combinators (Decoders?) as Normal Functions

A few times I've talked about the potential of making it possible to call a COMBINATOR function from outside of PARSE.

This is to say that if some PARSE-specific parameter was missing (e.g. the "parse state"), there'd be a mode in the guts of the COMBINATOR mechanic that cooked up something like a temporary parse session just for the input you passed in.

Would It "Combinate" Parsers For You?

The situations I had in mind weren't really combinators that take parsers as parameters. And now that I look at it, I think that suggests that... no, you probably shouldn't call these kinds of combinators outside of PARSE.

Here's one imagination of calling a combinator like SOME:

>> [value rest]: some "aaaabbb" [repeat 2 "a"]
== "a"

>> value
== "a"

>> rest
== "bbb"

This exposes how SOME is actually not arity-1. Though it takes a "combinated parser" as a parameter, it also takes an INPUT...but that's usually implicit, specialized in by PARSE. When called directly from normal code, that parameter could be gathered normally.
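To make the arity point concrete, here's a minimal sketch in Python (hypothetical names, not real Ren-C internals) of what a directly-callable SOME might look like: parsers are functions from input to `(value, rest)` or `None`, and SOME takes both a parser and an explicit input.

```python
# Hypothetical sketch: a SOME callable as a normal function, taking the
# input explicitly instead of getting it implicitly from a parse state.

def literal(text):
    """Parser matching a literal string; returns (value, rest) or None."""
    def parser(inp):
        if inp.startswith(text):
            return text, inp[len(text):]
        return None
    return parser

def repeat(n, parser):
    """Parser matching `parser` exactly n times; product is last match."""
    def parse(inp):
        value = None
        for _ in range(n):
            result = parser(inp)
            if result is None:
                return None
            value, inp = result
        return value, inp
    return parse

def some(parser, inp):
    """Match one or more times; return (last value, remaining input)."""
    result = parser(inp)
    if result is None:
        raise ValueError("SOME matched zero times")
    while result is not None:
        value, inp = result
        result = parser(inp)
    return value, inp

value, rest = some(repeat(2, literal("a")), "aaaabbb")
# value == "a", rest == "bbb" -- mirroring the imagined console session
```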

It doesn't feel that compelling, since you're getting a synonym for:

parse "aaaabbb" [some repeat 2 "a"]

But also, why would it take that interpretation instead of:

parse "aaaabbb" [some ([repeat 2 "a"])]

One point of view would say it makes more sense to think of the expression as the product of evaluation, because the argument would presumably be otherwise evaluative:

>> [value rest]: some "aaaabbb" append [repeat 2] "a"
== ??? ; infinite loop?

But this would make rule-taking combinators nearly useless.

It Was Suggested For Sharing "Decoding", not "Combinating"

Seeing how SOME isn't a good example for this, maybe the right way of saying what I'm trying to say is that there's some category of functions we might call "decoders"...and PARSE would be willing to call these.

They'd fit a standard format regarding things like taking an input series and giving back an amount of progress or an error. But they would not be passed something like the parser stack or have any automatic composition of parsers as arguments.

Plain decoding operations--like ENBIN and DEBIN--were the motivating cases:

>> debin #{FEFFFF} [le + 3]
== 16777214

>> parse #{FEFFFFFEFFFF} [collect [keep debin [le + 3]]]
== [16777214 16777214]
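As a rough Python analogue (an assumption for illustration, not Ren-C's actual implementation), a DEBIN-style decoder fitting the "standard format" above would take an input series and give back a decoded value plus how much input it consumed:

```python
# Sketch of a decoder: input series in, (value, bytes_consumed) out,
# or an error if there isn't enough input.

def debin_le(data, size):
    """Decode `size` bytes of `data` as a little-endian unsigned integer."""
    if len(data) < size:
        raise ValueError("not enough input for decode")
    value = int.from_bytes(data[:size], byteorder="little")
    return value, size

# 0xFE + 0xFF*256 + 0xFF*65536 = 16777214, matching the example above
value, consumed = debin_le(b"\xfe\xff\xff", 3)
```

Returning the amount of progress is what would let PARSE advance its position after calling the decoder.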

The idea here was that you could write one version of DEBIN, and it would be able to implicitly pick up the INPUT when used in PARSE.

But because the input is an implicit parameter that you get automatically for all "decoders", without extra information it would have to be at either the beginning or the end of the parameter list. Above it's at the beginning, which is different from how DEBIN was defined originally:

>> debin [le + 3] #{FEFFFF}  ; original DEBIN design took dialect block first
== 16777214

(Note: I have a post about parameter ordering which questions the series-first model.)

We could say that "decoders" have to manually mention their input parameter somewhere, and position it in the order that it would be consumed if it's used outside of PARSE...which would allow customization of this process. It could default to being the first parameter if not positioned explicitly. Not an idea-killer, in any case.

If All The Input Wasn't Consumed, It Would Error

One idea for calling these decoders on arbitrary input could be that if the end of the input was not reached, it would give an error:

>> debin [le + 3] #{FEFFFF00}  ; asking for 3 bytes of decode, passed 4
** Error: DEBIN did not consume all input, request remainder if intentional

Asking for a remainder could prevent the error:

>> [value rest]: debin [le + 3] #{FEFFFF00}
== 16777214

>> rest
== #{00}
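A hedged sketch of that behavior in Python (hypothetical function and parameter names): the decoder errors on leftover input unless the caller explicitly asks for the remainder.

```python
# Sketch: error on unconsumed input by default; opting in to the
# remainder suppresses the error, like the multi-return form above.

def debin_le_checked(data, size, want_rest=False):
    """Little-endian decode that errors on unconsumed input by default."""
    if len(data) < size:
        raise ValueError("not enough input for decode")
    value = int.from_bytes(data[:size], byteorder="little")
    rest = data[size:]
    if not want_rest:
        if rest:
            raise ValueError(
                "decode did not consume all input;"
                " request remainder if intentional"
            )
        return value
    return value, rest

value, rest = debin_le_checked(b"\xfe\xff\xff\x00", 3, want_rest=True)
# value == 16777214, rest == b"\x00"
```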

So this is kind of where the motivation is. Once you've written the decoder version of DEBIN, you have everything you need to run a DEBIN operation inside or outside of PARSE. So why should you need to write a separate combinator and non-combinator form?

As usual, more thought needed. :face_with_head_bandage:


So I think I've ruled out the concept of trying to do this with anything that "combinates". It's too fuzzy what it would mean in terms of the arity of invocation. Just call PARSE.

However...

This idea has merit.

For reasons of composability, multi-return no longer lets the right-hand side "see" what the left-hand side requested. (You can read the long, winding history to understand why.)

So it's kind of like all of these would need to offer something like TRANSCODE's :NEXT refinement. And I think the convention of those functions returning the series as the primary result is sound.

>> [rest value]: debin:next [le + 3] #{FEFFFF00}
== #{00}

>> value
== 16777214

PARSE would have to ask for the :NEXT to do its job.

DEBIN Seems Like A Good First Target. But ENBIN?

An encoder could be used in UPARSE as well, though its only real meaning would be to splice in data at the current position and then move the tail to after the splice.

You could do that with [insert (/your-encoding call-here)] as a parse rule, so it's not necessary to have access to the parsed series...unless you thought it would be more optimal to have the evil :INTO say where to put the data...

Maybe encoders have an :INTO and that's what distinguishes them from ordinary run-of-the-mill functions, I don't know.

Anyway, just throwing a couple more thoughts in here...


Note that this is precisely how parser combinators work in Haskell.

I think that, if I could design PARSE any way I wanted, I would do the following:

  • Define a set of underlying parser combinators as ordinary functions with a standardised interface
  • Define the dialect as being ‘syntax sugar’ on top of those functions to make them easier to use

This is basically the same principle Red uses for its GUI dialect, and I think it’s a good idea. The dialect makes ordinary use cases ergonomic, while the availability of the underlying functions means that you can integrate them into other parts of the language as needed.
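The two bullets above can be sketched briefly (in Python rather than Haskell, purely for illustration; all names here are hypothetical): a standardized parser interface of `parser(input) -> (value, rest) or None`, plus a thin "dialect" layer that desugars a rule list into calls to those same functions.

```python
# Combinators as ordinary functions with one standardized interface...

def lit(text):
    """Parser matching a literal; returns (value, rest) or None."""
    def parser(inp):
        return (text, inp[len(text):]) if inp.startswith(text) else None
    return parser

def seq(*parsers):
    """Match parsers in sequence; product is the last parser's value."""
    def parser(inp):
        value = None
        for p in parsers:
            result = p(inp)
            if result is None:
                return None
            value, inp = result
        return value, inp
    return parser

# ...and a tiny "dialect" that is just sugar over those functions.
def desugar(rule):
    """A list of strings sugars into seq(lit(...), ...)."""
    return seq(*(lit(item) for item in rule))

# The dialect form and the raw-function form hit the same code path:
sugar_result = desugar(["ab", "cd"])("abcd")
manual_result = seq(lit("ab"), lit("cd"))("abcd")
```

Because both forms bottom out in the same functions, they can't disagree about edge cases the way two independent code paths can.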

I will point out that UPARSE builds on basics in ways that e.g. Red PARSE does not. Consider:

red>> find "abcd" ""
== none

red>> parse "abcd" ["" "abcd"]
== true

Is there a "" at the beginning of "abcd" or is there not? They have different answers because they don't run through a common code path.

UPARSE actually builds on FIND, and for this reason has a consistent answer:

>> find "abcd" ""
; first in pack of length 2
== "abcd"

>> parse "abcd" ["" "abcd"]
== "abcd"

Note also that Ren-C FIND tells you the tail position of finds if you want it (that's why it's a pack).

>> [pos tail]: find "abcd" "bc"
== "bcd"

>> tail
== "d"

So things are a step forward in that respect.
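A rough Python analogue (an assumption for illustration, not Ren-C's FIND) of a find that yields both the hit position and the tail position past the match, with a consistent answer for the empty pattern:

```python
# Sketch: FIND returning both the match position and the tail past it.

def find_with_tail(series, pattern):
    """Return (position, tail) slices, or (None, None) when not found."""
    index = series.find(pattern)  # str.find: empty pattern matches at 0
    if index < 0:
        return None, None
    return series[index:], series[index + len(pattern):]

pos, tail = find_with_tail("abcd", "bc")
# pos == "bcd", tail == "d", mirroring the console example above
```

Note that with this single code path, `find_with_tail("abcd", "")` answers `("abcd", "abcd")`: the empty string is found at the head, consuming nothing.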

But I'm certainly willing to hear about ways to improve UPARSE's design. Though the biggest question I've had since the beginning is whether there's any way to do streaming parse...and if we should be operating on a model where the parsers "put back" input they look at but don't consume:

Parsing Giant Streams Without Consuming Tons of Memory: How?
