Semantics of PORT!s vs. Streams vs. Iterators

I've complained often about PORT! seeming to try to serve two masters... it acts as something of an OBJECT!, but also as a stream.

Here is the definition of system.standard.port...the template object from which PORT!s are created:

system.standard.port: make object! [
    spec: '
    scheme: '
    actor: '
    awake: ~unset~
    state: '
    data: '
    locals: '

    connections: '  ; currently used only by TCP ports
]

That's an OBJECT!, but the underlying datatype is switched to PORT! when the port is created.

How Do You Interact With This Object as an Object?

The "What is a Port" document says this:

Specific action functions can be applied to a port. Some common actions are:

  • make - create a new port
  • open - initialize the port
  • close - finalize the port
  • read - read data from port
  • write - write data to port
  • query - get other information from port
  • update - detect external changes to the port

But, there are many other actions as well, as generally defined by Rebol datatypes.

What else qualifies among these "many other actions"?

  • Rebol2's ODBC suggests PICK (FIRST), INSERT, and COPY as choices.

  • Source code for File Port in R3-Alpha shows APPEND, DELETE, RENAME, MODIFY, OPEN?, LENGTH?, HEAD, TAIL, NEXT, BACK, SKIP, HEAD?, TAIL?, PAST?, CLEAR.

This is awfully saturated, and it seems nothing is off the table for what this abstract idea of a PORT! might want to react to.

So how do you get at these object fields safely? How do you PICK the SPEC field? How do you POKE the AWAKE function?

More Problems: Does PORT Correspond 1:1 With a STREAM?

When you're working with a TCP connection, you supposedly think of that as a bidirectional PORT!. You don't open a connection for your input, and one for your output--you read and write to the same port.

If so, why is there system.ports.input and system.ports.output instead of system.ports.stdio?

One good technical reason is that there are actually two stdio output streams: stdout, and stderr for error messages. Often these are directed to the same place, but not always.

So here you see that maybe something like stdio would like to be an object aggregating three data streams: one for input and two for output.
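To sketch what that aggregate might look like (in Python rather than Rebol, and with an entirely hypothetical `Stdio` class, since nothing like it is defined here), an object can simply bundle the three streams:

```python
import sys
from dataclasses import dataclass
from typing import TextIO

@dataclass
class Stdio:
    """Hypothetical aggregate: one object wrapping three distinct streams."""
    input: TextIO   # the one input stream (stdin)
    output: TextIO  # primary output stream (stdout)
    error: TextIO   # error output (stderr); often, but not always, the same destination

# Bundle the process's actual standard streams into one object
stdio = Stdio(input=sys.stdin, output=sys.stdout, error=sys.stderr)
```

The point is only that "stdio" names one object holding three underlying streams, not a single stream.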

Difference Between Ports and User Defined Datatypes?

It doesn't really seem like anything is off the table for what you can override on a PORT!.

Can you define what it means to ADD to a PORT!? If not, why not?

What about path selection on a PORT! (or, these days, field access via TUPLE!)?

Once you get into this "anything goes" attitude you are essentially talking about an implementation for user-defined datatypes.

This points out an important aspect of the articulation of any design: You need to be able to say what it isn't, or there's no meaningful definition of what it is.

Streaming/Iterating Is Narrower And Needs a Protocol

While PORT! is slippery, I'm going to be attacking just the more basic questions of streaming and iteration.

We want to be able to say parse some-100-mb-file ["FOO" <stop>] and not have to read 100 megabytes just to know if it started with "FOO".

And we want to be able to do that with something like FOR-EACH as well...

 for-each [x y z] some-100-mb-file [
     all [x = #f, y = #o, z = #o] then [break]
 ]

That shouldn't need to have all 100 megabytes in memory. And a generic solution to this which puts iteration in the mix should allow for streams to be piped and connected to each other to do filtering, encryption/decryption, compression, etc...
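For comparison, here is how layered, lazy streaming looks with generators in Python (my own sketch, standard library only): a chunked file reader piped through an incremental decompressor, so only one block is ever held in memory:

```python
import io
import zlib

def chunks(stream, size=4096):
    """Yield successive fixed-size blocks from a binary stream, lazily."""
    while block := stream.read(size):
        yield block

def inflate(blocks):
    """Pipe stage: incrementally decompress a stream of zlib-compressed blocks."""
    d = zlib.decompressobj()
    for block in blocks:
        if out := d.decompress(block):
            yield out
    if tail := d.flush():
        yield tail

# Simulate a large compressed "file" and consume it without loading it whole
compressed = io.BytesIO(zlib.compress(b"Foo" * 100_000))
total = sum(len(b) for b in inflate(chunks(compressed)))
print(total)  # 300000
```

Each stage wraps the iterator below it, which is exactly the shape a filtering, encryption, or compression layer would take.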

So I'm going to focus on the narrower question of how to do that, vs. the muddle of "What is a PORT!", at this time. Wishful thinking isn't going to solve that problem, but rational adaptation of the methods used by other languages that do this might.


I never really liked ports to begin with. I'd more likely create my own object to encapsulate the port and use my own functions to interact with it.


I'll add one extra perspective on the semantics of ports from a user's point of view, at least for Rebol 2.

Way back when, I got interested in ports because I thought Rebol interpreters were going to talk dialects to each other via ports. I was eventually disappointed. In my only documented port scheme, do-pop, I thought a port was a better option than an object because it could clean up after itself. Nevertheless, I side-stepped the clunky port interface and the cognitive burden of trying to translate the semantic action I wanted into port actions--they don't fit well anyway. Instead I used the port as a binder and evaluator of simple function calls, where the "evaluation" happened remotely. I don't know that it was a great idea, but it was interesting to me because it was like a "relative expression" being evaluated over the wire transparently.



Input/Output Ports

I'd add here that input and output may not share the same endpoint. In Rebol 2, as I understand it, in a console session both (SYSTEM/PORTS/) INPUT and OUTPUT were set to the same CONSOLE scheme, though when invoked from the shell it uses a FILE scheme. Something lost in that translation to Rebol 3 is part of why Rebol 3 was incomplete (and where Rebol 2 still has flaws):

My primary usage of Rebol is running scripts from the shell, and such usage is constantly polluted with artifacts from implied terminal usage (the dreaded >> or the 5... 4... 3... 2... 1... bomb of R3C-era Ren-C).

Anyways, while it's somewhat intuitive that INPUT/OUTPUT are set to the same scheme, it doesn't necessarily have to be so. You could set output to another open FILE port and all output would be redirected there. I'm not entirely certain what the merits of any effective exploitation of this would be, but :man_shrugging: IDK.
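For what it's worth, the same move exists elsewhere; in Python, repointing the process's output "port" at any open file-like object is a one-liner (a minimal sketch):

```python
import io
from contextlib import redirect_stdout

buf = io.StringIO()  # stands in for "another open FILE port"
with redirect_stdout(buf):
    print("hello")   # all output inside the block is redirected

print(repr(buf.getvalue()))  # 'hello\n'
```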

What is a Port?

I don't know that this necessarily follows. I've argued that ports are as close as you get to user-defined datatypes, but I'd also suggest that this is a misapplication. I think that PORT exists (as the name implies) as an interface to an external system or construct. When viewed this way, you might also consider SERIES or MAP to be internal ports with a fixed and native implementation of (let's call them) 'manipulation verbs'. While it can be desirable to have those verbs be consistent with custom ports, it doesn't necessarily have to be so; it depends on context.

I think this definition tracks with the use of URLs/FILEs as the initiator as well.


Javascript handles iterators in much the same way Rebol handles ports: objects that conform to a set of conventions that integrate them with fixed language constructs (e.g. for...of). The result of this is that you can layer things in customisable ways. Rather than explain this in Javascript, I'll offer a prospective way for how this might look as Rebol.
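For reference, Python has the same convention-based design: any object implementing `__iter__` and `__next__` plugs into the fixed `for` construct (and `list`, and friends). A minimal sketch:

```python
class Countdown:
    """Object conforming to the iterator protocol by convention."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self):
        if self.n == 0:
            raise StopIteration  # the agreed signal for "no more values"
        self.n -= 1
        return self.n + 1

print(list(Countdown(3)))  # [3, 2, 1]
```

The language construct is fixed; the object supplies the behavior, which is what lets such layers be stacked in customizable ways.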

Test Case

I have a little test case for where a piece of data is stored in a few layers of formats:

encoded: "start F3Dl7!! end"

This snippet contains the string 'Foo', Deflate-compressed and then encoded as Ascii85. It might be possible to just copy the text between 'start' and 'end' and apply the decoders sequentially; however, "end" is valid Ascii85 and spaces are permitted, so the following may stymie this strategy:

encoded: "start F3Dl7!! end end"

Ascii85 is not self-terminating; Deflate is. Thus a way to approach this might be to use an iterator for each level. Prospective solution:

encoded: make string/iterator [
    source: "start F3Dl7!! end"
]

encoded/consume "start"

encoded-ascii85: make ascii85/iterator [
    source: encoded
]

encoded-deflate: make deflate/iterator [
    source: encoded-ascii85
]

result: make binary! collect [
    while [byte: encoded-deflate/next] [
        keep byte
    ]
]

encoded/consume " end"
=> true
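As a sanity check on the encoding claims above, the payload can be unwound with Python's standard library (taking the naive slice for granted, just to verify the layers): Ascii85-decode, then raw-Deflate-decompress. The Deflate layer signals its own end of stream; the Ascii85 layer gives no such signal:

```python
import base64
import zlib

encoded = "start F3Dl7!! end"
payload = encoded.removeprefix("start ").removesuffix(" end")

compressed = base64.a85decode(payload)  # Ascii85 layer: not self-terminating
d = zlib.decompressobj(wbits=-15)       # raw Deflate (no zlib header/trailer)
data = d.decompress(compressed)

print(data)   # b'Foo'
print(d.eof)  # True -- Deflate announced its own end-of-stream
```

That `eof` flag (plus `d.unused_data`) is what makes "stop the inner iterator when Deflate terminates" a workable strategy.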

Could just as easily begin:

encoded: make big-file/iterator [
    source: %file-containing-data
]

Like ports, this still doesn't give you consistency between formats—what does it mean for a JSON iterator when you say:

json-iterator: make json/iterator [
    source: "[1, 2, 3, 4]"
]

probe json-iterator/next

=> one of:
== [1 2 3 4]
== [open-array]  ; to be followed by [number 1]

I'm not exactly certain how one would go about making these iterators in an efficient fashion. Pulling byte by byte from a Deflate iterator might be a slow way of doing that.


As a practical matter, streaming interfaces tend to have to pick some arbitrary maximum block size for how much they read at a time.

My concept for READ was that if you did not qualify it, then it would mean "read everything":

Semantics of READ and TCP Streams: Past And Future

But then you have the question of reading a maximum of a certain number of bytes, vs. reading exactly a certain number of bytes.
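The distinction is real at the OS level: a plain read typically returns *up to* the requested count, so "read exactly N bytes" has to be layered on top by looping over short reads. A Python sketch:

```python
import io

def read_exactly(stream, n):
    """Read exactly n bytes, looping over short reads; raise on premature EOF."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))  # may return fewer than requested
        if not chunk:
            raise EOFError(f"wanted {n} bytes, got only {len(buf)}")
        buf.extend(chunk)
    return bytes(buf)

stream = io.BytesIO(b"abcdefgh")
print(read_exactly(stream, 4))  # b'abcd'
print(stream.read(100))         # b'efgh' -- plain read means "at most 100"
```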

A one-byte-at-a-time interface is a good place to start, but all practical streaming code has to enable bigger chunks.

I definitely think that non-trivial layered examples are important to consider.

I also think the design would be very influenced by generator and yielder:

YIELDER and GENERATOR (and thinking about Coroutines)

But I'm trying not to rush those, because the system really needs to stabilize and have things hammered down. There is a lot of "technical debt" that needs to be paid off...