PARSE on PORT! and avoiding generic behavior w.r.t. ANY-SERIES!

If you look at Rebol2, R3-Alpha, and Red, they all treat a FILE! the same way they would treat text (STRING!) when you pass it as the first argument to PARSE:

rebol2>> parse %aaa.txt [some "a" ".txt"]
== true

r3-alpha>> parse %aaa.txt [some "a" ".txt"]
== true

red>> parse %aaa.txt [some "a" ".txt"]
== true

I've always had the ambition that you be able to PARSE a PORT!. If that's possible, it seems that you should be able to shortcut actually opening and closing the port yourself by saying something like:

parse %some-200-megabyte-file.txt [
    some "a" end (print "Your giant file was all the letter A")
]

parse http://example.com/some-net-data/ [
    thru <title> copy title to </title> (print ["Title was" title])
]

The vision would be that PARSE would assume when you gave it a FILE! or URL! that you meant to operate on that as a PORT!...opening it, parsing it, and closing it. If you gave it a regular PORT! it would assume you would take care of closing it yourself.

Furthermore, it would be efficient, so it wouldn't need to load the whole input into memory at once. (There could be some heuristic for a "chunk size" it picked automatically, paging in only as much of the file as it needed at a time. But you could perhaps tweak that manually by opening the port yourself and adjusting some settings. This seems to be a property of the PORT! and not of PARSE, though there may be PARSE-specific settings. Perhaps those settings would be looked for on the port itself as an extensible set of headers, vs. being some strange refinement you'd pass.)
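To make the chunking and open/close ideas concrete, here is a rough sketch in Python rather than Rebol, since PARSE on a PORT! is still a wish in this thread. It only mimics the single rule some "a" end, and the names all_letter_a and chunk_size are made up for illustration; they are not part of any Rebol, Red, or Ren-C API. A path stands in for FILE! (the function opens and closes it), while an already-open stream stands in for PORT! (the caller closes it).

```python
import os


def all_letter_a(source, chunk_size=64 * 1024):
    """Return True if the input is one or more b"a" bytes and nothing else.

    Hypothetical stand-in for the rule: some "a" end
    Reads at most chunk_size bytes at a time instead of the whole input.
    """
    # A path plays the role of FILE!: we open it, so we also close it.
    # Anything else is treated like an open PORT!: the caller closes it.
    opened_here = isinstance(source, (str, bytes, os.PathLike))
    stream = open(source, "rb") if opened_here else source
    try:
        seen_any = False
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                # End of input: SOME requires at least one match.
                return seen_any
            if chunk.count(b"a") != len(chunk):
                # Some byte in this chunk is not the letter a.
                return False
            seen_any = True
    finally:
        if opened_here:
            stream.close()
```

Calling all_letter_a("some-200-megabyte-file.txt") never holds more than one chunk in memory, and passing an open binary stream leaves that stream open afterwards. A real PARSE would also need backtracking across chunk boundaries, which this single-rule sketch sidesteps.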

In any case, the appeal of having that work for FILE! and URL! certainly seems to suggest that it's a much better use of the type variety than as a synonym for:

>> did parse as text! %aaa.txt [some "a" ".txt" end]
== #[true]

There's a clear need for PARSE to run on TEXT!, BINARY!, and BLOCK! input. I'm not sure how this applies to INTO. There also might be a parse/only (or parse/into, which would be type-preserving?).

Not just for PARSE: a General Philosophy of ANY-SERIES!

This ties into what I think should be a very restrained tendency to make ANY-STRING! types behave identically to TEXT!.

I've said similar things about why type of first ['''a] should not be conflated with plain WORD!. There should be a default of discernment, leaving room open for distinct meanings.

So be on the lookout for cases where a datatype is being underused, even if it's not able to do the ideal magic today. Seeing PARSE run on PORT! is a pretty big wishlist item for me, so maybe it's not impossible that it could happen... (!)

But we still need to modify URL!s and FILE!s et cetera without modifying the contents they would point to if they were opened as ports. In other words, I would be against being forced to write myfilename: as file! append as text! %abc as text! %.txt in order to create a new FILE! value by appending one old one to another. There's a slippery slope when you treat a string as if its value is the string contents returned by some routine that interprets it.

In my view the need for parsing a PORT! (which I agree is a genuine need) should be met by a routine designed for it, something like, say, PARSE-CONTENTS. Refinements to that routine are where the chunk-size parameters and suchlike belong, not cluttering up the PORT! object itself. What if there are other ways of operating on PORT! contents besides parse? Do we add their parameters to the PORT! object? No.

In fact I even disapprove of DO of a TAG! acting like DO of the contents of the (file? url? which one?) whose name is the string contents of the tag (and whose directory or base url is where exactly?). It should be DO-CONTENTS, with (exclusive, and possibly in some fashion semi-permanent, needs more thought) BASE and DIR refinements. But at least in that case there is no prior functionality that is being usurped, so I have kept my big mouth shut about it until now.

If things like to file! :[base %.txt] "just worked" like you had written as file! unspaced [base %.txt], we might look at things differently. Life wouldn't have to be as miserable as the worst case you give. So we should be a bit circumspect.

It seems desirable for append %my-file.txt {Some text} to write to the file instead of giving you %"my-file.txtSome text". I've pointed out before that PORT! itself faces ambiguity in its dual life as an ANY-CONTEXT!, and APPEND is a good example. APPEND to an OBJECT! would add fields from SET-WORD! and data pairs, while APPEND to a PORT! gets re-triggered as WRITE/APPEND. Which behaviors qualify for subversion of the underlying type, and which do not?

I don't know what the full answer is...I just don't want to leave the most interesting behaviors off the table to preserve a kind of trivial mechanical consistency. Though I am a huge fan of mechanical consistency--it's just a matter of where you establish that firm ground.

It does make sense to me that if I'm going to make successive calls to PARSE on a PORT!, those settings should persist on the port when they are not applicable to a parse of a string in memory.

I would offer HTTP headers as an example of a sort of labeling protocol ecology (in fact it's an ecology that PORT! might do well to inherit from or be compatible with).

I've already suggested that if you say read myport://something that it run a generic process regardless of port type spiritually akin to:

p: open myport://something
append p/headers [num-reads-to-come: 1]
data: read p
close p
data

How many reads are to come might default to unknown in order to inform a keepalive on a TCP connection, and other ports might ignore it altogether. But it would be there to draw from.

It's not just network connections that could have this kind of meta information, and I see no reason why it couldn't be there to draw from for specific clients like PARSE.

This sounds pretty amazing to me, especially the part about managing files larger than available memory.

You are right that this feature is somewhat "special", and it would be better if it were easier to verify the lookup table. At the same time, I actually like this shortcut, and it would be great if you could add your own personal links to it.