Kinds of values?

Brett · January 14, 2018, 12:58am

Rebol series and values lack a built-in way to add metadata.

For example, imagine two blocks each in a different dialect. Without interpreting their content, how can they be distinguished?

Another example is parsing a foreign format into a sequence of strings but not having a way to distinguish the different kinds of strings. One wants the linear sequence, without having to shoe-horn these strings into standard Rebol types or maintain and synchronise some separate attribute system.

For me, these examples point to a missing feature. While suffering some jet-lag-induced insomnia, I thought of the following idea.

Values could have a user-defined property of KIND or CLASS; I'll call it Kind here. Mindful of the above examples, I thought series could have an attribute of Kind, which would be useful for strings and blocks. They would have supporting syntax.

For demonstration I thought Kind could be a word followed by a #, followed by the existing syntax. E.g:

vid#[button "push me"] ; A block of VID dialect.
vector#[1 2 3]
kind of markdown#{*This* could be interesting.} ; Would yield MARKDOWN

Kind would be optional so existing forms would be unaffected:

kind of {simple string} ; would return _
kind of [simple block] ; would return _

An obvious concern is that it would uglify Rebol syntax. I have no answer to that; at the moment I'm just excited by what benefits could be derived.

In a sense Kind is just tagging values and would be orthogonal to the Rebol types. But it could allow functions to check the Kind of their arguments, and possibly have some operations chosen by Kind. As such it may be an answer to user types: if you want a custom type, define a dialect to represent the type (which can be identified by Kind). No need for some developer-designed custom syntax.
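
To make the "check the Kind of arguments, choose operations by Kind" idea concrete, here is a minimal sketch in Python rather than Rebol, since the proposed syntax does not exist. Everything in it is an assumption for illustration: `Tagged`, `kind_of`, `HANDLERS`, and `describe` are hypothetical names, and `None` stands in for the `_` that `kind of` would return for an untagged value.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class Tagged:
    """A value plus an optional user-defined Kind (hypothetical model)."""
    value: Any
    kind: Optional[str] = None


def kind_of(v: Any) -> Optional[str]:
    """Return the Kind of a tagged value, or None for a plain value."""
    return v.kind if isinstance(v, Tagged) else None


# Hypothetical operations chosen by Kind; the Kind is only a tag,
# so the underlying value keeps its ordinary type (list, str, ...).
HANDLERS: Dict[str, Callable[[Any], str]] = {
    "vid": lambda v: "layout with %d items" % len(v),
    "markdown": lambda v: "markdown text of %d chars" % len(v),
}


def describe(v: Any) -> str:
    """Dispatch on Kind; untagged values take the default path."""
    kind = kind_of(v)
    if kind is None:
        return "plain value"
    handler = HANDLERS.get(kind)
    if handler is None:
        raise TypeError("no handler for kind %r" % kind)
    return handler(v.value)


layout = Tagged(["button", "push me"], "vid")
print(kind_of(layout))          # the tag, independent of the list type
print(describe(layout))         # handled by the "vid" entry
print(kind_of("simple string")) # untagged values have no Kind
```

Note that this models the tag as a wrapper around the value, which is exactly the separate attribute layer the proposal hopes to avoid; it only shows the dispatch behaviour, not the storage.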

Matching by Kind in PARSE would be really really useful.

PARSE could have a mode to emit matched sequences with a Kind that comes from rules invoked by words. Giving a basic parse tree output for little effort - highly useful for data extraction.
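
A rough illustration of that emit mode, again as a Python sketch and not a claim about how PARSE would do it: each named rule contributes its name as the Kind of whatever it matches, yielding a linear sequence of (kind, text) pairs. The rule names and the `tokenize` helper are hypothetical, and the output is flat rather than a nested parse tree.

```python
import re
from typing import List, Tuple

# Hypothetical named rules; each name becomes the Kind of what it matches.
# Order matters: earlier rules win when several could match.
RULES = [
    ("number", r"\d+"),
    ("ident", r"[A-Za-z_]\w*"),
    ("op", r"[-+*/=]"),
    ("space", r"\s+"),
]

# One alternation of named groups, so match.lastgroup names the rule that fired.
PATTERN = re.compile("|".join("(?P<%s>%s)" % (name, rx) for name, rx in RULES))


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Return the matched pieces in order, each paired with its rule name."""
    out: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = PATTERN.match(text, pos)
        if m is None:
            raise ValueError("no rule matches at position %d" % pos)
        if m.lastgroup != "space":  # whitespace is matched but not emitted
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


for kind, text in tokenize("x = 42 + y"):
    print(kind, text)
```

The strings stay plain strings; the only thing added is the name of the rule that matched them.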

Pathing by kind could perhaps enable some sophisticated extraction from recursive structured formats (XML?).

Presumably other ideas would be forthcoming if Kind was available.

Kind could probably be applicable to all Rebol types:

column#3 ; An integer whose kind is Column.

I don't know if this idea is good or terrible, but it is interesting at least. Wondering what people think and if there are better ideas?

asampal · January 14, 2018, 3:19am

I also like the idea of metadata, @Brett, and I'm pretty sure I suggested this to @hostilefork some time ago. I was thinking of a mechanism similar to Clojure's for adding metadata to collections and symbols. That seems pretty flexible to me, somewhat more so than what you described above, IMO, but even your idea would be better than not having the possibility to annotate things at all.

hostilefork · January 14, 2018, 4:47am

I'd file this under the generic missing feature of a user-defined type system. So it wouldn't be that primitives (like blocks) would be packing in additional properties. There's simply not enough room, without changing the rules of the cells and REBSER nodes in ways that would expand and complicate them. Mechanically they would be similar to objects, but have extension/usermode code giving them behaviors.

Already I see this being the right direction to take things like IMAGE!. Instead of trying to pack data into Rebol's fundamental REBSER nodes (like the size in some extended bits), it would be composed of a PAIR! and a BINARY!. But perhaps it could have a form that serializes in the way you describe.

Note you can currently add "meta" information to FUNCTION!s or ANY-CONTEXT!s (objects, frames, errors). This is just an ANY-CONTEXT!. There is no rendering of it, and we do have questions about round-tripping objects as-is when they are molded...but to get the behaviors you describe we're probably talking about a syntax by which this meta information would serialize itself. I see these features as an outgrowth of that mechanism.

When I've spoken of this, I have been calling what we call "type" today KIND, and what you are calling "class" I would call TYPE. Something like vid#[button "push me"] would report back its KIND as OBJECT! and its TYPE as VID!...and since the object would wrap only a BLOCK!, it would have to forward requests to the block to answer all the operations like PICK etc.
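
As a loose analogy for that forwarding, sketched in Python and not a description of any actual implementation: a wrapper object reports a fixed kind, carries a user-defined type name, and passes series operations through to the block it wraps. The class `Typed` and its `pick` method are hypothetical; `pick` is 1-based and returns `None` when out of range, to mimic Rebol's PICK.

```python
from typing import Any, Iterator, List, Optional


class Typed:
    """Hypothetical user-defined type: an object that wraps only a block."""

    kind = "object"  # the fundamental kind is always the wrapper's

    def __init__(self, type_name: str, data: List[Any]) -> None:
        self.type_name = type_name  # user-defined type, e.g. "vid"
        self._data = data           # the wrapped block

    def pick(self, n: int) -> Optional[Any]:
        """Forward a 1-based PICK to the wrapped block; None if out of range."""
        if 1 <= n <= len(self._data):
            return self._data[n - 1]
        return None

    def __len__(self) -> int:
        # Every series operation needs its own forwarding method.
        return len(self._data)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._data)


v = Typed("vid", ["button", "push me"])
print(v.kind, v.type_name)  # wrapper kind and user-defined type
print(v.pick(1), len(v))    # answered by the wrapped block
```

Each operation that should work on the wrapped value has to be forwarded explicitly, which is the cost this approach trades against changing the cell layout.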

I hadn't thought of being able to offer the illusion that it could report itself as a BLOCK! but add on some extra information. Perhaps I'm a little too much thinking-in-the-box of the implementation, because I know that tagging blocks and strings that way would create costs which "break the rules".

Rebol's attempt to standardize a format for information interchange--so that it could be a reasonable exchange medium like JSON--is not very compatible with things like user-defined molding methods. You wind up with things that put themselves out in a way that may be readable or unreadable based on what user defined types are loaded. Or where different versions of the extension can create incompatible serializations.

I think this is going to be the hard balance to strike with user-defined types...searching for some syntactic sweet spot. But maybe it means there's a methodized form of LOAD and an unmethodized form, and you have to pick which to use based on what you're aiming for.

Brett · January 16, 2018, 1:31am

Ok, so the idea appears to be impractical implementation-wise.

Hopefully such a system will be lightweight enough that it can represent the tokenisation of foreign syntaxes such as C, for example.

Breaking homoiconicity is one. I find this annoying enough to avoid using objects in my data structures where I can. Block structures without objects are nice for saving out as text intermediate data steps, visualising and modifying them in editors, etc. I like my data to speak to me, not be cluttered with noise.

I'm unsure about the question of syntax...I did propose new syntax, which we wouldn't want to happen much. Perhaps it would be good to have a survey of the possible syntax forms that might be considered primitive.

Many important points raised here, but perhaps the discussion is too early, so I'll not take further time away from other progress by going into further discussion right now.

hostilefork · January 26, 2018, 5:19pm

Some thinking from a wiki submitted by a Red user, on this theme...just bookmarking it:

UDT (User-defined types)

hostilefork · August 24, 2024, 5:10pm

A lot changes in 7 years...!

With THE-TUPLE! and THE-PATH!, you can make values like these:

>> @vid.[button "push me"]
== @vid.[button "push me"]

>> @{vector}.[1 2 3]  ; when we get FENCE!
== @{vector}.[1 2 3]

>> @markdown/"*This* could be interesting."
== @markdown/"*This* could be interesting."

The @ suppresses evaluation, meaning you could pass them around without generating evaluation errors... and whatever you passed it to could read the type information. And we now know that @ carries binding, which is important here.

Who exactly pays attention to this and what it means, I don't know. This could be something DO would accept... e.g. it assumes if you DO a THE-TUPLE! (or THE-PATH!) then it should consider it dialected. (Maybe it's better to have it be DO-DIALECT to make it clearer?)

But the mechanics are in place to experiment with!