Multiply Linked Lists

rgchris · December 24, 2020, 6:01am

Since I'm having a spitballing vibe, here's another proposal for a NODE! datatype.

In developing my Markup Parser, I settled on a method of tree-building in which I used linked lists with nodes having five relationships: parent, previous/next sibling and first/last child. For this, I used blocks with bi-directional links for each relationship.

Preamble

<html>
  <head>
  <body>
</html>

In this example (ignoring whitespace/text), there are three elements (I'll call nodes). I'll use blocks to represent each node:

; prototype
node: [parent _ back _ next _ first _ last _ name _]

; nodes
html: copy node
head: copy node
body: copy node

; names
html/name: <html>
head/name: <head>
body/name: <body>

; relationships
html/first: body/back: head
html/last: head/next: body
head/parent: body/parent: html

The advantage to this is in being able to reference nodes without losing that reference if other nodes are added/removed post-facto—quite common even when building a tree from HTML tokens:

; remove <head>
html/first: body
body/back: _
head/parent: head/back: head/next: _

; add <foot>
foot: copy node
foot/name: <foot>
head/last: body/next: foot

; body is still <body>
body = <body>  ; => true

Un-Rebol-like

With the myriad circular references, this becomes unlike any other Rebol data structure—it's difficult to get a helicopter view of the relationships on account of the number of circular references. Then the other issues inherent in Linked Lists, such as having to walk the list to get the size, etc.

Inefficient?

In the current BLOCK! based system, it's not clear how efficient all these circular references are or even just the cost of using names in the name-value pairings over potentially hundreds/thousands/more nodes.

Proposal

I'd like to float the idea of a NODE! datatype, which in memory is a six value-slot object with parent/back/next/first/last and value slots. It's only purpose would be to optimise for this case. With it would come a few efficiencies.

Management could come in the form of a small object with helper functions that managed the relationships:

; creation
html: make node! <html>
head: tree/append html <head>
body: tree/append html <body>

; reflection
value of html/last  ; => <body>
value of head/parent  ; => <html>

; manipulation
tree/insert-after <body> <foot>
tree/remove head

; reflection ii
value of body  ; => <body>
body/back  ; => _
html/first  ; => <body>
html/last  ; => <foot>
value of head  ; => <head>  ; still exists
head/parent  ; => _
head/next  ; => _

Coupled with my Class proposal, this would (I'd suppose) represent the lightest weight, yet versatile, representation of trees.

html: make html-element <html>
head: tree/append html make html-element <head>
body: tree/insert-after head make html-element <body>

Display

A display of the node would use shorthand for the value of each relationship in place of Rebol's usual [...] circular replacement.

>> probe body
== make node! [
    parent: <html>
    back: _
    next: <foot>
    first: _
    last: _
    value: <body>
]

Thoughts

I don't think it'd get around the inconvenience or inefficiency of, say, counting the number of direct kids, but the other benefits would make up for it.

NODE! could similarly be used in conventional linked lists or doubly linked lists by just leaving the other four/three relationships respectively blank.

A correction: not every relationship is bi-directional. Every node (except the root) connects to its parent, the parent only connects to the first and last of its kids

hostilefork · December 24, 2020, 7:20am

An easy interface with the browser DOM is very desirable. The best place to start experimenting with that is inside the browser...where the actual DOM is relatively near-at-hand.

(To reference a DOM object right now, you have to go through a mapping table, e.g. the JavaScript keeps a table mapping integers to DOM objects...and you hold references to those integers, always going back to through that table and asking JS to do things for you with it. That sucks, but they are working on it with something called Reference types. It's a burden to bear, but at least it's predictable...and if you abstract it then only the person writing the abstraction worries about it.)

If a good syntax/dialect and usability is set up and works in the browser, then we might wish to repeat that in code that can run when a browser is not available. While the GOB! code could maybe be adapted to the purpose, we could wire up a vetted virtual DOM library as an extension. (The discipline of using C for the core is still important, but C++ can and should be used in extensions for expedience, and any virtual DOM would likely be written in C++.)

We should be careful of trying to design generic data structures which have only one most likely application (e.g. DOM representation), wind up with a lot of C code, and then find it doesn't even do that one thing particularly well.

Since you're now doing some hacking on the JS build, you might want to take a step back from the idea of the low-level mimicry of JS methods and instead think about how higher-level Rebolisms might make life easier at a dialect level, and have it spew out a little stream of JavaScript for now.

To keep that focus, I might suggest you start by equating "node" to HTML node and let any grander designs emerge out of that.

We now have $ as an operator (I've tinkered with it being a variadic form of SHELL, e.g. $ ls -alF...but that was just one use.) We should be able to rock much harder than jQuery can with that through dialecting.

>> main: $ <html>
; Single HTML node, equivalent to jQuery $("<html></html>")

>> div: $ [<div> #some-id .some.class.(var-holding-class) [<span> "Children"]]
; Build from a template

>> div: $ compose [<div> (sub: $ [<span> ...])]
; Perhaps use COMPOSE to get handles to content

>> div: $ [<div> (sub: $ [<span> ...])]
; ...though COMPOSE implicit is likely best, what else would GROUP! be for?

>> $ 'query/ref [...]
; I don't remember much jQuery but you could have methods.  This would be
; better with my "action/object" duality proposal, where $.query/ref can call
; a QUERY function and $/ref calls a different "base" function when not
; subindexed with `.`... but if the function only takes one argument then a
; skippable LIT-WORD! or LIT-PATH! could be a hack for triggering that.

>> $ :[foo bar]
; What might that mean?  I dunno, point is it's dialecting and you'd have
; options for all the types.  Maybe BLOCK! is always a "build" request and
; `$ [<div>]` builds, while `$ <div>` would query and give you back all the
; divs as a collection?

jQuery has an issue that it has to deal with "un-augmented nodes" and then there are some routines that take "augmented nodes". They'd have names like $elem for the augmented form and elem for the low-level one. But since we're in another language, the augmented nodes would be all we can use. If you ever use one of these in a call to JS-DO then you'd necessarily want to do the extraction from the map to get the actual element.

It would be important to extract that element for use in JS-EVAL and JS-DO. So something like my dialecting to extract API handles would be needed:

github.com/hostilefork/replpad-js

Dialected JS-EVAL and JS-DO for embedding Rebol Values

hostilefork:master ← hostilefork:js-eval-dialect

opened 08:38AM - 14 Apr 19 UTC

hostilefork

+106 -8

This is a fun usage of the new @-types, which lets you refer to Rebol values mo…re easily in JS-DO and JS-EVAL expressions. By doing so you not only save on typing, but it finesses what would be hairy escaping situations if your only currency was text: >> text: { ykcirT" a s'tI} >> js-eval [{reb.Spell("reverse",} @text {) + "String" + '"'}] == {It's a "Tricky String"} So what's going on there is it's generating a temporary API handle to refer to the value fetched from TEXT. This API handle is then used literally as one of the arguments to the variadic/evaluating extraction routines in the API (mixed here with some JavaScript `+` string concatenation). The variadic knows to free this handle after it is done processing it, (so it is like a reb.RELEASING() or reb.R() handle). This is more powerful than just treating a block as running JS-EVAL UNSPACED [...]. But if you want that, you can still use a plain GROUP!: >> text: "gnirob" >> js-eval [{"} (reverse text) { group"}] == "boring group" You can use @path/values, @(computed values). Although INTEGER! Rebol values aren't the best example (since you could pass them fine via their string representations), they'll work too *(just easier to demo in a PR than Blobs for BINARY! and such...)* >> x: 1000 >> y: 20 >> js-eval [{reb.Unbox(} @(x + y) {)}] == 1020 And to make things even more succinct, WORD!s in this dialect can be used to generate the reb.XXX() calls for you: >> x: 1000 >> js-eval [{20 +} unbox @x] == 1020 Currently there's support for UNBOX and SPELL, but this could expand. (Note: JS-DO and JS-EVAL are similar, except JS-EVAL tries to return a result to Rebol translated from the JavaScript it is given...and cannot declare variables in global scope. See technical notes in the implementation if interested.)

So unboxing one of these $ [<div> ...] DOM elements needs to be able to easily get that element through to JS.

BlackATTR · December 24, 2020, 2:19pm

I'd wager this is relevant in many more cases than the DOM. Speaking with my file-munging hat on, what Chris described applies to many forms of structured data. While XML and JSON often encode basic lists of data, it's also common to see taxonomies, ontologies, graphs etc. where relationships between nodes is relevant/important.

Although would probably be a substantial undertaking to implement a feature like this, I thought I'd back-up the idea. Being able to deftly handle relationships and interconnectedness of data is a topic worthy of discussion.

Btw, the jQuery example looks great.

rgchris · December 24, 2020, 6:47pm

To be clear, this wouldn't be to interface with the DOM, this would be to build DOM and other structures internally (not only for use within the browser).

It's only one (or three) application(s), but it's a bit of a doozy. It's a pattern that is imo. worth optimizing for the sheer breadth of that application, though I wouldn't hazard to say how much of what I propose optimizes over using the aforementioned block structure I currently use.

I am eternally naive, but I wouldn't see it as being a lot of code. There's not a lot of logic to it and I wouldn't propose that of itself it'd be all that interoperable with other datatypes (not much by way of TO anything). My current Rebol implementation is about a hundred or so explicit lines of code. What I'd envisage is just cutting out the BLOCK! middleman.

This proposal is unrelated to any work with JS, this has been on my mind since working on the Markup parser. It just happens to dovetail with the Class idea as two significant efficiencies on creating and manipulating such structures.

rgchris · December 24, 2020, 6:59pm

Yes!

I don't know that this is necessarily true, although again, I may be naive here. The tree management code I have is not significant in size but is versatile—it's just that it can slow things down when operating at scale. My main proposition here is creating the most efficient container and basic tools.

If you look at the TREES object in Markup.reb, there is enough management here in 150 LOC to make the implementation of the Adoption Agency algorithm [spec] reasonably literate.

rgchris · March 10, 2021, 4:49pm

NODE! type proposal for Red

hostilefork · July 29, 2021, 7:33am

I have been having some half-formed thoughts (okay, much less than half-formed, maybe 2% formed) about how we might start thinking about taking ideas from Rust in terms of ownership structures. This would mean differentiating blocks that are owned in one place from those that need to be GC'd due to references permitted in multiple places.

Maybe then BLOCK! could fuse with what people want from NODE!. You'd just not be able to ask what the parent of certain blocks were because it would be ambiguous. Perhaps OBJECT! would have this too. Or maybe INTEGER! and everything else, which is starting to sound very graph-databasey.

The Rust thoughts come from focusing on how to get constructors/destructors that can blend in the model, and knowing when you've crossed the line into "nope, no destructors". This twist of complexity may be a more useful tax to pay systemically.

Regardless, we can't lose sight of the goal...which is to preserve a particular kind of "coding freedom" or "fun". So the non-negotiables have to work.

I really want to get back to that list of non-negotiables whenever we can. For someone to write a line of code and say "The project is dead to me if I can't write this line this way." is a really helpful guideline.