Could strings have context?

rgchris · March 31, 2018, 1:40pm

A common problem faced within Rebol is passing string-based templates to functions without any context attached to them. What would it take/cost to add context to string values? An example:

my-context: make object! [
    x: 10
    template: "x"
]

reduce load my-context/template

The thought is that a string in source would adopt its parent context, and would otherwise be unbound.

hostilefork · August 6, 2018, 12:42am

The CSCAPE templating function used in generating C code files during bootstrap is a very pertinent example of this problem, e.g. here:

make-inline-proxy: func [
    return: [text!]
    internal [text!]
][
    cscape/with {
        $<OPT-NORETURN>
        inline static $<Returns> $<Name>_inline($<Wrapper-Params>) {
            $<Opt-Va-Start>
            $<opt-return> $<Internal>($<Proxied-Args>);
            $<OPT-DEAD-END>
        }
    } reduce [api 'internal]
]

Part of why it works in bootstrap without having to supply more context is that the boot process really uses the user context as the place where most stuff is put. It's somewhat ad-hoc. If it were broken down better with more locals, then CSCAPE wouldn't know where to look.

One thing Ren-C can do that historical Rebol and Red can't is to be able to locate function parameters and locals from the binding of any argument or local (or the literal FRAME! value itself). It's not a solution, but it's better than in R3-Alpha where if you asked for the binding of a function local or arg you just got "true", and there was no ANY-CONTEXT! to find your locals from.

No real solutions off the top of my head, just agreeing that this is something that needs to be thought more about.

hostilefork · December 26, 2018, 1:50pm

In Rebol's current model, the only viable way to do this is to pair up the bindings you want with the string, something like:

template: [(x y z) "$x and $y and $z"]

Definitional binding moves in a wave, and you only get the chance to make that binding at that moment in time...after which the entity describing the binding environment no longer exists.
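As a rough analogy for readers coming from other languages, the "pair the bindings with the string" idea might be modeled in Python like this. It's only a sketch: `BoundTemplate` and `make_template` are hypothetical names, `string.Template` stands in for whatever substitution the Rebol code would do, and the Python version snapshots values where the Rebol block would hold bound words.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class BoundTemplate:
    """Hypothetical pairing of a template text with the names captured for it."""
    bindings: dict
    text: str

    def render(self) -> str:
        # Only the captured bindings are consulted, never the caller's scope.
        return Template(self.text).substitute(self.bindings)


def make_template() -> BoundTemplate:
    x, y, z = 1, 2, 3  # locals the template wants to see
    # The capture has to happen here, while x, y and z are still in scope.
    return BoundTemplate({"x": x, "y": y, "z": z}, "$x and $y and $z")


result = make_template().render()
print(result)  # 1 and 2 and 3
```

The point of the analogy is the timing: once `make_template` returns, nothing outside it can find `x`, `y` or `z` unless they were packaged up alongside the string.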

Some alternative model might allow the string to capture a pointer to an abstract entity which represents the memory of that binding environment--so you could ask it to look up x and y and z based on that pointer.

This is similar in a way to the aspirations of virtual binding. But virtual binding is intended to act as a light surrogate for adding only a few bindings to code (and maybe flattening them out into a copy if the lookups seem to be happening too often to virtualize). It seems on the surface that trying to recreate the entire binding environment of a string in a reified object could be prohibitive.

But who knows, there may be magic along the lines of persistent vector which could cull the total number of binding environments to something manageable. These things are big unknowable research problems in their own right. It's hard to say what can be done if it hasn't been invented!

I think it's important to keep an open mind and consider the idea that much of the current binding mechanics might have to be thrown out. I'd even be willing to consider linking to another engine (Haskell, Clojure, Graphd, etc.) and delegating the task of binding and management to something within their methodology--for prototyping purposes. Then once we see what might be done abstractly, we could think about how a from-scratch low-level C solution might match it.

The trick is to come up with new superpowers without breaking old behaviors that are important. What we get from today's Rebol is a minimum baseline of expectation of the system's abilities.

hostilefork · October 18, 2020, 6:05am

There is a slot in strings that I do not think will get used if it is not used for binding. That means the cost for such a feature storage-wise could be low. We can say that BINDING is a common property that all ANY-SERIES! have, and make use of the space.

But it raises many of the same questions that come up with whether a function inside an object should just get "tagged" with the binding of whatever is running MAKE OBJECT!. Does this apply only to "source-level" strings, or if you access through a variable does it count too? e.g. what if it said template: other-string? If the bindings are useful, then overwriting them arbitrarily would seemingly make them less useful.

This makes me wonder a bit if METHOD could be a more general tool. Maybe it's a better use for MY than its current "me-like" operation? (I'll take MY-CONTEXT off the context name of the example...)

some-context: make object! [
    x: 10
    template: my "x"
]

So MY would be the tool for grabbing a context out of the left SET-WORD! (or SET-PATH!?...is that possible?) and then slapping the binding onto the thing being assigned into it.

This would mean METHOD could be done as MY FUNC...though that would only give you the binding, it wouldn't get the implicit <in> for inherited context variables. :-/

Anyway...point being: I think allowing you to BIND strings to contexts is an interesting idea, but it's another case where trying to do such bindings automatically is questionable and could undermine the purpose.

Some alternative model might allow the string to capture a pointer to an abstract entity which represents the memory of that binding environment--so you could ask it to look up x and y and z based on that pointer.

This is similar in a way to the aspirations of virtual binding.

I still think there's some great reckoning in binding that needs to happen. The bad news is that I haven't had any great eurekas about it in a while. The good news is that if I do, the code remains a solid testbed for trying any ideas that can be articulated.

IngoHohmann · October 18, 2020, 7:14am

The idea is interesting, and I'm with you that binding should have to be explicit.

hostilefork · January 15, 2021, 5:22am

Now's a good time to be looking at this question.

So with virtual binding, I think we're going to want to programmatically expose the virtual bind chain somehow or another.

>> a-obj: make object! [a: 10]
>> b-obj: make object! [b: 20]
>> block: [a b]

>> viewed: use a-obj (use b-obj block)

>> binding of viewed  ; new idea: asking BINDING OF on a BLOCK!
== [#[object! [b: 20]] #[object! [a: 10]]]  ; let's say list of 2 objects

It's technically possible to tie strings into the same mechanism, basically giving a string the same knowledge that a block has:

use [x] [
    x: 10
    s: "some-string (x)"  ; *could* pick up `x` awareness automatically
]

But it doesn't have the natural dampening factor that regular virtual binding has, that whenever you copy something it resolves the binding at that time, dropping the chain. So string bindings would either need to be dropped automatically (and unpredictably) or the virtual chains would just grow indefinitely.

The indefinite growth is especially bad considering it'd be a feature you'd use relatively rarely.

But binding strings explicitly might be okay:

>> obj: make object! [x: 10 y: 20]
>> viewed-str: use obj "The (x) and the (y)"

>> binding of viewed-str
== [#[object! [x: 10 y: 20]]]

I think it's misleading to do this kind of operation with BIND if BIND is presumed to be mutating. Because it's not like you'd be giving this binding to all instances of the string...just the result would have the "view". So it's "virtual"...you must save the result to use it.

e.g. this would be meaningless:

>> bind str obj  ; no result saved would mean it did nothing
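To make the "virtual" part concrete, here is a small Python model of a non-mutating bind. It's a sketch under assumptions: `BoundText`, `use` and `binding_of` are hypothetical stand-ins rather than anything in the actual implementation, and putting the newest environment first in the chain is an arbitrary choice.

```python
class BoundText(str):
    """Hypothetical str that also carries a chain of lookup environments."""

    def __new__(cls, text, chain=()):
        obj = super().__new__(cls, text)
        obj.chain = tuple(chain)
        return obj


def use(env, text):
    """Return a new view of `text` with `env` added; `text` itself is untouched."""
    previous = getattr(text, "chain", ())
    return BoundText(text, (env,) + previous)  # newest environment first


def binding_of(text):
    """List the environments attached to `text` (empty for a plain str)."""
    return list(getattr(text, "chain", ()))


obj = {"x": 10, "y": 20}
plain = "The (x) and the (y)"
viewed = use(obj, plain)

print(binding_of(viewed))  # [{'x': 10, 'y': 20}]
print(binding_of(plain))   # []
```

Calling `use(obj, plain)` and discarding the result leaves `plain` with no binding at all, which is the same trap as the unsaved `bind str obj` above.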

This makes me feel like BIND on WORD! is misleading... and maybe we should go with use obj word instead of bind word obj.

Anyway, I might work up a test on this strings-having-context concept here in a bit.

hostilefork · May 15, 2021, 7:50pm

So the feature this ties into in other languages is called "String Interpolation" (Wikipedia).

I've started to feel that supporting string interpolation is fairly important.

This means that a string would have to carry a capture of its binding environment, somehow.

I managed to work up a small test using Sea of Words + Virtual Binding features, which showed some promise in being able to have a function take a TEXT! as a parameter and be able to look up variables in the attached binding of the text.

The capture of that environment might be something explicit done by the function receiving the TEXT! value. But it wouldn't be something the user of the interpolator had to do anything special in order to take advantage of.

As binding is rethought and considered, the important thing is not to get hung up in the historical mechanics. Instead, the question is to ask about the user experience...what can you and can't you do--what works and what does not. One of the positive aspects of having tons of existing code is that if any new feature breaks something that used to work, I find out about it quickly.

Anyway, string interpolation is on the radar as one of the "things we want".

hostilefork · September 27, 2021, 7:39am

It's A Bit Too Early To Declare Victory... BUT...

...Prepare To Get Excited!!!

I have a system booting...that can run UPARSE and do HTTPS requests (so it's non-trivially booting).

...AND it can do this:

internals: func [a <local> b] [
    b: "internal-B"
    let c: "internal-C"
    print interpolate "$(a) $(b) $(c)"
]

>> internals "argument-A"
argument-A internal-B internal-C

It can also do this:

externals: func [str a <local> b] [
    b: "internal-B"
    let c: "internal-C"
    print interpolate str
]

>> a: "global-A"
>> b: "global-B"
>> c: "global-C"

>> externals "$(a) $(b) $(c)" "argument-A"
global-A global-B global-C

This demonstrates the requested feature...for strings to capture a kind of "binding environment" and carry it along with them (much like traditional WORD!s would have a binding that would "stick" to them).

The INTERNALS function is able to soak up context onto the string from inside of a function.
The EXTERNALS function gets context that isn't interfered with by the local fields in the function.

The INTERPOLATE is Mostly Usermode

There's a little bit of UPARSE code to break up the string:

breaker: func [return: [block!] text [text!]] [
    let capturing
    let inner
    return uparse text [collect [while [
        not <end>
        (capturing: false)
        keep opt between <here> ["$(" (capturing: true) | <end>]
        :(if capturing '[
            inner: between <here> ")"
            keep (as word! inner)
        ])
    ]]]
]

It gives you a block of WORD!s and TEXT! bits:

>> breaker "abc$(def)ghi"
== ["abc" def "ghi"]
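For comparison, the same splitting step might look like this in Python. This is a sketch, not a port: the `("text", ...)` / `("word", ...)` tuples are my own stand-in for TEXT! and WORD! values, empty text runs are skipped, and an unterminated `$(` is simply left as literal text, which may not match what the UPARSE rule does.

```python
import re

# Matches "$(" followed by anything up to the first ")".
_PLACEHOLDER = re.compile(r"\$\(([^)]*)\)")


def breaker(text):
    """Split `text` into ("text", literal) and ("word", name) pieces."""
    pieces = []
    pos = 0
    for match in _PLACEHOLDER.finditer(text):
        if match.start() > pos:
            pieces.append(("text", text[pos:match.start()]))
        pieces.append(("word", match.group(1)))
        pos = match.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))  # trailing literal after last slot
    return pieces


print(breaker("abc$(def)ghi"))
# [('text', 'abc'), ('word', 'def'), ('text', 'ghi')]
```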

Then the INTERPOLATE function relies on a new weird native called GET-IN-STRING:

interpolate: lambda [text [text!]] [
    unspaced map-each item (breaker text) [
        if text? item [
            item
        ] else [
            get-in-string text (ensure word! item)
        ]
    ]
]

GET-IN-STRING takes a WORD! and an ANY-STRING! and it will look in the string--as if it were a context of some kind.

(I could have made IN accept TEXT! as a context, so you could write get in text item, but this is all very speculative so I kept it separate. But it would presumably become something like that.)
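A Python approximation of the lookup half is below. The big assumption is that the environment is handed over explicitly as `env`, whereas GET-IN-STRING finds it through the string itself; `interpolate` and `internals` here are illustrative re-creations, not the real functions.

```python
import re


def interpolate(text, env):
    """Replace each $(name) in `text` with str(env[name])."""
    def lookup(match):
        # Raises KeyError if the name is not in the environment.
        return str(env[match.group(1)])
    return re.sub(r"\$\(([^)]*)\)", lookup, text)


def internals(a):
    b = "internal-B"
    c = "internal-C"
    # locals() stands in for the environment the string would carry.
    return interpolate("$(a) $(b) $(c)", locals())


result = internals("argument-A")
print(result)  # argument-A internal-B internal-C
```

Having to pass `locals()` by hand is exactly the step the string-carried binding is meant to remove.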

The Implications Are Pretty Profound

What's kind of astonishing about the above is how a powerful feature like string interpolation is being constructed in userspace. Very few languages put you on the same level as the language designers, to add new features of this type.

You can imagine powerful variations like what CSCAPE does. If you use $() then it assumes you want to repeat the line several times, with the last repetition not repeating whatever comes after it (good for comma lists):

block: [one two three]

cscape {
    enum {
        $(block),
    };
}
== {
    enum {
        one,
        two,
        three
    };
}
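The line-repeating rule can be sketched in a few lines of Python. This models only the behavior described above (one output line per item, with the text after the slot dropped on the last repetition); `expand_lines` is a hypothetical name and the real CSCAPE surely handles more cases.

```python
import re

_SLOT = re.compile(r"\$\((\w+)\)")


def expand_lines(template, env):
    """Repeat any line whose $(name) slot refers to a list, once per item."""
    out = []
    for line in template.split("\n"):
        match = _SLOT.search(line)
        if match is None or not isinstance(env.get(match.group(1)), list):
            out.append(line)  # ordinary line: pass through untouched
            continue
        items = env[match.group(1)]
        prefix, suffix = line[:match.start()], line[match.end():]
        for index, item in enumerate(items):
            is_last = index == len(items) - 1
            # The last repetition drops whatever followed the slot (the comma).
            out.append(prefix + str(item) + ("" if is_last else suffix))
    return "\n".join(out)


template = "enum {\n    $(block),\n};"
result = expand_lines(template, {"block": ["one", "two", "three"]})
print(result)
```

With the three-item list this prints `one,` and `two,` followed by a bare `three`, each keeping the indentation of the template line.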

You can thank CSCAPE for why I've not been willing to compromise on going after this feature.

Should REWORD Be Our "INTERPOLATE"?

I like REWORD as a shorter name than INTERPOLATE. (Most languages don't name it explicitly...because interpolation is a built in feature of strings...or they call it format() or fmt(). The humorous language LOLCODE calls it "SMOOSH".)

REWORD has a historical quirk that it doesn't require terminating the substitutions. It's only able to do this because you've given it the explicit list of the substitutions you're interested in:

>> reword "$abcdef" [abc "123"]
== "123def"  ; knew you weren't looking for $abcd, $abcde, $abcdef...
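Here's a Python sketch of why the explicit list makes unterminated substitutions workable: the function only ever tries the keys it was given. Trying longer keys first is my own assumption about how ties would be broken, not something taken from the historical REWORD.

```python
def reword(text, values, marker="$"):
    """Replace marker+key with values[key], for the given keys only."""
    # Assumption: prefer longer keys so "abcd" would win over "abc".
    keys = sorted(values, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        if text.startswith(marker, i):
            rest = i + len(marker)
            for key in keys:
                if text.startswith(key, rest):
                    out.append(str(values[key]))
                    i = rest + len(key)
                    break
            else:
                out.append(marker)  # no known key follows: keep the marker
                i = rest
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


result = reword("$abcdef", {"abc": "123"})
print(result)  # 123def
```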

I don't know how interesting that "feature" is. :-/ But beyond having it enclosed in delimiters, it seems having the option to put spaces around things is nice to have by default:

>> abc: 123

>> reword "foo$(abc)bar"
== "foo123bar"

>> reword "foo $abc bar"
== "foo 123 bar"

If you want to parameterize the REWORD, you would thus do it by means of manipulating the binding of the string:

>> abc: 123
>> def: 456

>> reword (bind "foo $abc bar $def" [def: 789])
== "foo 123 bar 789"

...or at least that's one idea. There are plenty of details to work out.
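In Python terms, parameterizing by layering an override on top of the existing environment looks a lot like `collections.ChainMap`. The sketch below assumes that reading of what BIND would do to the string; this `reword` is a hypothetical stand-in that takes the environment as an argument.

```python
import re
from collections import ChainMap

# Accepts both the delimited form $(name) and the bare form $name.
_SLOT = re.compile(r"\$(?:\((\w+)\)|(\w+))")


def reword(text, env):
    """Replace $name and $(name) using lookups in `env`."""
    return _SLOT.sub(lambda m: str(env[m.group(1) or m.group(2)]), text)


outer = {"abc": 123, "def": 456}
# The override layer is searched first, then the outer environment.
layered = ChainMap({"def": 789}, outer)

result = reword("foo $abc bar $def", layered)
print(result)  # foo 123 bar 789
```

Note that `outer` is not modified; the override exists only in the layered view, which matches the "virtual" flavor of binding discussed earlier in the thread.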

So How Real Is The Implementation?

~~Fairly~~ ~~Kind of~~ sort of real. It takes some liberties, with the hopes that those things can be pinned down better as things go on.

Please bear in mind that this didn't come out of nowhere. I didn't just write this in a week. The foundations that make this possible have been evolving and trying to form a richer basis for binding... ultimately what I have called "virtual binding".

But after many days of hacking through getting a booting system under fairly new binding rules...I tried the INTERPOLATE immediately. Because I was tired of filtering through crashes.

So that means that now--with the proof of concept going--there are thousands of tests to filter through to see what all breaks.

Performance isn't bad enough to be unusable. It seems all right. I don't want to look too much into it until things are further.

I'll know more in a bit. But it's very promising...and there will almost certainly be other features that can be built from the new foundations.

BlackATTR · September 27, 2021, 1:09pm

This is just... just... fantastic work!!
I can't wait to see what can be done with this for all kinds of templating approaches.

Brett · September 29, 2021, 4:57am

Looks great! Seems like these new underpinnings will support some creative abilities beyond powerful string interpolation.

hostilefork · January 16, 2024, 9:31am

Hmm, well, now that we've gone a little further in the virtual binding design... how about this idea...

What if the currency of string interpolation is just a string in a BLOCK!?

internals: func [a <local> b] [
    b: "internal-B"
    let c: "internal-C"
    print interpolate ["$(a) $(b) $(c)"]  ; string wrapped in BLOCK!
]

>> internals "argument-A"
argument-A internal-B internal-C

Block evaluation captures the "current" evaluation environment. Then interpolate can ask that block to do lookups, as if it were a context (get in block 'a).

It's a couple of extra characters. But this would mean you wouldn't have to fret about using/exchanging string literals and having them carry the burden of environments. A lot better than starting to worry about having to quote your strings to suppress binding!

Because typechecking is done with predicates, we can typecheck "string-in-block" now, so that makes things a little nicer than saying the interpolate function just takes a BLOCK!. Interpolatable strings can be their own datatype without having to come up with a new DATATYPE!

Moreover, I imagine it's not unusual to want to have more information in interpolation scenarios than just the string anyway, so having a block might just come for free in a lot of cases where it's part of a dialect.

(Blocks capturing environments still have the potential to create a lot of waste, and that needs to be addressed. But at least this pares down the concern a bit...)