Boron Language

Hello, Boron author here. I look in on the Ren-C project once a year or so to see what's going on. A few times I have considered joining this forum but for various reasons have not done so. Since you have made space for similar projects, now seems like a good time to touch base.

Boron is my daily driver for scripting and has been in a stable period for years. Bugfix releases occur roughly once per year. I should have been making release announcements on the mailing list, but as there is no community of active users I didn't bother. To exercise the language I have used it in the xu4 project to replace the XML configuration and scripting.

The next period of change may include support for static strings (à la AltScript), optimized path! storage, and reworking the evaluator to support yielding.

Games and graphics are an interest of mine so the Boron-GL project is where I experiment with GUIs, shaders, and such. Some of the work on xu4 such as font rendering and the Faun library will make its way into Boron-GL. The GL code was part of the main Boron repository until the end of 2019.


Hi, thanks for speaking up...

I'd certainly be interested if you could post some code snippets where you feel it demonstrates the features of the language--any notable design deviations.

I remember Ultima IV. Played it on the C64. :slight_smile:

Ren-C uses the "UTF8-Everywhere" philosophy:

Realistically Migrating Rebol to "UTF8 Everywhere"

R3-Alpha had gone ahead with the idea of ISSUE! being read-only, so I just merged ISSUE! and CHAR! together as one immutable type. If the UTF-8 data for an ISSUECHAR! (tentative rename: TOKEN!) is small enough to fit in a cell, that's where it lives.

>> first "abcd"
== #a  ; an ISSUECHAR! that fits in a cell
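To make the "fits in a cell" idea concrete, here is a minimal C sketch of a value that stores short UTF-8 inline and falls back to a heap allocation otherwise. The `Token` struct, the 7-byte inline capacity, and the function names are all assumptions made for illustration; they are not taken from Ren-C's sources, whose actual cell layout differs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed capacity: how many UTF-8 bytes fit in the cell next to the NUL. */
#define TOKEN_INLINE_MAX 7

/* Hypothetical cell, not Ren-C's real layout. */
typedef struct {
    uint8_t is_inline;  /* 1 = the bytes live inside the cell itself */
    union {
        char  bytes[TOKEN_INLINE_MAX + 1];  /* short UTF-8, NUL-terminated */
        char *heap;                         /* longer UTF-8, allocated separately */
    } payload;
} Token;

/* Initialize from NUL-terminated UTF-8.
   Returns 1 on success, 0 if the heap allocation fails. */
int token_init(Token *t, const char *utf8)
{
    size_t len;
    assert(t != NULL && utf8 != NULL);
    len = strlen(utf8);
    if (len <= TOKEN_INLINE_MAX) {
        t->is_inline = 1;
        memcpy(t->payload.bytes, utf8, len + 1);  /* copy including the NUL */
        return 1;
    }
    t->is_inline = 0;
    t->payload.heap = malloc(len + 1);
    if (t->payload.heap == NULL)
        return 0;
    memcpy(t->payload.heap, utf8, len + 1);
    return 1;
}

/* Read access is the same for both representations. */
const char *token_utf8(const Token *t)
{
    return t->is_inline ? t->payload.bytes : t->payload.heap;
}

/* Only the heap representation owns memory that must be released. */
void token_free(Token *t)
{
    if (!t->is_inline)
        free(t->payload.heap);
}
```

In this sketch a one-character token such as `a` never touches the allocator, which is the property the post describes; immutability is what makes sharing or copying such cells safe.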

Ren-C unified the mechanics behind PATH! and TUPLE!, making both read-only. A BLANK! (similar to NONE! but represented as _) will be invisible when rendered in a path. This unifies REFINEMENT! with PATH!, since /a is just a path representation of the array [_ a].

These representations are compressed by means of every series type having a "flavor byte" (kind of like cells having a type byte in them). If a PATH! points to a series that's a "spelling" (e.g. the backing store for a WORD!) then it knows it's either /a or a/, and it only needs a bit to resolve which.

PATH! and TUPLE! compression, explained

I wouldn't say that I'm fully uninterested in such things. But my concerns regarding Redbol languages really revolve around their fitness for purpose.

Right now the great unknown remains binding. One recent advancement was the so-called Sea of Words... a firm decision that the mere incantation of a WORD! did not allocate a variable. This decision has brought more control to the problem, but many questions remain.

Rebol And Scopes: Well, Why Not?


The serialized format, series slices, and threads are probably the main additions.

The 'serialize function packs data into a binary image. Series positions, slices, and non-global word bindings are retained.

Slices save tons of memory when parsing, so parse was changed to mark the end of a slice when a get-word! (symmetrical with set-word!) is used. For example, the following will collect all XML tags without creating any new string! buffers:

tags: []
parse xml [some[
    thru '<' tag: to '>' :tag (append tags tag)
]]

This slice end assignment gets used in the majority of the parse statements I write.
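For readers who think in C, the memory argument can be sketched as follows. This is only an illustration of the slice idea, assuming a slice is a start/end pair over a buffer owned elsewhere; the `Slice` struct and both function names are hypothetical and are not Boron's internals or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice: a view [start, end) into a buffer owned elsewhere. */
typedef struct {
    const char *buf;
    size_t start;
    size_t end;
} Slice;

/* Record the text between each '<' and the next '>' as slices of xml.
   Returns the number of slices stored (at most max). No bytes are copied. */
size_t collect_tags(const char *xml, Slice *out, size_t max)
{
    size_t n = 0;
    const char *p = xml;
    assert(xml != NULL && out != NULL);
    while (n < max) {
        const char *open = strchr(p, '<');
        const char *close;
        if (open == NULL)
            break;
        close = strchr(open + 1, '>');
        if (close == NULL)
            break;
        out[n].buf = xml;
        out[n].start = (size_t)(open + 1 - xml);  /* just past '<' */
        out[n].end = (size_t)(close - xml);       /* at '>', exclusive */
        n++;
        p = close + 1;
    }
    return n;
}

/* Compare a slice against a NUL-terminated string without copying it. */
int slice_equals(Slice s, const char *text)
{
    size_t len = s.end - s.start;
    return strlen(text) == len && memcmp(s.buf + s.start, text, len) == 0;
}
```

Each collected tag costs three machine words regardless of its length, and all of them keep referring to the original input, which is also why mutating that input can put a slice into an odd state.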

Threads can be created and each has its own private data store. There is a static data store shared by all threads from which the built-in functions are accessed. Threads can be created with a port! that allows data to be transferred between the private stores. The Copr build tool makes use of threads when the jobs option is used.

Other than the shared environment data store I haven't really changed how binding works. Unbound words get added to the thread context (what you call the user context).

Float arrays are common in graphics, so Boron has a vector! type which handles this, as well as a simple vec3! type for float triplets.

My main priority is keeping the system small, so the Boron startup footprint is about 15x smaller than Rebol 3 Alpha. See the start-quit results for actual numbers.


Slices have been suggested, but they seem to make the "hidden index" issue worse than it already is...and the semantics on mutable series feel pretty sketchy:

Working with Sub-Series

If you'd like to speak to those issues there, feel free. The main reason I'd like to see it answered would be to get rid of this /PART refinement that spreads all over the place.

Ren-C is now "stackless", which is important for things like being able to yield up the stack to do I/O in the browser. I break down how this affects writing something like a WHILE native in this post:

Stackless Is Here, Today, Now! 🥞 - #2 by hostilefork

Building on that with green threading--like Go--is the direction I think makes the most sense. So something along the lines of "channels" would be used to communicate.

Red has "Redbin", which I haven't really looked at. In the meantime, their "redbin specification" page seems to have vanished:

Red Programming Language: redbin

But it would seem clear to me that there are much more foundational problems to solve...vs. tailoring hibernation for a system with so many unsolved design issues. Generic hibernation lets it be someone else's problem:

CRIU
Application checkpointing - Wikipedia

I haven't needed to persist a Ren-C session...but I'd use something like that if I wanted to.

Just about every time I've tackled some weird optimization it has turned out to be too soon for that. I want the design to be something that can truly be composed in interesting ways--and using the interpreter in deep and heavy ways is good testing for that composability.

So I'm more focused on the spirit of dependency control, vs. worrying much about cycle counts or trying to minimize code running at boot. The web build notably minimizes its included extensions to JavaScript interop and the Console...that's it.

That said, it's been a really long time since I've bothered to pick apart performance or memory use (getting a fast new laptop will do that to you). I need to take some time to do it. But when I do, I wouldn't be doing it at the level of "oh, this mezzanine thing should be written as a one-off native in C that makes the boot faster"...I'd only want to be optimizing generic reusable parts.


A Boron slice can get into weird states, but the basic implementation is simple and the performance gains are too great to pass up. If the typical use cases are easy and efficient then I can accept having weird corner cases.

Cooperative threading and OS threads are different beasts, so I don't see it as an either/or choice. When you need to use multiple CPUs you need to use multiple CPUs.

The serialized data is just data. When I used the word "image" I did not mean a snapshot of any evaluation state. But as context bindings and slices are preserved this is quite different than a simple binary version of 'mold.

Call me crazy, but I care about minimizing computing resources and am always on the lookout for optimizations. The greatest gains are made in the design, so really performance needs to be considered first and not be left as a mop-up operation after the damage is done.


If it's something you make for yourself, to please your mood and senses, there are no wrong answers.

But I think that when new people come to a language they want to know "what can it do?" And if a Redbol language falls down on simple compositions, I don't know that it's a good comeback to argue that's okay because it's "small" and "fast"...when many smaller and/or faster languages abound.

To me the allure would be things like "you can make your own looping constructs". And so that means needing an answer to that kind of problem:

https://github.com/metaeducation/ren-c/blob/master/tests/loops/examples/for-both.loops.test.reb

(Beyond what's demonstrated in that file, things like RETURN work, e.g. by having a per-function definition, so that a RETURN in the body has the meaning intended vs. "RETURNing from a FOR-EACH".)

And of course, being able to show that everything can come together to build powerful things in usermode is what I think tells a compelling story:

Introducing The Hackable Usermode PARSE ("UPARSE")
https://github.com/metaeducation/ren-c/blob/master/src/mezz/uparse.r

If you think the point is something else, that's fine...though unfortunate, as it would be nice if your optimization time could be spent helping on things like UPARSE.


Boron was created primarily to be able to embed (and extend) the datatype system in other programs. It just happens to have a default evaluator that can be used for scripting (exaggerating a bit here).

When using it as a scripting language I have no interest in mucking around with looping constructs. It has (or should have) a good selection ready to go.

As it happens, Ren-C started out with an option to not build in an interpreter.

But that wound up requiring functionality that duplicated PICK. e.g. an API that did what pick did...just bypassing the evaluator. Or what APPEND did, just bypassing the evaluator.

I pretty quickly dismissed that as a direction without much likelihood of success (although it is what libRed did a year or two later). So the API was restructured around variadic calls into the evaluator:

Limiting API Entry Points in Favor of Exchanging Strings

It's been incredibly versatile in the tasks it has been used to attack. Stacklessness makes it more so...it means you can have a thread of execution on a host that wants to do an enumeration, and it can be calling a generator that is based on Ren-C code which yields to the host language's loop control.

I guess it seems like you're saying Boron has also not taken so much a libRed angle, just paring down how many primitives those are (?). e.g. your API currency is still cells, not series pointers (Nope, there's UBuffer which is basically a REBSER, see post lower down)...but you don't have a lot of entry points for cell operations:

UStatus  boron_load( UThread*, const char* file, UCell* res );
const UCell*
     boron_eval1(UThread*, const UCell* it, const UCell* end, UCell* res);
UStatus  boron_doBlock( UThread* ut, const UCell* blkC, UCell* res );
UCell*   boron_reduceBlock( UThread* ut, const UCell* blkC, UCell* res );
UCell*   boron_evalUtf8( UThread*, const char* script, int len );

Ren-C might be said to be "more extreme" in the sense that there's no separate entry point for REDUCE (or DO), you call it like anything else through evaluation. (though it has a lot more entry points for extraction, e.g. unboxing of integers is folded in with a variadic evaluation, so you can do your calculation and the extraction all without giving an API handle to the client)

All right, I guess I now know the scope and limits of Boron. Which is what I asked, so thanks.


One of the use cases of Boron is simply as an alternative configuration system to XML or JSON. No evaluation is required. Even with the interpreter, the library is smaller than libxml2 (which in xu4 uses 2.5 MB more of heap memory).

To recap the history, first there was the Rebol clone Orca, then Thune with a forth style evaluator, and then Boron. It was with Thune that I actually first used it for application configuration where I was working.

The common theme is that the datatype system is the centerpiece and the evaluators could be swapped out. This is what I had wanted Rebol to be so that an evolutionary approach to developing evaluators and DSL interpreters could be taken.


Looking at your uses in the Ultima IV emulation, it turns out you do seem to expect clients to use the ur_XXX "Urlan" functions, which speak in terms of UBuffer, which is like an R3-Alpha "REBSER" I guess.

So it's not cell-based, it's UCell and UBuffer based. Clients are expected to worry about internal cell bits GC'ing underneath them, so they have to ur_hold(), etc.
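The burden being described is the usual one for any C client of a garbage-collected store: whatever the C code is still using has to be pinned across a collection. A toy sketch of that pattern follows. Every name here (`Store`, `store_hold`, `store_release`, `store_collect`) is made up for illustration and says nothing about how ur_hold() is actually declared or implemented; the toy collector also assumes nothing is reachable from script-side roots, so only holds keep a buffer alive.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BUFFERS 8

/* Toy buffer store; hypothetical, unrelated to Boron's real data store. */
typedef struct {
    void *data[MAX_BUFFERS];   /* NULL = free slot */
    int   holds[MAX_BUFFERS];  /* > 0 = pinned by C code */
} Store;

/* Allocate a buffer and return its index, or -1 if full or out of memory. */
int store_alloc(Store *s, size_t size)
{
    int n;
    for (n = 0; n < MAX_BUFFERS; n++) {
        if (s->data[n] == NULL) {
            s->data[n] = malloc(size);
            if (s->data[n] == NULL)
                return -1;
            s->holds[n] = 0;
            return n;
        }
    }
    return -1;
}

/* Pin a buffer so the collector leaves it alone. */
void store_hold(Store *s, int n)
{
    s->holds[n]++;
}

/* Undo one earlier store_hold() on the same buffer. */
void store_release(Store *s, int n)
{
    assert(s->holds[n] > 0);
    s->holds[n]--;
}

/* Toy collection: frees every allocated buffer that has no hold.
   Returns the number of buffers freed. */
int store_collect(Store *s)
{
    int n, freed = 0;
    for (n = 0; n < MAX_BUFFERS; n++) {
        if (s->data[n] != NULL && s->holds[n] == 0) {
            free(s->data[n]);
            s->data[n] = NULL;
            freed++;
        }
    }
    return freed;
}
```

Forgetting the hold is a use-after-free and forgetting the release is a leak, which is the kind of bookkeeping normally associated with writing natives.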

Hence people basically wind up with the concerns of programming natives to use the structure.

I don't think that's going to win a lot of hearts. :confused:

(But having had to go through a lot of really bad and buggy R3-Alpha extension code...where there were lots of guesses and casts of the internal structure of contexts and series handles on the outside...and which didn't even have an equivalent to ur_hold() despite very much needing it, I know it could be much worse.)

Cutting out the evaluator means cutting out the expressive language for composition. Without stuff like COMPOSE I don't think the case is as strong to use the format in one's program.

The ur_XXX() APIs are a large set, and are micro-fiddly to where they're difficult to use correctly. You've got dozens and dozens of entry points you invented which appear arbitrary, and must be combined just-so.

You use it yourself and haven't really been saying you expect anyone else to... so that's fine. But I'll suggest it won't resonate with any "Power & Control & Size" audience out there. I think they'd see the library as another middleman and that the ur_XXX() APIs are a bunch of cruft. They'll use std::vector<> or something and complete their task another way, at an even lower byte count and fewer CPU cycles (and with type safety--or some other tradeoff that power user cares about).

I think there's more promise in the variadic evaluative API, e.g. see the uses in the ODBC module:

https://github.com/metaeducation/ren-c/blob/02d1ba2c6e2a8b5fc689d4d6684435ae369a528d/extensions/odbc/mod-odbc.c#L539

I'd rather invest in seeing what can be done to make a system that can do that small enough vs. publishing and supporting a parallel API like ur_XXX().

(You've got me sort of thinking about this domain now...so I'm reasoning through options for minimized builds tailored to this sort of task.)

I start from asking what people can do within the language--without resorting to C--and looking to see what the pain points are.

  • If you ask what the third item is in a block and you get back NONE, then was that "no item" or "yes there's an item and there's a none"?

  • I don't wish this kind of problem away by saying "oh, that's too low level a question, if you care about that you should use C".

  • By not requiring you to cross the barrier to C to write high-functioning code, you can explore and prototype more easily.

  • If an exploration turns out to be popular, you can write it (or parts of it, progressively) as native code.
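The first bullet's ambiguity is easy to reproduce in a few lines of C. The sketch below is an illustration only: the `Value` type and both pick functions are hypothetical, and the "checked" variant is just one possible way to separate the two answers, not a description of how Ren-C or Boron resolves it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { VAL_NONE, VAL_INT } ValType;
typedef struct { ValType type; int i; } Value;

/* Rebol 2 style: a 1-based pick that answers NONE for a missing item too. */
Value pick_lossy(const Value *blk, size_t len, size_t index)
{
    Value none = { VAL_NONE, 0 };
    assert(blk != NULL || len == 0);
    return (index >= 1 && index <= len) ? blk[index - 1] : none;
}

/* Alternative: report presence separately from the value.
   Returns 1 and fills *out if the item exists, otherwise returns 0. */
int pick_checked(const Value *blk, size_t len, size_t index, Value *out)
{
    assert(out != NULL);
    if (index < 1 || index > len)
        return 0;
    *out = blk[index - 1];
    return 1;
}
```

With the blocks `[1 2 none]` and `[1 2]`, `pick_lossy` gives the same answer for the third item in both cases, while `pick_checked` tells them apart.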


The data store & garbage collector were created by me without ever having seen another implementation, so if it looks odd that may explain it. Once the design was set I don't think it really changed much over the years.

The scripting language was modeled on Rebol 2, so it is what it is.

As it was used in existing C/C++ systems to provide configuration, persistent data storage, and/or scripting capabilities, it was natural to extend functionality from the C side of the fence.

If Rebol had been open source from the beginning then it's likely that none of the code you're looking at would exist. I don't recall when that happened, but my Thune repository starts at the beginning of 2006 and Rebol 3 starts at the end of 2012.


Well, understandable. And the parts that are there don't seem worse than their corresponding implementations in R3-Alpha--better on several axes I'm sure (it pretty much has to be more coherent regarding threading).

I'm just rather certain that such a low-level API won't catch on for the JSON/XML competitor functionality.

The higher level methods like what I'm demonstrating might, because they are much more ergonomic, and have the chance to bring in some special magic via evaluation that people wouldn't get elsewhere. And in ways they are still rather close to the metal (va_list on stack, END signal put in by a variadic macro).
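The "va_list on stack, END signal put in by a variadic macro" mechanism can be shown in isolation. The sketch below is a stand-in, not the real API: `join_fragments`, `END_MARK`, and the string-only arguments are assumptions for illustration, whereas the actual variadic calls mix source text with value handles and feed an evaluator rather than a buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* A unique address used as the end signal (hypothetical name). */
static const char end_mark_[1] = { 0 };
#define END_MARK ((const char *)end_mark_)

/* Callers use this macro, so they never write the end signal themselves. */
#define join_fragments(dst, cap, ...) \
    join_fragments_((dst), (cap), __VA_ARGS__, END_MARK)

/* Concatenate string fragments into dst until END_MARK is reached.
   Output is truncated to fit cap bytes including the NUL.
   Returns the number of bytes written, not counting the NUL. */
size_t join_fragments_(char *dst, size_t cap, ...)
{
    va_list va;
    size_t used = 0;
    assert(dst != NULL && cap > 0);
    va_start(va, cap);
    for (;;) {
        const char *frag = va_arg(va, const char *);
        size_t len;
        if (frag == END_MARK)   /* compared by address, not by content */
            break;
        len = strlen(frag);
        if (used + len >= cap)
            len = cap - 1 - used;  /* keep room for the terminator */
        memcpy(dst + used, frag, len);
        used += len;
    }
    va_end(va);
    dst[used] = '\0';
    return used;
}
```

A call such as `join_fragments(buf, sizeof buf, "append [", "1 2", "] 3")` reads like a single expression at the call site, while the callee simply walks the stack until it sees the sentinel.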

But it's still a pretty long shot to imagine people using it for config files. Or anything "serious"--I don't know. I still am mainly targeting programming-as-game competitions (code golf, etc.) and seeing what applications someone might bend it to if they get enthusiastic about it.
