DATE! + TIME! + DATETIME! (or TIMESTAMP! ?)

hostilefork · December 2, 2021, 4:57pm

As we're all familiar with, a DATE! can have a TIME! component:

>> d: now
== 21-Nov-2021/18:56:45-5:00 

>> type of d
== #[datatype! date!]

>> t: d.time
== 18:56:45

>> type of t
== #[datatype! time!]

Although TIME! can exist as a separate cell and value type, a DATE! doesn't store a time cell inside of it. It packs the date and time information into a single cell.

Hence when you say d.time above, a new TIME! value has to be synthesized. There's not a whole cell's worth of time to hand a pointer back to...it's woven into the bits of the DATE!.
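
To make the "synthesized" part concrete, here is a minimal sketch in Python rather than Rebol or C. The packing layout (a day count times seconds-per-day plus seconds) and the names pack_datetime and pick_time are assumptions made up for illustration; they are not how Ren-C actually lays out a DATE! cell.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class Time:
    hour: int
    minute: int
    second: int


def pack_datetime(day_number: int, time: Time) -> int:
    """Fold a day count and a time of day into one integer 'cell payload'."""
    secs = time.hour * 3600 + time.minute * 60 + time.second
    return day_number * SECONDS_PER_DAY + secs


def pick_time(payload: int) -> Time:
    """Build a brand-new Time from the packed bits (no Time is stored)."""
    secs = payload % SECONDS_PER_DAY
    return Time(secs // 3600, (secs % 3600) // 60, secs % 60)


# 738_115 is an arbitrary day number, not a real calendar conversion.
d = pack_datetime(738_115, Time(18, 56, 45))
t1 = pick_time(d)
t2 = pick_time(d)
```

Each call to pick_time builds a separate Time object: t1 and t2 compare equal, but neither is a view into d.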

That might not sound like much of an issue, but it creates the problem I refer to as "sub-cell addressing".

If you've missed everything I've griped about with this so far, it means that when you want to see a behavior like the following:

>> d.time.hour: 12
== 12

>> d
== 21-Nov-2021/12:56:45-5:00  ; we want hour updated

We run into the problem that if d.time synthesizes a value, then a naive picking process of (d.time).hour: 12 would only be able to manipulate the bits in the synthesized time. That wouldn't change d. What the user actually wanted was to update the bits of a time that was folded into the implementation of the date.
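
The failure mode of the naive approach can be modeled in a few lines of Python. PackedDate and its seconds field are hypothetical stand-ins for a DATE! cell with the time folded into its bits; only the shape of the problem is meant to carry over.

```python
class PackedDate:
    """Toy stand-in for a DATE! cell: time is kept as seconds, not as a Time."""

    def __init__(self, day, seconds):
        self.day = day
        self.seconds = seconds

    def pick_time(self):
        # Returns a fresh dict on every call; it is not a view into self.
        return {
            "hour": self.seconds // 3600,
            "minute": (self.seconds % 3600) // 60,
            "second": self.seconds % 60,
        }


d = PackedDate(day=0, seconds=18 * 3600 + 56 * 60 + 45)

t = d.pick_time()   # the (d.time) step
t["hour"] = 12      # the .hour: 12 step, applied to the synthesized value only

hour_after = d.pick_time()["hour"]  # d was never told about the change
```

The write lands in t, and d still reports its original hour, which is exactly the behavior the user did not want.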

Rebol Lacks The Vocabulary To Do This In An Obvious Way

The smallest units that Rebol speaks in terms of are the cell and the node.

(If you need a refresher on these, my conference video tech talk explains them.)

It would appear we could simplify matters if we changed the combination of DATE! and TIME! to point to a 2-cell node.

DATETIME! cell
[  ]                DATE!           TIME!
  --> points to [ 21-Nov-2021 | 18:56:45-5:00 ]  (2 cells)

(Whether the "zone" is part of a time or lives in the datetime would depend on whether you wanted to write d.zone: -5:00 or d.time.zone: -5:00. I don't know if it ever makes sense to speak of a time with a zone independent of a datetime.)

Breaking things up this way, we can say that d.time implicates a cell. And we can have some operation that acts on a cell (let's say POKE) like:

 >> poke 18:56:45 'hour 12
 == 12:56:45
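
A POKE of this kind is a pure function from an old cell to a new one. As a rough Python sketch (the Time class and this poke signature are illustrative assumptions, not Ren-C's API):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Time:
    hour: int
    minute: int
    second: int


def poke(cell, field, value):
    """Return a new cell with one field changed; the input is left untouched."""
    return replace(cell, **{field: value})


old = Time(18, 56, 45)
new = poke(old, "hour", 12)
```

Because poke hands back a new value instead of mutating anything, something else still has to decide where that new value gets stored.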

Hang On: DATE!, TIME! (and DATETIME!) are IMMEDIATE!

We still have a bit of a problem here with our smallest units of representation. Presumably we don't want this:

 >> d1: 21-Nov-2021/18:56:45-5:00 

 >> d2: d1

 >> d1.time.hour: 12
 == 12

 >> d1
 == 21-Nov-2021/12:56:45-5:00

 >> d2
 == 21-Nov-2021/12:56:45-5:00  ; don't want d2 to change (right?)

But we also don't want to be needlessly copying the 2-cell node each time a date is assigned. So it would be a copy-on-write mechanic.

If we're working with a cell-based granularity, then we wind up in a somewhat similar situation to what we had before...where the tuple processing has to propagate backwards. For example, when the POKE changes the cell bits for the TIME! to make a new TIME! cell, something has to remember the way back to the DATETIME! in order to tell it to make a new node and write the cell into the copy.
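
A copy-on-write node could look something like the Python sketch below. Everything here is hypothetical: the Node and DateTime classes are invented for illustration, and the refs counter is only a stand-in for whatever mechanism would actually detect that a node is shared.

```python
class Node:
    """Hypothetical shared 2-slot node: [date, time]."""

    def __init__(self, date, time):
        self.slots = [date, time]
        self.refs = 1  # how many DateTime handles share this node


class DateTime:
    def __init__(self, node):
        self.node = node

    def assign(self):
        """Model `d2: d1`: share the node instead of copying it."""
        self.node.refs += 1
        return DateTime(self.node)

    def poke_time(self, new_time):
        """Write a new time, copying the node first if it is shared."""
        if self.node.refs > 1:
            self.node.refs -= 1
            self.node = Node(*self.node.slots)  # private copy for this handle
        self.node.slots[1] = new_time


d1 = DateTime(Node("21-Nov-2021", (18, 56, 45)))
d2 = d1.assign()
shared_before = d1.node is d2.node

d1.poke_time((12, 56, 45))  # triggers the copy, so d2 is unaffected
```

Note that the copy happens inside the DateTime, not inside the time value: the TIME! step alone has no way to know the node needs to be duplicated, which is the backwards propagation described above.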

Does framing this in terms of cells offer any benefit over letting the DATETIME! be a higher-level entity that does a more specific folding of the TIME! cell into its bits? This is a question I've been trying to answer, and haven't had an easy time of answering.

One thing a cell-based protocol would do is generalize properties that are flags on cells, such as being PROTECT'ed. Unless the picking protocol requires each step to go through a cell, the system cannot fiddle these bits in a known way. So just as the DATE! folds the TIME! into it in some arbitrary way, the protect bit would have to be handled through some complex protocol as well.

What I do know is that my current generalized solution is rather complex and slow--and doesn't answer how to do things like PROTECT. We're seeing a slowdown from many different angles and I am trying to figure out what the best tradeoff is in terms of simplicity and generality. It's not easy.

IngoHohmann · December 3, 2021, 11:37pm

It seems to me that a micro-optimisation like this doesn't buy much, but has a high price in overall complexity.

hostilefork · December 4, 2021, 5:39pm

I'm not so much worried about the micro optimization issue, as to whether bringing in limits truly simplifies things.

Of those two forms of currency--"cell" and "node"--the running of arbitrary code in the evaluator can move most any cell address. Imagine something like this:

>> block: [10:20]

>> block.1.(clear block, recycle, 'hour): 3

If the selection moves one step at a time, when you are at the block.1 point you might find you have a TIME! at address 0x5788ace0. But if the next thing you do is run the code in the GROUP! to resolve to hour then you've made that address invalid. This is C, and so you can wind up doing arbitrary damage if you don't account for such things.

(Note: The recycle is not necessary, as it can happen at any time. Ren-C debug builds would actually usually catch this without it, as they pay an additional cost on operations like CLEAR to intentionally corrupt cells that are no longer valid. This kind of bug is rampant in R3-Alpha and Red...and one assumes Rebol2 also.)

So the reliable currency is actually a copy of a cell's contents (the bytes in the cell), and not the address of the cell. But without the address you can't make changes to an existing location.
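
Python has no raw pointers, but the distinction between a copy of the contents and an address can still be modeled. In this sketch the "address" is a (container, index) pair, which is an assumption made purely for illustration; in C the stale case would be a dangling pointer rather than an out-of-range index.

```python
block = [(10, 20)]  # stands in for the block [10:20]

# "Address" of block.1, modeled as (container, index) rather than a C pointer.
container, index = block, 0
snapshot = container[index]  # copy of the cell's contents

block.clear()  # arbitrary code run while resolving the next path step

address_still_valid = index < len(container)
```

After the clear, snapshot is still a perfectly good time value, but there is no longer any location for a write to go to.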

Rebol is doing something most other languages simply don't do, by trying to open up field selection into something that's a kind of messaging...and if it can run any user code at all, that code can disrupt cell addresses.

The generalized way I have put this together would preserve the path itself as the invariant backbone for the messaging. Then it would run forward across that and then propagate any cells backwards to be written by the method that had picked the cell prior. But it didn't use addresses to do this: if you said something like obj.field.subfield: x then going forward it would ask for the cell represented by obj.field, and then going backwards after .subfield: x was processed it would again look up obj.field's address and ask to write it back.
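
The forward-then-backward shape can be sketched in Python over plain dicts. This is a simplification under stated assumptions: pick, poke and set_path are invented names, and the sketch returns a new root value instead of re-resolving a location and writing into it the way the mechanism described above does.

```python
def pick(container, key):
    """Forward step: fetch a copy of the value at one path step."""
    return container[key]


def poke(container, key, value):
    """Backward step: return a new container with one slot replaced."""
    updated = dict(container)
    updated[key] = value
    return updated


def set_path(root, steps, value):
    """Set root.<steps...> to value using pick forward, poke backward."""
    # Forward pass: collect each intermediate value; no addresses are kept.
    picked = [root]
    for step in steps[:-1]:
        picked.append(pick(picked[-1], step))

    # Backward pass: each poke yields a new value to hand to its parent.
    new = value
    for parent, step in zip(reversed(picked), reversed(steps)):
        new = poke(parent, step, new)
    return new


obj = {"field": {"subfield": 1, "other": 2}}
new_obj = set_path(obj, ["field", "subfield"], 9)
```

Only the list of steps survives across the whole operation, which mirrors the idea of the path being the invariant backbone while every cell in between is treated as a disposable copy.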

What's frustrating is that there's really not all that much that avoiding subcell addressing buys us over this. :-/

I know how C++ operator overloading works quite well, but unfortunately just about nothing involved there applies here. We're lacking the parts. And I very much doubt that adding things like references and rvalue references to the language is in service of the ultimate "vision" of what the system is supposed to be like. But maybe there does just have to be a new way of looking at the runtime to cover this kind of thing.

So I'm looking at what guiding principle can decide what to cut and what to allow.