Strict Equality, Lax Equality, Equivalence, Sameness: IS and =

hostilefork · October 23, 2017, 3:32am

UPDATE Mar-2019: This proposal has been shifted to a more benign variation, where IS and ISN'T replace == and !==.

Rebol has historically had a lot of trouble with questions of equality, and with the various hierarchies of it. The idea of EQUIV? even came up: it would check whether two blocks of code were not just STRICT-EQUAL?, but also whether the words in them had the same bindings. :-/ (I killed off EQUIV? in Ren-C pretty early because I couldn't think of any reasonable application of the knowledge you'd be getting out of it.)

The fact that Rebol equality is lax by default has bothered a lot of people. One of the worst and most memorable cases I've run into wound up leading to a day (or more) of debugging some code while trying to port Red to R3-Alpha. A tag and a string were considered equal. It's still the case today...Ren-C hasn't yet changed R3-Alpha's behavior on this, nor has Red:

>> <fOo> = "FoO"
== true

But none of it really makes sense. Check out this situation, which is also the case in R3-Alpha and Red:

>> <abc> = %ABC
== true

>> (quote def:) = 'def
== true

>> "ghi" = <GhI>
== true

>> [<abc> def: "ghi"] = [%ABC 'def <GhI>] 
== false

Bleah. This is all pretty clearly bad mojo, and needs to get sorted out.

My opinion is that you shouldn't have to say == to get a level of rigor that makes sense to a programmer. I don't think different datatypes should be candidates for equality of any kind; you should have to ask something like same-spelling? if that is your intent. Also, historically there has been a push and pull over whether Rebol should be case-sensitive or case-insensitive by default. I think @earl advocated (or at least considered advocating) for case-sensitivity in equal? and =, with some other operation like approximately-equal? being ~= or similar.

My "radical proposal" was to say that constructs like CASE or SWITCH would indeed operate case-insensitively by default... but that the symbolic = sign would be equal? and really mean equal. Then, rather than go with something "ugly" like ~=, we would pick another English word for approximate equality. I proposed the infix word is, with a prefix form is?, negated as isn't... apostrophe and all.

The idea is about a year or so old now, and I've become more convinced with time. The more I think of the aesthetic properties of code, the more I do not like ==. It looks like a header or barrier more than an operator, and even with C experience I kind of cringe at seeing it used for equality in Rebol code. There are better applications:

==: function [:label [string!] :terminal [word!]] [
    unless '== is terminal [
        fail ["== expects string followed by ==, not" terminal]
    ]
    if verbosity = 2 [
        print ["==" label "=="]
    ]
]

== {Section Two} ==

Besides "Rebol shouldn't look like C", there's another reason to hate it. In C, == and != are complements, but since Rebol gives = and == distinct equality meanings, the real complement of == is !==. That's just going to confuse people.

My suggestion for IS was to have it be "friendly". So "a" is first "ABC" would be true, allowing an equivalence between single-character strings and characters. Approximation would also apply, making 1.0 is 1 true. It seemed to me that this non-rigorous behavior would gel well with touchy-feely people who liked the English expression, while the more rigorous behavior would be favored by those who liked the mathematical notion: those who believed that = (EQUAL?) should mean equality.
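To make the proposed split concrete, here is a minimal sketch that models the two operators in Python rather than Rebol. It assumes = means "same datatype, exact case-sensitive match". It also assumes IS tolerates only a few type pairings (INTEGER!/DECIMAL!, and CHAR! against a one-character TEXT!) and ignores case for spelled types. The Value class and the is_, isnt, and strict_equal functions are hypothetical names invented for illustration, not Ren-C internals.

```python
from dataclasses import dataclass


# Hypothetical model of the proposal, written in Python rather than Rebol.
# A Rebol value is represented as a datatype name plus a Python payload.
@dataclass(frozen=True)
class Value:
    type: str       # e.g. "integer!", "decimal!", "text!", "char!", "tag!", "issue!", "word!"
    payload: object


NUMERIC = {"integer!", "decimal!"}
# Types whose payloads are spellings; this sketch assumes IS compares them case-insensitively.
SPELLED = {"text!", "char!", "tag!", "issue!", "file!", "word!", "set-word!"}


def strict_equal(a: Value, b: Value) -> bool:
    """Proposed `=`: same datatype and an exact, case-sensitive payload match."""
    return a.type == b.type and a.payload == b.payload


def is_(a: Value, b: Value) -> bool:
    """Proposed IS: lax, but only across a few sanctioned type pairings."""
    if a.type in NUMERIC and b.type in NUMERIC:
        return a.payload == b.payload  # 1.0 is 1
    if {a.type, b.type} == {"char!", "text!"}:
        # A CHAR! matches a one-character TEXT!, ignoring case.
        return str(a.payload).casefold() == str(b.payload).casefold()
    if a.type != b.type:
        return False  # <FoO> is "FoO" and #foo is 'foo are both false
    if a.type in SPELLED:
        return str(a.payload).casefold() == str(b.payload).casefold()
    return a.payload == b.payload


def isnt(a: Value, b: Value) -> bool:
    """ISN'T is simply the complement of IS."""
    return not is_(a, b)
```

Under this model, is_(Value("text!", "a"), Value("char!", "A")) stands in for "a" is first "ABC" and returns True. By contrast, strict_equal(Value("decimal!", 1.0), Value("integer!", 1)) returns False. The trailing underscore in is_ is only there because is is a Python keyword.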

(Note: Red currently uses IS in its reactive programming model for field is reaction, "Defines a reactive relation whose result is assigned to a word." It is the infix form of IS~. I have proposed BE as a possible substitute for that meaning, should the reactive model be carried over.)

It may sound like a difficult change to make, but it's more an inconvenience than it is truly difficult. As with most Ren-C conversions, the way to go about this would be to have a period of deprecation of = and !=, where you use is and isn't and continue using ==. Then when enough time has passed and all the old = and != are believed to be gone, you bring them back under the "strict equality" meaning, replacing the lingering ==s.

The bigger questions are about "the meanings of things". For instance, today you can have a block like block: [foo 10 bar 20 baz 30] and say block/bar and get 20 back. One wonders how much that hinges on the equality of WORD! and SET-WORD!. Since Ren-C has evolved such that "PICKing" and "POKEing" are driven by the same code that does paths (hence "path picking" and not "path selection"), we might say that only SET-WORD!s are considered candidates for picked words. So given block: [foo: bar bar: 20 baz: 30], block/bar would pick 20 and not bar:. If you wanted bar: you'd have to use SELECT, and SELECT would honor the datatype given.

What are people's thoughts?

hostilefork · October 23, 2017, 5:12am

Sidenote: I've been more accepting of the idea of maybe <> being all right for "not equal" as opposed to !=, and using <{}> for "empty tag". However, one should note under this proposal it would be strict/case-sensitive inequality...while isn't would be the lax inequality.

(One should also note that tag <> <foo> is a bit harder to read than tag != <foo>, so != might be kept even if ! is not "negation" in the language as a whole.)

I'll also bring up the question of what to call the refinement on things like SWITCH or SELECT that demands strict semantics. I dislike /CASE... especially if it's used to discern 1 from 1.0 or 10% from 0.01. We'd been considering /STRICT. But with this, it should probably be either /EQUAL or /=. Thoughts?

rgchris · October 23, 2017, 6:07pm

Given the [word "Value"] pattern is very common, I'd be loath to force set-words in order to make that accessible. Aside from anything else, there'd be a dramatic rise in usage of QUOTE. My beef with (foo: [bar "Baz"] | foo/bar) is that I feel it should be restricted to odd-numbered words (I have Red ticket #2850 open on this, and had intended to open a parallel Ren-C ticket).
I don't have issues with stricter equality. Some synonyms for loose behaviour: ALIKE? SIMILAR?
For /STRICT, might suggest SWITCH/PRECISE.
As you mention the loose behaviour of SWITCH, it occurs to me that it's strange that DATATYPE! isn't considered equivalent to WORD! therein (possibly loose equality with WORD! in general too). Would obviate the need for special handling of GET-WORD! in current Ren-C SWITCH and allow GET-WORD! matching in SWITCH/STRICT. Of course, if it had been applied historically, could have avoided the whole TYPE?/WORD thing in the first place.
I bristle at the idea of tags not being literal, I think I'd prefer no empty-tag literal than <{}> syntax.
Just a thought, could you modify the behaviour of = with a prefix, e.g. (approx 1 = 1.0) (could perhaps get messy).

hostilefork · October 24, 2017, 12:37am

It's my impression that most people favor it. I guess the core question would be whether you think IS and ISN'T are solid choices as opposed to ~= and ~!= or ~<>. I've gotten comfortable with the idea over time...even the internal apostrophe (I'm thinking that contractions are pretty clearly WORD!s, even if they are a little unsettling, and it's part of taking Rebol where other languages don't dare'st t'go.).

If I wasn't clear above, <FoO> is "FoO" would be false...and #foo is 'foo would be false, despite some type-tolerance for 1 is 1.0 and "a" is first "abc". The notion of IS-ness...its idea of equivalence...would still apply to all operations by default unless overridden. This makes it a less drastic change than it would be if strict equality were thrown into SWITCH, such that switch 'FOO [foo [print "this wouldn't match"]].

We could also consider block/'foo to mean a WORD!-search. It doesn't generalize since we don't have LIT forms of everything. (Even if we did, block/'foo: would be ambiguous). I did mention that we could try for block/['word] or block/[word:] or block/[<first> #second] as a way of searching for a general value or sequence of values.

It's just my feeling that the randomness of "I don't know what type it is" probably doesn't have a lot of good usages. But maybe the WORD! case is common enough that the LIT-WORD! shorthand is worth it for that.

(Note: Since pathing has no refinements, any searching it does would be based on IS-ness.)

The odd-numbering doesn't strike me as particularly useful in the dialect-driven world of Rebol. I kind of imagine formats where I might throw in comments or strings or have a dialect that augments some assignments:

stuff: [
    {Maybe strings are commentary}
    cool
    x: 10
    y: 20
    /cool
]

It seems that being able to query SET-WORD!s is more generically useful. Maybe even to the point of never returning a SET-WORD!, but skipping past it instead. So given block: [foo: baz: 10], block/foo and block/baz would both come back with 10.
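As a rough illustration of the difference, the sketch below contrasts historical picking with the proposed SET-WORD!-only picking. In the historical version, any word-like item spelled like the key counts. In the proposed version, chained SET-WORD!s are also skipped. It is modeled in Python, with each block item represented as a hypothetical (datatype, payload) pair. pick_legacy and pick_by_set_word are invented names, not real Ren-C functions.

```python
from typing import Optional

# Hypothetical model: a block item is a (datatype, payload) pair, e.g. ("set-word!", "foo").
Item = tuple
WORDISH = {"word!", "set-word!", "get-word!", "lit-word!"}


def pick_legacy(block: list, key: str) -> Optional[Item]:
    """Historical path picking: any word-like item spelled like the key counts."""
    for i, (kind, payload) in enumerate(block):
        if kind in WORDISH and str(payload).casefold() == key.casefold():
            return block[i + 1] if i + 1 < len(block) else None
    return None


def pick_by_set_word(block: list, key: str) -> Optional[Item]:
    """Proposed path picking: only SET-WORD!s are keys; chained SET-WORD!s are skipped."""
    for i, (kind, payload) in enumerate(block):
        if kind == "set-word!" and str(payload).casefold() == key.casefold():
            for later in block[i + 1:]:
                if later[0] != "set-word!":
                    return later  # first non-SET-WORD! after the key
            return None  # the key had nothing after it
    return None
```

Modeling block: [foo: bar bar: 20 baz: 30], pick_legacy finds the plain word bar first and hands back bar:, while pick_by_set_word returns 20. For [foo: baz: 10], pick_by_set_word returns 10 for both foo and baz.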

Tags need a way to put spaces in them, and other illegals. <{ this seems reasonable to me }>, and basically every other alternative is uglier. I think tags need some restrictions so we can keep > and < and <= and <<= and --> and <!-- and all of their friends in the "operator-word-space". Though we do want to permit things like <123> even if they are illegal HTML.

We already have parallels in %"filename with spaces" which I think should also support %{filename with spaces} syntax. So I'm guessing %{} would be empty file instead of just %.

We're looking for interesting sweet spots, the "inspired by theory, driven by practice" thing. So looking at examples that are truly terrible is worthwhile. I think you made a good case for why # is bad for comments with a concrete example, so good to keep with that.

I proposed something along these lines a long time ago with taking ~ away to mean "approximate". So ~{aBc} = {abc} could be true even if {aBc} = {abc} is false. It would be a bit that lived on the value.

You'd have to write this as 1.0 = approx 1 or (approx 1) = 1.0...or you'd have to make APPROX take its first argument as #tight so that it wouldn't let the = run until it was done. But that would mean approx 1 + 2 would be seen as (approx 1) + 2.

Regardless of what the refinement is named, I want to have it specialized as switch= (with the complementary specialization switch=*, which would use strict equality and make void branches mean void).

Update Oct-2018: There are no longer forms of conditional constructs that will literally return null (the non-value-formerly-known-as-void). NULL always means no branch ran. So there are no if* or case* variations, and there are no ?-variations either, since testing for null is easy enough. That makes SWITCH= a single specialization, if it were implemented.

So what I like about SWITCH/EQUAL is that it would tie together people's knowledge of how the operator worked so they understand that's what is driving the equality tests inside the operation. And it translates well to the shortcut. So /EQUAL is my vote at present, but if people don't like that it won't matter that much to me as I'd use the specialization.

Really a lot of this just ties into trying to reduce the total number of forms of equality people have to worry about...not using one form for searching blocks, another for comparison, another for switch. Getting down to just three forms is probably about the most we can handle (with SAME? being the wild card not yet discussed).

rgchris · October 24, 2017, 2:54am

My inclination is that path foo/bar notation is near-exclusively associated with the notion of key/value-paired data. It's used with objects, ports, maps and predominantly key/value blocks. I'd suggest that having the following example work as expected is of greater value than hypothetical irregular dialects:

data: [foo bar bar baz]
data/bar

This is a longstanding frustration and source of confusion (and bugs).
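To show why data/bar misbehaves today and what the odd-position restriction would change, here is a small sketch modeled in Python. Words are plain strings and matching ignores case. pick_any_position and pick_odd_positions are hypothetical names. The first function treats any match as the key, even one sitting in a value slot. The second only considers the 1st, 3rd, 5th... items as keys.

```python
from typing import Optional

# Hypothetical model: words are plain Python strings; matching ignores case, as Rebol words do.


def pick_any_position(block: list, key: str) -> Optional[str]:
    """Historical: the first match anywhere, even in a value slot, is treated as the key."""
    for i in range(len(block) - 1):
        if block[i].casefold() == key.casefold():
            return block[i + 1]
    return None


def pick_odd_positions(block: list, key: str) -> Optional[str]:
    """Suggested restriction: only the 1st, 3rd, 5th... items (1-based) act as keys."""
    for i in range(0, len(block) - 1, 2):
        if block[i].casefold() == key.casefold():
            return block[i + 1]
    return None


data = ["foo", "bar", "bar", "baz"]  # models data: [foo bar bar baz]
print(pick_any_position(data, "bar"))   # bar
print(pick_odd_positions(data, "bar"))  # baz
```

The historical scan stops at the value bar in the second slot and returns the next item, which is why the example above surprises people.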

I feel we're unpaving the cowpaths here. We're making it harder (and uglier) to do the most elementary and common things in the hopes that the set-word behaviour is more useful. I don't think paths should get more complex—that complexity should be passed to functions like SELECT and FIND.

-1 on naming. Our scripts will appear as if they're acquiring syntactic tics.

foo/'baz: switch=* type-of foo/[bar:] [:string! :url! [eek!] (_)]

I'd love to see a SWITCH that works like this:

switch type-of "Foo" [string! [...do this...]]

What are the downsides of equating DATATYPE! with WORD! here?

hostilefork · October 24, 2017, 5:24pm

I had some remarks in chat:

Carl did argue that Key: Value was Rebol's parallel to <Key>Value</Key>.

It all boils down to invariance. Who's writing this code that is supposed to be able to remain unchanged when you morph scores: [Bill: 7 Ted: 8] into scores: [Bill 7 Ted 8] or scores: [#Bill 7 'Ted 8]?

If these invariances are nonsense (I believe they are) then I don't see why having to use a construct that says what kind of thing you're looking for, scores/'bill vs scores/bill vs. scores/[#bill] is so bad.

I'm just not liking this fourth equality operator, the "one used by pick". If we are going to say (quote foo:) is (quote foo) is true, then [foo:] is [foo] needs to be true, and I don't like that. Where does it stop?

I'd like us to kill off "pickquality" because its existence raises a lot of technical problems. But if we keep it, it needs to be reined in to specific use cases.

Our scripts will appear as if they're acquiring syntactic tics.

"Expert" code in the guts needs to be easy to write, and not have a lot of volume taken up by stuff that isn't what's "really going on". The needs are different.

If you want to write SWITCH/EQUAL/ONLY you may, but that's a lot of typing which is incidental to the formal intent. That much verbiage hides the program one is trying to write. Collapsing into the "tics" is a way of acknowledging you're in "expert mode" without saturating the space.

My own views on this have been, as they say, "evolving". In a C function which takes a context parameter, e.g. Append_To_Context(REBCTX *context), if every line of the function says context->... or Prepare_Context(context) you get a saturation where you can't read the code anymore. If you change this to Append_To_Context(REBCTX *c) then suddenly the code becomes much easier to absorb, because you can see the operations better and don't have the word context written out on every line.

Rebol has an answer of sorts with do in context [...], which can drop a lot of redundancy. It's not used as often as it probably should be, and I don't know how much of people not using it involves them knowing how inefficient it is. But that's what I want to change.

hostilefork · October 26, 2017, 4:43am

One downside of equating words with types is that it introduces keywords. Think about group!: paren!: #[whatever].

The more general solution IMO is that SWITCH soft-quotes. So you can switch type-of x [:string! ... or switch type-of x [(string!) .... This also lets you switch on blocks and such. It doesn't introduce a questionable impurity.

But there are other downsides. Mechanically speaking, I really think we want to avoid IS-ness seeing a block of words as a block of typesets. Their semantics are very different.

rgchris · October 29, 2017, 10:43pm

Another suggestion: SWITCH/EXACT
Am still partial to the primacy of  over the potential benefits of further non-alpha words (I know you know this, am just restating : )
Whatever form loose equality might take, I do wonder about the relative perils of:
string! is/nearly first [string!]
where:
'foo is/nearly first [foo:]
The potential benefits for a loose-by-default SWITCH statement for this idiom seem useful where currently awkward conventions fill that space.

hostilefork · November 13, 2017, 9:45pm

As a random additional thought about equality, signs and operators...

I was thinking about how unfortunate it is that the pairing for XXX-or-equal operators is <= and >=. It goes with the saying out loud of "less than or equal to" and "greater than or equal to". But symbolically it takes something that looks like a left arrow out of the picture. It would be better to use =< and >=, reserving the <= and => for something else.

But sadly, that is not the way history has written things. Regardless of what people do in their own dialects, <= is not going to be some data-flow-direction-arrow in DO...it's going to be less than or equal to. It would be a lot of muscle-memory to retrain to switch, and a lot of frustrating bugs when I messed up and got some strange left-arrow-operator.

This makes me wonder if it makes sense to just go ahead and make =< a synonym for <=, and => a synonym for >=...and accept they've been taken away from other meanings. I have a hard time believing in => as an operator doing something clever when <= is just less-than-or-equal-to, so giving it up is okay. We can just make do with -> or ->> or *-> or >>=, etc. in DO.

hostilefork · November 29, 2017, 8:07pm

Carl's Rebol Blog post on the topic of != isn't currently taking comments (or at least the one below isn't; it said "REBOL terminated"), but here's what I tried to say there:

In the past, I spoke about <> as being too "tag-looking". If {} is empty string, why wouldn't <> be empty tag?

In time, I drifted away from that particular religion, and made peace with <> being a symbolic WORD! and not a TAG!. The specific tag escaping proposal I have in mind would allow you to put spaces in tags, e.g. <{ spaced out tag }>, and thus <{}> could be an empty tag if you really needed one.

For R3-Alpha, the problems with != go deep. Firstly, if you bow to C's influence in this way, then people will expect == and != to be a paired set. Yet that would not be so, as = and != would be a pair, and then == and !== would be a pair.

That's not going to please people. However, permitting != as a synonym for <> is less evil if equality is reimagined as laid out here.

At which point, there's no == to be inconsistent with, and I guess I'd throw my hat into the ring for "what harm does it do to have it". As @AntonRolls says, it could make comparisons to a tag look better.

foo <> <tag>
foo != <tag>

I'm not convinced I support ! being in the box for NOT, however. Yet people do it on their own. We've been experimenting with "ternary" non-evaluative operations as condition ?? value1 !! value2, an idea from Perl6... and the idea would be that this does not evaluate blocks. So x > 2 ?? [a b c] !! [d e f] would be equivalent to either x > 2 [[a b c]] [[d e f]].

hostilefork · January 30, 2018, 3:56am

I'm always putting on my newbie glasses, trying to have fresh eyes, and not see things with bias. While it's hard to do, it is somewhat possible.

I am really starting to have a problem with <= being "less than or equal to".

It looks like an arrow. It just does. And >= with =< are a much more balanced pair that do not look like arrows.

It's one of those "once it's been seen, you can't unsee it" things. Things just start looking better.

if x >= 10 [print "foo"]
if x <= 10 [print "bar"]

if x >= 10 [print "foo"]
if x =< 10 [print "bar"]

It's a better symmetry. And the <= sticks out like a sore thumb, making it look like information is flowing in some direction... like you're putting 10 into x.

Given how much time I spend programming C, I would probably screw up constantly if <= were given to some other operator, and depending on what it was given to that might cause easy or hard bugs to find. But it seems to me that maybe, what could be done here is that <= and => could be left undefined in the core...and then it would be the users who felt they'd retrained themselves (or had no prior biases they'd be needing to untrain) who could start using them as arrows for whatever purpose they imagined.

This would give people coming to the language without prior biases an advantage, and users who wished could still define <= as less-than-or-equal if they really insisted. They can define => as greater-than-or-equal too, but if they feel hesitant to do so then they may realize why the other definition is poor too.

What do people think? The softer option is to go with what I suggested previously...define both pairs and then let users override the arrows if they feel like it. But the thing about that is that it means people will likely fall into the status quo without having any moment of questioning it, and wind up stuck, whereas an early confrontation with <= being undefined would let them make an informed decision.

BrianOtto · January 30, 2018, 4:40am

At first, I was in agreement. I had never noticed the arrow until you pointed it out, and now I can't unsee it !!
I agree it is a nicer symmetry too... but I am on the fence about it. When you read the code in the current syntax, it says:

>= "greater than or equal to"
<= "less than or equal to"

When you read the proposed syntax, it says:

>= "greater than or equal to"
=< "equal to or less than"

That seems a bit, uhm, awkward. But I dunno, I'm not completely opposed to it. It just grates at me a tiny little bit... but maybe it's just years of doing it the other way.

If both are defined, then no one is going to go through the trouble of changing their ways to use =<. I just don't see that happening. Plus it is going to make code confusing when some people do it one way, and others do it another way. At least if they have to define it to work a different way then that is documented in their code.

My opinion is we either keep the status quo, or throw them into the fire... sometimes it's the only way to appreciate something new

hostilefork · January 30, 2018, 5:18am

Maybe you just need to go back a bit further in your memory archive. You might remember something along the lines of "the alligator always wants to eat the bigger number". =< puts the mouth in the right place. With <= it would be eating the equals sign...too full to eat the number.

Right. So I like it being the burden of the people who insist on it. Those people will have a whole lot of other old definitions they want loaded--there's no way this would be their only complaint.

It may be that scripts with a Rebol [ ] header will be treated like scripts with a Red [ ] header, and not use any new conventions... grafted as much as possible to the old ways. So these kinds of thoughts would be saved for a new language header. Or maybe it would still be Rebol, but Rebol [mode: Quality]. Those are political questions; for my part, I'm interested in that Quality with a capital Q...and that means thinking all these things through.

BrianOtto · January 30, 2018, 5:45am

Haha, I laughed out loud at both your responses. I don't have much else to say, other than ...

I never learned the alligator method. It was something along the lines of "someone is pointing at the smaller number (and laughing at them)" and so in this analogy the <= is the more accurate representation and >= is the one that needs to change. But looking back now, that is a messed up way to think about it, and this programming discussion now has me questioning my childhood

Anyway, I digress, I think you sold me on the alligator. It is too full. We need to help the alligator eat properly!

Brett · January 30, 2018, 12:57pm

<= ... "less than or equal to" the phrase for this symbol is hard wired into my brain. Can't remember when I learnt, probably with playing around on a Commodore Pet and CBM 8032 that I convinced the computer store guy to allow me to play with as a kid. Don't think I've seen different in any language I've used (though it's not a long list).

From my maths degree it's more like ≤, but I still see the < first.

=< "Equal to or less than" - It's possible to learn it, but the ordering is inconsistent with the continuum you'd have in your mind that you're testing against.

=< Looks like a pouty face.. Once you've seen it you can't unsee it

Ok, so I admit maybe I'm biased. But having <= undefined is likely to be an unnecessary turn-off / hurdle for many people.

asampal · January 31, 2018, 3:27am

I never learned about the alligator and I do agree with @Brett about "less than or equal" being imprinted, but I don't think I'd have a problem recognizing what =< stood for. The way we were taught to remember the orientation on the greater-than/less-than 'arrows' was by associating the bigger (open) part of the arrow with the bigger thing and the smaller (point) with the lesser one. This orientation doesn't change with =<.

hostilefork · July 3, 2018, 3:05pm

On this issue, it's time to bite some bullets.

I don't want anything that makes it into the web tutorials to have to be "unlearned" for future versions. That means if we know already that something is going to change, it has to change before Beta/One. Or the feature is blocked off--and we do not teach it or talk about it yet.

Yet even in the most primitive demos, you can't "cut" tests for equality and inequality. :-/

While it's technically possible to have options to let people mix and match conventions, I'm really thinking about only two worlds. The Ren-C World and the Redbol World. (Stable Ren-C world is coming, but I made it clear I wasn't committing to it yet.)

People who want mixing and matching are going to have to do that manually, with a table of defines for the operators. The module system needs to be better about making this easy, but it will already be a pretty big job just to get it to handle the two worlds.

I should mention that the trend for other things I've "known were good but held off on, and then finally done" have worked out great. Moving to TEXT! has been great, switching to ACTION! has been great. Putting back the indexing to be like in Rebol2 and Red was way overdue too.

We've discussed a few things in this thread that are open issues. Namely:

What do we call the refinement asking for "=" testing vs. using "is" testing? Today that refinement is called /CASE. I'm not a super big fan of that, since it doesn't just imply case-sensitivity... e.g. select/case [1 a 1.0 b] 1.0 gives b, while select [1 a 1.0 b] 1.0 gives a.
"a" is first "abc" = true seems nice. Though one might feel uneasy since that conflates with "a" is first ["a" <b> #c] . It seems to have more upsides than downsides (consider that historical Rebol claimed that <FoO> = %fOo was true, which has tremendous downsides with almost no upside). But we lack experience with it. I'd consider this to be a speculative feature which is the kind of thing that if it didn't prove itself might get deferred in Beta/One, e.g. by making "a" is #"a" an error for the interim.
How to deal with inequality? Does the more lax before and after seem right? "a" before "B" as true?

...and more, plus general mechanics of how to make the transition.
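The SELECT example in the first open issue can be sketched as a single scan with a strictness switch. It is modeled in Python, with words represented as strings. select_sketch and its strict parameter are hypothetical stand-ins for SELECT and whatever its strict refinement ends up being called. Lax mode leans on Python's own == (where 1 == 1.0). Strict mode also requires the same Python type, as a proxy for "same datatype". This sketch does not attempt Rebol's case folding.

```python
from typing import Any, Optional


def select_sketch(block: list, key: Any, strict: bool = False) -> Optional[Any]:
    """Return the item after the first match of key, scanning every position.

    Lax mode uses Python's == (so 1 matches 1.0); strict mode also requires
    the same Python type, standing in for "same datatype".
    """
    for i in range(len(block) - 1):
        item = block[i]
        if strict:
            matched = type(item) is type(key) and item == key
        else:
            matched = item == key
        if matched:
            return block[i + 1]
    return None


block = [1, "a", 1.0, "b"]  # models [1 a 1.0 b]
print(select_sketch(block, 1.0))               # a  (1 matches 1.0 laxly)
print(select_sketch(block, 1.0, strict=True))  # b
```

The refinement is doing more than toggling case here, which is the argument for naming it after the operator (/EQUAL or /=) rather than /CASE.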

hostilefork · September 22, 2020, 6:30pm

With the design of "arrow words" framed up, I think it's time to lay this issue to rest, and just say these are all comparison operators in the default distribution.

>= and => mean the same thing: greater-or-equal?
<= and =< mean the same thing: lesser-or-equal?

What seems a little sad at the outset is that we are "wasting" =< and => on synonyms, and people from JavaScript won't be getting the familiar => as "lambda" by default. But -> is used in other languages for lambda, and it's light and clean looking. So it's probably preferable. And now there are a bunch of other arrow operators opened up.

In your own dialect or modules, you can make => and <= mean anything you want...redefine one or both. And there's no rule that >= or =< mean anything in particular either.

Case closed.