Deepening the Lake: / as PATH!

hostilefork · August 31, 2018, 6:49am

This is the behavior in Rebol2 and Red:

>> path: 'foo/bar
== foo/bar

>> path: next path
== bar

>> path: next path
==

Visually you can't tell a length-1 path from a WORD!, which is bad. But length-0 paths are invisible, which is really bad.

R3-Alpha still doesn't visually distinguish length 1 path from WORD!, but empty paths give construction syntax...possibly bringing in the whole path content, even at the end:

>> make path! []
== #[path! []]

>> path: 'foo/bar   
== foo/bar

>> path: next path
== bar

>> path: next path
== #[path! [foo bar] 3]

In some ways, that is probably even worse.
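The Rebol2/Red output above is what you'd get from a molder that simply joins the remaining items with slashes. Here is a minimal sketch of that, in Python rather than Rebol so it can be run without any particular interpreter build. `mold_naive` and the list-plus-index representation are assumptions for illustration, not how any of the interpreters is actually written:

```python
def mold_naive(items, index=0):
    """Join whatever remains from the current position with '/'."""
    return "/".join(items[index:])

path = ["foo", "bar"]

print(repr(mold_naive(path, 0)))  # 'foo/bar'
print(repr(mold_naive(path, 1)))  # 'bar' (same text as the WORD! bar)
print(repr(mold_naive(path, 2)))  # ''    (nothing rendered at all)
```

With only one item left there is no slash to join on, and with none left there is nothing to print, which is exactly the ambiguity being complained about.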

I've always felt something is way off about / being a natural word. Even worse is the existence of other "words" like //. There seems to be no reason to encourage people to think of / as a word character at all.

I propose that SLASH ( / ) be a zero-length PATH!

Note: This post documents the original proposal, and is kept here as written. But see the note in the reply below about how this can actually be done without breaking division as we know it!

>> type of quote /
== path!

>> length of quote /
== 0

>> make path! 0
== /

Behavior-wise, I imagine the following as all being equivalent:

a/b/c: d
a / b / c: d
a/b / c: d
a / b/c: d

A length-0 path thus wants to behave path-like, but it is missing the things to path upon. So it would get them as operands from the left and right.

By similar logic, a length-1 path could look like foo/ and consider it to only need something on the right, giving more equivalent forms:

a/ b / c: d
a/ b/ c: d

Note: A REFINEMENT! is not a PATH!, so arguing that a /b should act as a/b is not necessarily as consistent as it sounds. I actually think refinements being inert is probably a good thing. So saying that only PATH! does pathing and acquires arguments if it needs them is the better way to say it... it's path consistency, not slash consistency.

Mechanically, I've gotten it to work!

Explaining what has all happened in the evaluator to make this possible would be a long story. It's rather interesting, and in pushing things around I've noticed that Ren-C can likely solve the SET-PATH! ordering issue #396.

Strangely enough, separating it out all the way is likely the more efficient form in terms of execution speed. If you compare a/b/c: d and a / b / c: d, the former has to set up a "subframe" in order to traverse a new array...while the latter can actually reuse the frame it's already in, since the data source never changes.

...but programmers are really attached to / as division

There are languages where having numberA / numberB doesn't perform division. None of those languages are particularly popular...

So this raises the question. Why not have numbers do "pathing" by dividing? Since pathing would need to be soft quoted on the left, it would be "tight" like today's division. It would act mostly the same, though you'd have to parenthesize expressions on the right if they weren't constants or variables:

>> 2 * 10 / x: 5
** Error: Numeric division via PATH! cannot be assigned

>> 2 * 10 / add 1 3
** Script Error: add is missing its value1 argument

It's something a programmer can understand once they realize slash is always just pathing: they basically did (2 * 10)/x: 5 and (2 * 10)/add 1 3.

You can work around it with 2 * 10 / (x: 5) and (2 * 10) / (add 1 3), which is backwards compatible with R3-Alpha. It doesn't seem like the end of the world, and you get some new interesting behaviors:

foo: 2/3 * bar

Though you'd still need to remember the left-to-right rules, as bar * 2/3 would be evaluated as (bar * 2) / 3. Again, not hard to remember if you have been trained from the start to think of slash acting the same way with spaces or without... if you read foo: 2/3 * bar as being the same as foo: 2 / 3 * bar, you realize that it's your formatting choice whether to space it out or not; and if you mislead someone, it's your fault.

Before explaining the drawbacks: "Why rock the boat?"

Rebol is not really about giving people a scripting language that behaves exactly how other languages have taught them to expect.

What it seeks to do is to give a kit of reusable parts that are good and solid, which when executed in one light can give an expressive scripting language. Yet those parts are ultimately intended for multiple applications.

But if you look at the handling in older Rebol of length-1 and length-0 paths, today's PATH! is a crappy part. It doesn't have the solidity of BLOCK! or GROUP!. It feels junky and unreliable. Making it feel like a quality part is important for reasons that have nothing to do with path picking or division.

That is why I said that DocKimbel didn't see the forest for the trees when it came to things like allowing GROUP!s at the head of a path. He didn't consider this behavior to be a bug:

>> y: load "a/(b c)"
== a/(b c)

>> x: load "(a b)/c"
== [(a b) /c]

In his view, groups at the head of the path were "inconvenient for the evaluator". So if you wanted the latter to do "pathing" you should say:

 temp: (a b)
 temp/c

Yet when trying to build something that was CMake-like, there were good reasons to want to be able to put expressions at the heads of paths of things that were to represent file paths. It wasn't going to run with DO...these were file lists, processed by the dialect.

So the goal is to make PATH! more solid and pleasing, while not losing too much when the evaluator is applying its voodoo.

Quick reminder: PATH! is already voodoo

>> foo: [a 10 1 20]

>> foo/a
== 10

>> select foo 'a
== 10

>> foo/1
== a

Is pathing selection, or array indexing? "It's like, whatever you want it to be, man." If you'd wanted to feed 1 in and get 20 out, you would need to use a SELECT.
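To make the split personality concrete, here is a minimal sketch in Python (chosen only so it runs anywhere). The function names and the "integers index, everything else selects" rule are my assumptions about the simplest model that reproduces the console session above, not the actual path dispatch code:

```python
def select(block, key):
    """SELECT-style lookup: find key anywhere, return the item after it."""
    for i in range(len(block) - 1):
        if block[i] == key:
            return block[i + 1]
    return None

def pick(block, index):
    """PICK-style lookup: 1-based position."""
    return block[index - 1] if 1 <= index <= len(block) else None

def path_pick(block, key):
    """Assumed rule for BLOCK! pathing: integers index, other keys select."""
    if isinstance(key, int):
        return pick(block, key)
    return select(block, key)

foo = ["a", 10, 1, 20]

print(path_pick(foo, "a"))  # 10  (like foo/a)
print(path_pick(foo, 1))    # a   (like foo/1)
print(select(foo, 1))       # 20  (only reachable via SELECT)
```

The integer key is where the two meanings collide: the path form can only ever give you one of them.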

PATH!s now support things like a/[b], so I've wondered if forcing select semantics could be done with something like that.

>> foo: [a 10 1 20]

>> foo/[1]
== 20

And I feel like it should probably reduce the block you give it. But it's hard to say. The main point is that pathing is weird, and it has sort of "evolved" to meet usage needs while still sharing much of the code and behavior of BLOCK! and GROUP!.

So what breaks with the divide?

If SET-WORD! at the end of a chain of / indicates the behavior of a SET-PATH!, I've already shown that 2 * 10 / x: 5 would have to be 2 * 10 / (x: 5). Which is not too bad because you get an error and it's easy to work around, in the scheme of Rebol things.

But things like TIME! present a challenge to the duality of division and picking for /. Consider today's behavior:

>> t: now/time
== 23:11:01

>> hour: 2
== 2

>> t/hour
== 23

>> t / hour
== 11:35:30.5

You can divide a time by a variable called hour. And you can also pick the hour component out of a time. To a unified pathing strategy, these both just look like a WORD! coming in on the right...and it can't do both.

One might argue that hour of t looks pretty good, and so you could use that for reading the properties, saving / for division. But that doesn't let you change a variable's time field, e.g. t/hour: 4. The questions reach out to other things like DATE! even though they don't have division available, because one wouldn't think you'd access the month of a date with a different syntax to the hour of a time.

If time keeps today's behavior, there's an error message to guide you to use divide. You can only pick the words "hour", "minute", and "second" out of it. Anything else and t / foo will tell you that t has no foo field, which guides you to DIVIDE. If you want a quantity on the left you now have the option of SHOVE:

 begin: now
 loop 10 [print "timing..."]
 half-time: (now - begin) -> divide 2

Is it a dealbreaker?

I see ever letting / act like division as a compromise. It already was a compromise in the sense of making it a WORD!...you couldn't say /: or :/ or '/. Now you'll be able to say all those things... empty SET-PATH!, empty GET-PATH!, empty LIT-PATH!.

Basically, Rebol is already weird. If it gets slightly weirder while PATH!s get substantially more solid and flexible, that is probably a good trade.

I actually would say that with UTF-8, we should probably enfix divide as ÷ and encourage people to make that their keyboard binding for Ctrl-/. Then make Ctrl-8 be × and be done with it.

For people who find putting keyboard bindings in their editor too oppressive, we can enfix divide as dv. (div is generally integer division) Rebol could even come with a "mathify" utility which processes a source file that used DV during development, and turns dv => ÷.

My point being that math is great, but Rebol's PATH! story needs to be the main story. The parts have to be solid, not flaky.

hostilefork · August 31, 2018, 9:49am

Turns out we can have our PATH! and divide it too!

I've found a compromise that gives 100% legacy compatibility with how / worked when it was a tight enfix divide...while still letting it be a 0-length path.

And it's fairly obvious in retrospect: Tell the path dispatcher if there's a gap or not. Also, if there's a gap then use tight normal evaluation vs. quoting when providing the path picker.

What we lose from that is the ability to say a / b: c to act like the SET-PATH! assignment a/b: c. But that isn't anything people were asking to be able to do. Also, to the extent it would have been able to work, it would have been pretty limited... a / b: default c could not have acted like a/b: default c, because DEFAULT isn't part of path dispatch--it would left quote and see only the b:.

So a / b: c goes back to being a synonym for a / (b: c) as it has been in historical Rebol. Division as you know it is now safe... and PATH!s still get cleaned up. Win-win.
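A rough model of the "tell the dispatcher about the gap" idea, using the TIME! example from the first post. This is a sketch in Python with hypothetical names (`dispatch_slash`, `time_field`, the `variables` dict standing in for bindings); the real evaluator mechanics are not shown anywhere in this thread, so treat it purely as an illustration of the rule:

```python
from datetime import timedelta

def time_field(t, name):
    """Pick a named component (hour, minute, second) out of a duration."""
    total = int(t.total_seconds())
    fields = {
        "hour": total // 3600,
        "minute": total % 3600 // 60,
        "second": total % 60,
    }
    if name not in fields:
        raise KeyError(f"time has no {name} field")
    return fields[name]

def dispatch_slash(left, right_word, variables, gap):
    """`gap` records whether the slash had whitespace around it."""
    if gap:
        # Spaced form (t / hour): evaluate the right side, then divide.
        return left / variables[right_word]
    # Tight form (t/hour): the right side is quoted and used as a picker.
    return time_field(left, right_word)

t = timedelta(hours=23, minutes=11, seconds=1)
variables = {"hour": 2}

print(dispatch_slash(t, "hour", variables, gap=False))  # 23
print(dispatch_slash(t, "hour", variables, gap=True))   # 11:35:30.500000
```

The same WORD! on the right gives the field in the tight form and the quotient in the spaced form, which is how the historical t/hour and t / hour results can coexist.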

giuliolunati · September 4, 2020, 11:58am

As pointed out in https://github.com/metaeducation/ren-c/issues/1071, / having no binding makes it impossible to customize its behavior (e.g. 1 / 2 => fraction custom type).
My proposal is: reserve / for division (making it a word) and use \ for paths.
...and before people start screaming in horror, I ask you to think: why not?
\ is unused in Rebol, and is used in Windows file paths.

The use of / in paths is not written in stone...
Sure, it's a lot of code to change, but we are already playing this game, aren't we?

Also, it will help to disambiguate files from paths and refinements:
/ => word
%a/b/c => file
%a/b\c => path with 2 components (the file %a/b, the word c)
a\b\c => path
f/r => function with refinement

hostilefork · September 4, 2020, 3:46pm

Backslash has drawbacks. One is that in markdown or string contexts it very frequently means escape, so you'll have a hard time conveying strings using it and wind up having to do \\ or similar. Another is that it's typically not as accessible on the keyboard.

(One thought in particular is that by not having backslash in use for anything in the language, it might be good for tools that want to insert escape sequences into Rebol code...if you wound up in a situation where you needed to do preprocessing. It's been kind of kept "reserved" with that thought in mind.)

In the past I think I've suggested the idea that "empty" path forms could have bindings and act as synonyms for some word. It could be a weird one like -slash- and then two slashes could be -slash-slash-. So when you said:

/: <whatever>

That would act exactly as if you had written:

-slash-: <whatever>

And the same goes if you said / anywhere else: it would just do whatever -slash- would look up and do.

This would be a really weird case where the datatype presents as PATH! but the cell format is a WORD!. There is a kind of precedent for that.

I haven't yet looked through to see how difficult this would be. It's risky because if you start writing your own enumeration code that does bind-like things, you wouldn't see it as a WORD! but as a PATH!. It can't be "both".

Why not just a word: The vision I had was to try to get things to a uniformity where you would say:

>> join '/ 'a/b
== /a/b

>> join 'c/d/ 'a/b
== c/d/a/b

The concept is to make PATH! feel like a reliable part, similar in "trustworthiness" to BLOCK!...where there's no such thing as a "1-element path", and it's uniformly merging BLANK!-sides of paths to non-blank path bits. So the algorithms could look at the above as:

>> join-path [_ _] [a b]
>> join-path [c d _] [a b]
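Here is roughly what I mean, sketched in Python so it can be run as-is. `join_path`, `mold_path`, and the use of `None` for BLANK! are all stand-ins I'm making up for illustration, and the rule that the left side must end in a blank is an assumption (this thread doesn't pin down what happens otherwise):

```python
BLANK = None  # stands in for Rebol's BLANK! (_)

def join_path(left, right):
    """Merge the trailing blank of `left` with the head of `right`."""
    if not left or left[-1] is not BLANK:
        # Assumed rule: without a trailing blank there is no slot to merge into.
        raise ValueError("left path must end in a blank")
    return left[:-1] + right

def mold_path(items):
    """Render items separated by '/', with blanks rendering as nothing."""
    return "/".join("" if item is BLANK else item for item in items)

print(mold_path(join_path([BLANK, BLANK], ["a", "b"])))     # /a/b
print(mold_path(join_path(["c", "d", BLANK], ["a", "b"])))  # c/d/a/b
print(mold_path([BLANK, BLANK]))                            # /
```

Note that under this representation the lone slash is just the two-blank case, so nothing in the join or mold logic has to special-case it.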

If you make / a WORD!, then obviously these algorithms become less regular (they have to be watching for /, and // and such too). But also, you face the problem of having these strange WORD!s that uniquely cannot be put into PATH!s.

So that is where the motivation comes from, and I think it could be interesting...it just hasn't been really developed much.

hostilefork · September 4, 2020, 5:48pm

Let's imagine they were WORD!s, and these "slash-words" were just illegal in PATH!s:

>> word: 'b
>> make path! compose [a (word) c]
== a/b/c

>> word: '/
>> make path! compose [a (word) c]
** Error: Cannot put special / WORD!s in PATH!s

Illegal seems the best idea...I don't think it would be a good idea to sweep that under the rug and try to take it as some interpretation like a/c (make path! [a c]) or a//c (make path! [a _ c]). Because as in this example, the (word) could be coming from anywhere and not intended to be special.

Unfortunately, to get the desired lack of ambiguity you now can't say make path! [_ _] or make path! [_ _ _]. So now /a and / are fundamentally different types.

I'd thought that making PATH! be more uniform would make it easier to think of using it in dialects to actually mean file paths. If some directory component wasn't a legal word you could use a TEXT! instead of a WORD!.

 my-directory-using-dialect [
      / [...]
      /foo/bar/ [...]
      /baz/{4mumble}/ [...]
 ]

This would give natural enumeration of the path parts, and it would have a "clean decay" to the single slash. And this all makes PATH! a better part for dialecting in non-directory purposes.

The whole idea runs into trouble when you can't say to path! '/ because that can't be made into a PATH!.