I love arbitrary escaping, but backslash may not be The One

UPDATE: I tried to investigate alternatives to apostrophe for escaping, but eventually decided there just aren't any better options. So apostrophe is used for arbitrary escaping via QUOTED!, which has been working out very well...and it truly appears to have been a critical missing ability.

We now have a generalized variation of LIT-WORD! and LIT-PATH! you can use on any type, to any depth. Evaluation just picks one level of escaping off when it sees an escaped value, but otherwise leaves the value as-is.

>> \(1 + 2)
== (1 + 2)

>> \\\a
== \\a
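If it helps to see the rule spelled out as code, here is a rough model of it in Python. This is only a sketch of the "strip one level, leave the rest alone" behavior, not how the interpreter implements it; the names `Quoted`, `quote` and `evaluate` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quoted:
    """Hypothetical stand-in for an escaped value: a payload plus a depth >= 1."""
    value: object
    depth: int = 1


def quote(value, depth=1):
    """Add `depth` levels of escaping to any value (merging with existing levels)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if isinstance(value, Quoted):
        return Quoted(value.value, value.depth + depth)
    return Quoted(value, depth)


def evaluate(item):
    """Model only the escaping rule: remove one level, never touch the payload."""
    if isinstance(item, Quoted):
        if item.depth == 1:
            return item.value  # last level removed: the plain value comes out
        return Quoted(item.value, item.depth - 1)
    # Ordinary evaluation of unescaped values is out of scope for this model.
    return item


group = ("1", "+", "2")  # stands in for the GROUP! (1 + 2)
print(evaluate(quote(group)))                    # ('1', '+', '2'), not 3
print(evaluate(quote("a", 3)) == quote("a", 2))  # True
```

The point of the model is that the payload can be any type at all, and the depth is just a counter that evaluation decrements.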

More examples

The feature is a paradigm shift: it changes many things in the system for the better...more efficient, more general, more expressive.

But it was implemented with the backslash character for starters. And I'm not feeling the love for the backslash so much.

  • It looks bad with paths. \a/b/c
  • It looks bad with bars. \|
  • It looks ridiculous with single slash \/
  • It caused a crisis of faith between me and // comments, because everything started looking "too slashy".
  • Every time I try to type something on StackOverflow about it, the formatting messes up, even when I put it in code backticks. Even with two minutes to edit the post, sometimes I can't manage to fix it in time.
  • While it is unshifted on my keyboard, reaching for it annoys my pinky enough that it's almost as irritating to type as a shifted character.

It would seem we don't have a lot of characters to choose from in Rebol's saturated space. And I've already said my piece about apostrophes.

My gut says we need another character. Which one? :-/

Bear with me here, but...

What about stealing % from FILE! (and killing FILE!?)

Unlike the syntax of URL!, there is nothing universal about % being a character for representing files. It's a Rebolism. Filenames and paths are just strings and can look like anything.

Whenever I get in a mood to talk about "NewPath", I stress that Rebol should use PATH! as the structural form of files. You should be able to put GROUP!s in them that evaluate...you should be able to put BLOCK!s in them that get UNSPACED, and you should be able to do a for-each on them and visit each segment without using some other path unstructuring tool.

"But paths execute," people would say. Well, then there came the idea that you could put a FILE! at the head, so %foo/baz/bar.txt would be a three-element path, the first element being a FILE!. And you'd somehow have a rule that if an inert item were at the head of a path, it wouldn't run. (Of course, that now brings in a difference in behavior for (%foo)/baz/bar.txt.) How could you put a GROUP! at the head of a FILE!, showing a place you wanted evaluation, but not evaluating immediately?

These concerns seem to melt away if FILE! is replaced with quoted words and paths.

>> file: %bar.txt
>> type of file
== word!

>> file: %foo/baz/bar.txt
>> type of file
== path!

>> subdir: %baz
>> file: %foo/(subdir)/bar.txt
>> reduce file
== foo/baz/bar.txt

>> root: %foo
>> base: %bar
>> extension: %txt
>> to-local-file %(root)/(subdir)/[base "." extension]
== "C:\Projects\foo\baz\bar.txt"
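To make the intended mechanics a bit more concrete, here is a small Python model of how such a structured path might be reduced to a local filename. It is a sketch under assumptions: `Lookup`, `Block`, `reduce_segment` and `to_local` are names invented for this example, `Lookup` stands in both for a GROUP! like (subdir) and for a word inside a BLOCK!, and the model joins segments only (it does not prepend a current directory the way the C:\Projects output above implies).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lookup:
    """Hypothetical: a spot where a variable is fetched when the path is reduced."""
    name: str


@dataclass(frozen=True)
class Block:
    """Hypothetical: parts that are reduced and then joined with no spaces."""
    parts: tuple


def reduce_segment(segment, env):
    """Turn one path segment into text."""
    if isinstance(segment, Lookup):
        return env[segment.name]
    if isinstance(segment, Block):
        return "".join(reduce_segment(part, env) for part in segment.parts)
    return segment  # plain words and text stand for themselves


def to_local(path, env, sep="\\"):
    """Reduce every segment, then join with the local separator."""
    return sep.join(reduce_segment(segment, env) for segment in path)


env = {"root": "foo", "subdir": "baz", "base": "bar", "extension": "txt"}
path = (
    Lookup("root"),
    Lookup("subdir"),
    Block((Lookup("base"), ".", Lookup("extension"))),
)

for segment in path:  # the path stays structured: each segment can be visited
    print(type(segment).__name__)

print(to_local(path, env))  # foo\baz\bar.txt
```

Nothing gets turned into text until `to_local` is called, so the path can be inspected or rebuilt segment by segment up to that point.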

It solves mysteries like "should files show the % when they FORM". No, because there's no such thing as a FILE!. If you want the fact that it's a file encoded intrinsically in the value, you can always do it the formal way, via a file:// URL.

It solves things like "how would you get a structured file with a GROUP! at the head". Again, you aren't having some kind of FILE-GROUP! at the head, you're escaping the ordinary path as a whole...it's effectively a more pleasant-looking LIT-PATH!, which evaluates into a PATH!.

It's a pretty decent escape character

It's a little bulky, but it's even-keeled. You feel like you haven't been taken out of what you're reading and into the twilight zone, the way ^ and \ might look.

if word = %isn't [...]

It's wider than backslash and so looks better when stuck up against an already vertical-ish delimiter.

Compare %[...], %(...), %| and \[...], \(...), \| in both proportional and fixed-width fonts.

append block %|
append block %[...]
append block %(...)

It would mean that % evaluates to NULL, so you could use it instead of the NULL function that returns a null. For better or worse, you could thus unset a variable very quickly--without a word lookup or function call--with:

foo: %

Multiple escapes look a bit busy, e.g. %%%foo. A forest of 9 little glyphs to do what \\\foo does with three strokes and ^^^foo does with three hats.

But I don't think there's going to be a ton of instances of multiple escaping...clearly Rebol has gotten by so far with just one on words. Being able to do it with other types doesn't suggest to me a sudden viral outbreak of huge amounts of escaping.

And I actually don't hate the weird little forest of glyphs. It's at least strongly distinct from // foo and other places that would be much more common.

In this world, you could also get a "filename" via path in code via LIT, if you didn't want to trip up the escaping in your shell with issues regarding %...

 r3 --do "file: lit foo/baz/bar.txt | data: read file..."

Mechanical Concerns

It sounds pretty good, though the devil's in the details...but it looks worth pursuing.

The only obvious interaction with existing notations is that escaped percents would look like %10%. Which, well, actually that could be a useful notation in some dialect. Who knows. At least it's not \/.

One place you get in trouble is if your filenames contain words or patterns that are not legal escaping in Rebol, and you tried to use them as strings with the slashes inside of them. Previously you could say:

file: %"/c/Program Files/..."

And you'd be able to have it do some transformation or another to give it backslashes. But here you are using a completely fabricated path format in the first place. Why not:

file: %/c/"Program Files"/...

TEXT! and WORD! could be interchangeable. But really, this gets down to why you weren't just using a TEXT! and considering it a local file in the first place, and using a Windows formatted filename. If you could trust reading and writing to interpret it that way, and only do "backslash magic" on PATH!, it seems it would make more sense.

Another problem is how to deal with paths relative to root. Right now, /a/b/c has a refinement /a at the head of a 3 element path. That's a bit odd, and it feels like it might be better if it had a blank first element, and if you could have a blank last element.

Where you used to carry notation with you to indicate "fileness" with something like %1, if you just say file: %1 the evaluator will strip off the quoting level and file will be an INTEGER!. So it's another type that FILE-TO-LOCAL would have to tolerate, or require you to say %"1" to bring it into the textual domain.

So there's no shortage of questions that would have to be looked at. But it's rare in this space to come across something where it's obvious at first how to work out all the details. You have to try it, and see whether solutions jump out once you start looking at it in a new light.

That day when you delete the PATH! dispatch code for FILE!...

Speaking as someone who knows the code, FILE! was a catastrophe. Its mission and definition unclear, its implementation spotty.

Here is my suspicion: That Rebol has overplayed its hand in terms of what it can bring to the table with a FILE! literal datatype, and functions like READ should take URL!, TEXT!, and PORT!. If you have a more fluid way you like to work with your files in your source, you do so with the parts box of things like WORD! and PATH!, but before you get to the file I/O, you explicitly convert what you were working with into a TEXT! (or URL!) that says what you mean.
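As an analogy only (this is Python's standard pathlib, not anything in Rebol), the "stay structured, textify at the I/O boundary" workflow looks roughly like the sketch below. The helper name `textify` and the sample segments are assumptions for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def textify(structured, windows=False):
    """Explicitly convert a structured path to plain text for file I/O."""
    if windows:
        return str(PureWindowsPath(*structured.parts))
    return str(structured)


# Work with the parts box for as long as you like...
structured = PurePosixPath("foo", "baz", "bar.txt")
structured = structured.with_suffix(".bak")  # swap the extension structurally

# ...and pick one explicit point to turn it into text.
print(textify(structured))                # foo/baz/bar.bak
print(textify(structured, windows=True))  # foo\baz\bar.bak
```

The I/O call only ever sees a string, and it's the caller who decides when and how the conversion happens.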

(I'll re-bring up how Carl used plain PATH!s and WORD!s, not FILE!s, in the R3-Alpha files list himself. That struck me as a rather crucial talking point when I saw it. And it made sense to me to want to do it that way--it looked worse with the percent signs AND you lost structural understanding. So I think the thing that doesn't make sense to me has kind of always been FILE!. It seems too hard to know when and how you'd want those parts converted automatically--and if you are given a solid parts box you shouldn't break too much of a sweat picking a point to textify things.)

So I'm strongly biased toward seeing powerful new solutions that allow sweeping that code away. This feels like it might be up that alley... print [{...so we know why this doesn't print the percent} %foo]. Because it was removed from evaluation, and all PRINT saw was a WORD!.

If we look at real examples from a fresh point of view and ask "how can we make someone who's using the Rebol parts really happy" then having a unique string "flavor" that does nothing other than have a different type bit on it will possibly seem too weak to be worth it.

For similar reasons to URL!, I'd be loath to lose FILE! as a thing that can represent a filename/path as exchange currency. While the percent convention is not part of any standard, it does somewhat mesh well with the percent-escaping convention consistent with urls and files as fragments of urls.

I don't much care for the use of words as shorthand for a filename (as in Carl's example); I think it has been a source of confusion and bugs.

I really question the value, especially the value of going out on a limb for one's own standard, when file:// is its own minefield, but at least one that's someone else's problem.

Rebol2 session on Windows:

>> cd /
== %/

>> ls
c/  d/

>> cd c
== %/c/

>> cd "Program Files"
== %/c/Program%20Files/

Are we really in the "actually good" territory of a problem space that can be reasonably tackled? If I'm a Windows user, I'd rather see that as {C:\Program Files} in my config files or scripts any day of the week.

What might we be losing by not hammering on techniques that are "actually good" and more cross-cutting, with features that could benefit across all dialects/etc.? What liabilities are we picking up by having code and special-casing, and the responsibility for handling how this "invention" might run up against user needs?

I was confused at first, wondering why he did it. ("Weren't those FILE!s?") But my theory is that what makes this confusing and buggy might be that we just haven't had a chance to push on it and embrace it to get the kinks out, and show value.

To me anything in this space has a reasonable burden of proof to warrant its existence. FILE! hasn't done that for me. NewPath hasn't gotten its fair shot yet, so I'm interested to see what could happen if it did.

I think so—for the most part, you are looking to manipulate the last part of a file path and a lot of that is manipulating the stringy properties of a filename (e.g. uppercasing, changing extension, etc.). The rather less common handling of full paths can be handled by the TO-LOCAL-FILE, TO-REBOL-FILE, CLEAN-PATH functions where you want a local interface to filepaths.

Quite often FILE! is used to reference a relative filename (e.g. the Needs header or, say, the desktop dialect), as a kind of hyperlink property.

And also, FILE! is used to specify how pieces of data might relate to a filesystem:

Rebol [Title: "Unpacker"]
for-each [file content] [
    %this.txt "This"
    %that.txt "That"
][
    write file content