Non-Network related schemes and URLs

Questions have been raised following the implementation of a handful of schemes in the ReplPad package. A prominent one is whether a URL can be set as the current path, and how each scheme handles this along with path changes made through CHANGE-DIR with a relative file.

There are eight schemes in all: FILE, DIR, HTTP, HTTPS, STORAGE, CLIPBOARD, LOG and DOWNLOADS. In implementing them, the first four are moot—you can set them via change-dir and the FILE/DIR schemes know to map relative files to those schemes.

In designing the latter four there are different considerations—none map to a hierarchical filesystem, thus I sought to avoid structuring them in the traditional thing:// form.

  • Storage — maps to key/value storage, so it uses a plain storage:key form:

    read storage:key
    write storage:another-key "Something"
    

    It could stand to distinguish between localStorage and sessionStorage—for another day.

  • Clipboard — doesn't have any prescribed form, just simply saying the following is enough and equivalent:

    write clipboard:// "Something"
    write clipboard::foobar "Something"
    ; read is not feasible
    

    As a side note, in a Mac implementation, I use the following forms to distinguish between different available text clipboards:

    clipboard::general
    clipboard::ruler
    clipboard::font
    clipboard::find
    
  • Log — for this scheme, I possibly could've gone for the double-colon again, but opted for a triplet. There are precisely four endpoints:

    log:type=log
    log:type=info
    log:type=warn
    log:type=error
    
  • Downloads — for this scheme, the argument is a variable single filename, I opted for downloads:/// so as to take advantage of DECODE-URL extracting that name for us:

    write downloads:///data.bin #{CAFEF00D}
    

    It will ignore any values between the second and last slash, though it doesn't check for conformance.

Each one of these is a subjective choice that I made.
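
To see how these four forms look to a generic URI parser, here is a small sketch in Python. Python is used only because its standard urllib.parse is easy to run; it says nothing about how ReplPad's DECODE-URL actually behaves, and the helper name describe is made up for the example.

```python
from posixpath import basename
from urllib.parse import urlsplit

def describe(url):
    """Split a URL into (scheme, authority, path) with a generic parser."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path

examples = [
    "storage:key",
    "clipboard::general",
    "log:type=error",
    "downloads:///data.bin",
]

for url in examples:
    scheme, authority, path = describe(url)
    print(f"{url:24} scheme={scheme!r} authority={authority!r} path={path!r}")

# The downloads form leaves the authority empty, so the filename is simply
# the last path segment; any intermediate segments are discarded.
print(basename(urlsplit("downloads:///ignored/data.bin").path))
```

None of the four forms produces an authority (host) component, which fits the intent that they should not read as network locations.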

Amongst the questions raised:

  • Is there significance in deviating from the traditional scheme:// format when you couldn't conceivably say: change-dir log:type= and then say: write %error "Foo"?

  • Is this the right design direction to go in at all when you could just write functions that say download or log/warn or write-clipboard?


My shorter opinions on this are:

1) No. I'm not suggesting these forms are ideal, but there are prominent URL schemes that don't adhere to this form, for example tel: and mailto:. I think that URLs should be constructed to fit their domain.

2) Yes. URLs and Ports represent abstract external resources; schemes offer versatility in how the standard verbs adapt to their domain, rather than introducing many domain-specific verb constructions.

I think these are all ok.
I'm not a big fan of the colon notation storage:key because I more naturally associate a colon with an assignment operator.
I would prefer a tuple approach like storage.key but that's a minor syntax quibble.

URL! is almost by definition (lexically speaking) SET-WORD! with no space immediately after. The double-colon form may be a bit clearer, it's really a case of what best represents an external resource.

Then it'd be a tuple! :slight_smile: Which is kind of why I'm asking.

I don't want these URLs to be arbitrary: they should have semantic meaning from a source-level view, in the same way that established URLs (e.g. HTTP(S)) do. They could, for example, be actionable from a code editor, or within ReplPad itself.

Ok. I was reacting to the way I visually read/scan that symbol. And if a symbol has the visual hallmarks of a SET-WORD! yet is technically a URL!, I think it's less intuitive/literate for the reader.

For legibility my vote would be to not deviate from the scheme:// format if reasonably possible. The scheme:// format is instantly recognizable and easily distinguished from other datatypes. A format like clipboard::general is inadvertently borrowing Java syntax for invoking methods, e.g., System.out::println.

I do think that people/developers get over small syntax differences though.

Got it. Another variation that I believe is legal URI syntax (but not currently Rebol) would be:

write log:(error)

Absolutely not borrowing from Java—I resent the insinuation :stuck_out_tongue_winking_eye:. Part of my reasoning for the double colon is that it's minimalist, you can add it to source and easily concatenate:

write join log:: "error"

Again, I want to avoid having it seem as if it's a protocol of that nature—that form carries with it certain connotations and expectations that I want to avoid. I'll cite the tel:, mailto:, data: and tag: URIs as examples that don't conform to the networking URL model. I think there's likely a balance here where we can conform to URI standards and create a model for external things that stands apart from the scheme://user@host:id/path pattern.

A great advantage to URLs is that they are universal. If you have a scheme that addresses local storage, you can share that scheme with JavaScript or any language that has a local storage mechanism of that type.

I bet most people do not regard these as standalone URIs in that sense, but as kludgey attributes which are technically protocols.

I'm not a big fan of the scheme:// notation either due to the typing, and I often mutter about it when I need to write data to the clipboard.

I don't have any great suggestions, unfortunately. I would probably go with what you feel comfortable with and we can see how it feels as the foundation settles.

They are well established and defined though: tel, mailto, data, tag. I'll throw in the WhatWG URL spec.

Oh, I know. I'm very familiar with them. They are somewhat anomalous, I think. Like functionality which was bolted onto HTML and later smoothed over. (A functionality gap within a DSL, I dare say.)

Not to belabour the point, but making good choices now could be beneficial should you want to add nuance later. I always felt clipboard:// didn't really reflect the type of scheme it was. What then if you want to distinguish the destination clipboard, or you want to expand the functionality so you can say:

insert open clipboard::general make image! [1x1 #{CCCC99}]

Sometimes we have little to work with to inform decisions. You need to go with something that appears to fit immediate/knowable needs and to resign yourself to the fact that you may need to come back and revise it. A downside of a small user community/installed base is that you're often faced with that former situation, but the latter case means at least you'll face less pushback when you make changes.

I'm glad there's a post to put these ideas out and under scrutiny. I hope that fully fleshed-out examples of schemes and verbs can be presented... in particular if there are any invariants at all that can be taken for granted.

I become very nervous when there are no invariants. It makes testing hard, it makes usage hard, and if one is having to actually implement things it can make it impossible to know you've gotten that implementation right. That's an uneasy feeling.

Saying "it's up to the scheme to decide" for everything is a free-for-all. If I can't be sure that append port-or-url [a b] is the same thing as append port-or-url [a] followed by append port-or-url [b]...because "everyone defines append however they want"...then it sets off my alarms for leading to chaos.
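
The invariant in question can be written down as an executable check. The sketch below is in Python and uses two made-up port classes (ListPort and BatchLogPort are hypothetical stand-ins, not real schemes): one treats append as splicing items, the other records each call as a single entry. The check passes for the first and fails for the second, which is the kind of divergence a caller cannot anticipate if every scheme defines append on its own terms.

```python
class ListPort:
    """Hypothetical port whose append splices each item in turn."""
    def __init__(self):
        self.items = []

    def append(self, values):
        self.items.extend(values)


class BatchLogPort:
    """Hypothetical port that records every append call as one entry."""
    def __init__(self):
        self.items = []

    def append(self, values):
        self.items.append(tuple(values))


def append_is_splittable(port_type):
    """Check that append [a b] leaves the same state as append [a] then append [b]."""
    together, separate = port_type(), port_type()
    together.append(["a", "b"])
    separate.append(["a"])
    separate.append(["b"])
    return together.items == separate.items


print(append_is_splittable(ListPort))      # True
print(append_is_splittable(BatchLogPort))  # False
```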

Explicit method names that say exactly what's happening (table.addColumn()) are popular for a reason. The drift I've seen in schemes is to try pigeon-holing behaviors onto a small set of words, with this adherence to the small word set providing no obvious polymorphism. The obfuscation strikes me as a net deficit...and if I can't make sense out of it, I don't see how the supposed target audience could. So I really want us to look critically at this before going "schemes are obviously great!"

Remember that nothing has stopped other languages from exploring this format with write("http://example.com/but-its-a-string", xxx). There's no sprinkling of magic dust on it by removing a couple of quote marks and parentheses. If this were such a world-changing idea...not only would others have done it, but given that they hadn't you'd stand to have a ton of influence by implementing it in their mediums instead of the much-less-popular medium of Rebol.


I've said I'm willing to believe that there is value to be mined here. But my impression is that the value comes from being able to say that certain operations follow a regular formula:

read http://example.com 

; ...is equivalent to...
 
port: open/settings http://example.com [num-reads: 1]
data: wait read-async* port
close port
data

If we can capture cross-cutting patterns like this, and show a new scheme written that seems to plug into it like magic...then I'd be on board. I've been pretty consistent in my messaging on asking to see schematics like this.
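
A rough model of that formula, written in Python rather than Rebol and simplified to a synchronous read (the num-reads setting and the wait on read-async* are left out). Everything named here (SCHEMES, register, open_port, MemoryPort, the mem: scheme) is invented for illustration. The point is only that the one-shot read is defined once, and a new scheme gets it by supplying open, read and close.

```python
from urllib.parse import urlsplit

SCHEMES = {}  # hypothetical registry: scheme name -> port class


def register(name):
    """Class decorator that files a port class under a scheme name."""
    def decorator(port_class):
        SCHEMES[name] = port_class
        return port_class
    return decorator


def open_port(url):
    """Look up the scheme of the URL and open a port on it."""
    return SCHEMES[urlsplit(url).scheme](url)


def read(url):
    """Generic one-shot read: open, read once, always close."""
    port = open_port(url)
    try:
        return port.read()
    finally:
        port.close()


@register("mem")
class MemoryPort:
    """Toy scheme backed by a dict; it only supplies the three primitives."""
    store = {"greeting": b"hello"}

    def __init__(self, url):
        self.key = urlsplit(url).path
        self.closed = False

    def read(self):
        return self.store[self.key]

    def close(self):
        self.closed = True


print(read("mem:greeting"))  # b'hello'
```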

I've been thinking that piggy-backing whatever these "settings" are for ports on top of the accepted headers standardized in HTTP would be smart.

I don't think you can squeeze everything into a narrow band of 'this is what append does'. Append should mean a certain thing when it comes to Rebol values—strings, blocks, etc.—it has finite semantics, we can document that, write tests for it and be sure we know how it works*. You just can't make the same guarantees with ports because they are domain dependent. Should we come up with a bunch of port-specific or domain-dependent verbs just to maintain that consistency? I'd say that becomes unmanageable. If append is the right descriptor for what you are doing to a thing, then it's the right verb.

A loose standard for how these verbs work should exist, but it does already: in the relevant function specs and in the English meanings of the verbs themselves.

log: open browser::info
write log "Stuff"
log: open remote://log/location
write log "Stuff"
log: open null://
write log "Stuff"

What does it matter that each of these schemes 'write' differently or have different outcomes? Why does it disprove the concept when write db-port "Stuff" returns an error? A certain amount of semantics can be gleaned from the type of scheme that you're using. To me there is a lot of lost expressiveness when you're spinning out WRITE-LOG, WRITE-HTTP, WRITE-FILE as separate functions.

Rebol [
    Title: "My Script"
    Options: [  ; map notation here!!
        log: my://logging/scheme
    ]
]

I don't know that there's a full accounting of how far this can go, what the possibilities or pitfalls are.

db: open some://dbms/scheme
select db [* from table-x]
change db [foo: "y" in table-y]

To me, that is coherent. Do I want to literally insert [foo: "y" in table-y] into the database? No, it's fairly clear from the nature of the thing I'm changing that it has a certain effect. It even takes steps toward realising the applicability of Rebol dialects.

I'm not suggesting things are a panacea as-is—layering schemes/ports can get messy, callbacks and servers aren't trivial, events have a read/read, write/wrote problem—but they don't altogether suck.

Yeah, for sure. I agree, one cannot be closed-minded as to how others are doing things. In totality, the recognition of URLs (and files, for that matter) as first class values, the mapping of those URLs to active schemes (which can be implemented in different ways on different platforms to the same or similar effect), the ability to manipulate those derived ports with a standard set of verbs: it does present something different and, I'd add, compelling.


Incidentally, I'd cite the HTTP(S) example as a misleading use of schemes/ports. In trying to make it operate in parity with file schemes, it loses much of the nuance of the exchange. My Rest scheme is an attempt to remedy that, but it is limited by its reliance on CALL/CURL, and even then it is based on a single REQUEST/RESPONSE transaction when it should be a dialogue.


* even then, append is comparatively lossy when used on a string vs. block, but we don't have append-string and append-block

If you look at the HELP for R3-Alpha's SELECT it has:

REFINEMENTS:
    /part -- Limits the search to a given length or position
            length (number! series! pair!)
    /only -- Treats a series value as only a single value
    /case -- Characters are case-sensitive
    /any -- Enables the * and ? wildcards
    /with -- Allows custom wildcards
            wild -- Specifies alternates for * and ? (string!)
    /skip -- Treat the series as records of fixed size
            size (integer!)
    /last -- Backwards from end of series
    /reverse -- Backwards from the current position

How can a "generic" operation offer any refinements when semantics are arbitrary for each scheme? How can HELP tell you what it does?

As with the verb itself, any refinements can be utilized that apply.

This hasn't been implemented, but documentation could be stored within individual actors, in the scheme itself. I don't think you could document a dialect in a HELP string, but then HELP PARSE doesn't tell you how Parse works. I wouldn't see this as prohibitive or insurmountable.

This puts a rather religious level of value on a fixed list of words and refinements that sprouted mostly from Carl's head. That somehow these words--and these refinements--are going to fit arbitrary variance in interpretation of meaning of the other arguments.

And somehow if you use refinements...and it just ignores them...the caller is going to be okay with that. Does the fixed list never change? If something gets added how do you catch the schemes that aren't retrofitted to handle it?

Consider the /LINES refinement on READ...try it on a URL in R3-Alpha.

"documentation could be stored within individual actors, in the scheme itself."

Type checking becomes pretty much ruled out here, unless you have to parameterize your HELP request with the type of the first parameter to the action.

Otherwise how could you ever rule out any parameter type, if some scheme might take it?

Why would this particular API be fixed in time?

Are these concerns not addressable if that is the block? I mean, what would you expect to be done that's not already permissible? Can schemes not reject values/refinements that are not compatible?

I gave an example of what would happen in the presence of any change. Let's presume that /LINES is something that was added to READ that the person implementing the READ actor for HTTP didn't know about:

r3-alpha>> read/lines http://example.com  
== #{
3C21646F63747970652068746D6C3E0A3C68746D6C3E0A3C686561643E0A2020
20203C7469746C653E4578616D706C6520446F6D61696E3C2F7469746C653E0A

If the refinement definitions live in a single GENERIC prototype definition... then this "handle those that are applicable" premise falls through the cracks rather quickly. /LINES was very much applicable to this case, but the HTTP scheme didn't know about it.

It should either handle it, reject it, or it's a bug.
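
One way to make "handle it or reject it" mechanical, sketched in Python with invented names (ACTORS, actor, file_read and http_read are placeholders, not the R3-Alpha actor API): each actor declares the refinements it understands, and the dispatcher refuses anything else instead of silently dropping it.

```python
ACTORS = {}  # hypothetical table: scheme -> (supported refinements, function)


def actor(scheme, refinements=()):
    """Register a READ actor together with the refinements it declares."""
    def decorator(fn):
        ACTORS[scheme] = (frozenset(refinements), fn)
        return fn
    return decorator


def read(scheme, target, **refinements):
    """Dispatch READ, rejecting refinements the actor never declared."""
    supported, fn = ACTORS[scheme]
    requested = {name for name, on in refinements.items() if on}
    unknown = requested - supported
    if unknown:
        raise ValueError(f"{scheme} READ does not support: {sorted(unknown)}")
    return fn(target, **{name: True for name in requested})


@actor("file", refinements={"lines"})
def file_read(target, lines=False):
    data = "one\ntwo\n"  # stand-in for the file's contents
    return data.splitlines() if lines else data


@actor("http")  # written before /lines existed, so it declares nothing
def http_read(target):
    return "<!doctype html>\n<html>\n"


print(read("file", "notes.txt", lines=True))  # ['one', 'two']
try:
    read("http", "example.com", lines=True)
except ValueError as error:
    print(error)  # http READ does not support: ['lines']
```

With a check like this, a refinement added to the generic verb after an actor was written surfaces as an error at the call site rather than as binary where lines were expected.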

You can adapt the way the functions are defined within the actors collection to apply more rigour, it doesn't really change the premise.

Okay, at least one point of agreement. But...

  • Is /LINES the kind of thing that should be on the interface of READ? When does the universal generic verb of READ stop accruing this potentially giant list of refinements?

  • Breaking things into lines seems like something that could be handled by a fallback if the actor didn't have its own optimized/streaming implementation. How should that work?

  • Do actors have the ability to "contribute" more refinements, or are changes managed purely by being compiled into the core? How would these make it through to HELP if they exist?

  • While the implementations may vary, it seems you at least think there is some agreed-upon arity for every word. And you think there's a good first draft of this list. I guess going over the list and explaining why--since these words can semantically mean literally anything--they have the given arity is at least ground zero for the religion.
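
On the second question above, one possible shape for such a fallback, as a Python sketch under assumed names (read_with_lines, raw_read and the optional read_lines method are all hypothetical): the generic verb asks the actor for its own line reader and, if there is none, splits the result of the plain read itself.

```python
class HttpActor:
    """Hypothetical actor that only knows how to do a plain read."""
    def raw_read(self, target):
        return b"<!doctype html>\n<html>\n<head>\n"


class FileActor:
    """Hypothetical actor with its own line reader."""
    def raw_read(self, target):
        return b"one\ntwo\n"

    def read_lines(self, target):
        # A real actor could stream here; this just yields fixed lines.
        yield from ("one", "two")


def read_with_lines(actor, target):
    """Use the actor's own line reader if present, else split a plain read."""
    own = getattr(actor, "read_lines", None)
    if own is not None:
        return list(own(target))
    return actor.raw_read(target).decode("utf-8").splitlines()


print(read_with_lines(FileActor(), "notes.txt"))
# ['one', 'two']
print(read_with_lines(HttpActor(), "example.com"))
# ['<!doctype html>', '<html>', '<head>']
```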