Deciding on an Alternative Comment Syntax

hostilefork · August 29, 2017, 7:13am

UPDATE: After extensive consideration of alternative comment syntaxes, ; was retained...with a suggested policy of spacing line comments off by two spaces, not one:
my [line of] code  ; like this

my [line of] code ; not this

my [line of] code ;; or this

my [line of] code ;-- or this
Various factors made this choice seem best. The rise of in-language invisibles made it possible to create your own comment structures...even using things like -- and // to ignore to end of line...so long as what you wanted to ignore was LOADable. This aligned with the general goal of giving the most symbols possible to the language, vs. to something that would throw data out and be impossible to use.

The visual compromise of two spaces off for same-line comments addressed enough of the complaints about ; to be palatable. Further, it should be noted that at least one space is required to consider a comment to be a comment. This allows ; to be a character in ISSUE! (e.g. #;) or in URL!, where it is legal. It's not a legal word character, so abc; triggers an error.
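
As a rough illustration of the "space before the semicolon" rule described above, here is a minimal Python sketch. It is not how the Ren-C scanner is implemented; the function name `split_comment` is hypothetical, and string literals and other delimited values are deliberately ignored.

```python
from typing import Optional, Tuple


def split_comment(line: str) -> Tuple[str, Optional[str]]:
    """Split a line into (code, comment) using the 'space before ;' rule.

    Assumption: a ';' starts a comment only at the start of the line or
    when the character before it is whitespace. Strings are not handled.
    """
    for i, ch in enumerate(line):
        if ch == ";" and (i == 0 or line[i - 1].isspace()):
            return line[:i].rstrip(), line[i + 1:].strip()
    return line.rstrip(), None


print(split_comment("my [line of] code  ; like this"))
print(split_comment("#; abc"))  # ';' is glued to '#', so no comment is found
```

Note that the sketch simply leaves `abc;` alone as code, whereas the post above says the real scanner reports an error for it.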

One of the things that Rebol has wanted to go after is the idea that you don't need an IDE or syntax highlighting to feel the code is readable or pleasing to look at.

(I've felt like there is something of a similarity of goals here with Markdown: the idea that you could pull the code out of an old-school typewriter and not feel that the lack of any kind of rendering engine is a liability.)

On this note, the use of semicolons for comments in Rebol has never felt good to me. Semicolons standing alone almost look like dirt; they're too light and broken-up to feel like a good "barrier".

some (code [that looks]) kind of Englishy ; a Rebol comment

The solutions that have come along feel like they're making the problem worse, and I resent having to "make something up" to tack on more characters that are optional as decoration to make up for how weak semicolon is:

double (semicolons [are also]) broken-dirty ;; a Rebol comment

semicolon (and [dashes aren't]) really better ;-- a Rebol comment

Historically, semicolons come from Lisp and its family...as well as many assembly languages. Besides the complaints I give about it, I'm certain it looks quirky and unnatural to the average programmer of today.

The two notably popular choices for end-of-line comments are // from many C-derived languages (originally a C++ invention, but added to C--which is saying something, because C has been very reluctant to take C++-isms)...and the # which is popular in shell/scripting languages. For a big survey of forms of comments, Rosetta Code has it covered.

Languages targeting people who are thinking more about compiling things--not just Java but even more recent ones like Rust, Scala, Haxe--have gone with the C/C++ conventions. (Of course, JavaScript went with it too, for those of you who don't know.)

Meanwhile languages that seem to be more in the script space have gone with #...but not just the usual suspects like Awk/Sed/Perl/PHP/YAML, but PowerShell, Ruby, Python, Julia...

I don't think repeating random mistakes because they are popular is a good idea. But there seems to be a kind of visual truth about these stronger separating choices. When I have to use some other language and then come back to Rebol and accidentally start commenting with // or #, I think about how much better the resulting code looks.

(I encourage others to experiment and see how you feel.)

There's obviously an issue (ha) with using # with no space after it as a comment, since that is how ISSUE! is denoted. I don't really feel like requiring a space after it to get comment behavior is so bad, as #this looks bad anyway. Though repeating # should be legal, to kind of draw separators, e.g.

######### This should be able to be a comment ##########

It feels much more solid than:

;;;;;;;;;; This terrible abomination ;;;;;;;;;;;;;;;;;;;;

Sometimes I think # is perhaps a bit too heavy, and the // reminds me of one of my historically favorite marking patterns... standardized by OSHA to mean "physical hazard":

////////// This is actually kind of cool /////////////////

So I've got some mixed feelings personally about liking //, and also thinking // makes for a pretty poor Rebol WORD! (try putting it in a PATH!) so it would be nice to make it illegal there--which declaring it a comment would do. But it seems when you look at the group of languages Rebol is most aiming at that # makes more sense, as well as being able to accomplish the goal of the desired barrier in a single character instead of two.

There's also the need to support shebang in some way, and it would be easier if things like #! were not issues but treated as comments...which could go with the ## types of exceptions above.

gchiu · August 29, 2017, 7:34am

Anything before the Rebol [] block is ignored so it doesn't matter much if you have a shebang at the top of your script.

Some of us are used to using \ as a line comment indicator but I guess # is familiar to the audience we are addressing.

johnk · August 30, 2017, 4:52am

+1 on using # for comments. Good visual separation and familiar to many users of other scripting languages.

hostilefork · August 30, 2017, 6:23am

This issue has been discussed a number of times, and so far not everyone has stated support of it, in the sense that they'd be switching from semicolon to use it. But I can't recall anyone actively opposing it. The // has been opposed by people who don't think it fits the "language class" to which Rebol belongs, and there has been defense of // as an interesting operator...while no suggestions of making # into a WORD! operator have been made.

We should note that R3-Alpha tried to use # to mean a NONE! literal (as opposed to, say, an empty ISSUE!):

r3-alpha>> type? #
== none!

This isn't something many people knew about, and I don't think anybody who knew about it liked it. The underscore for BLANK! makes more sense for that, I think. Treating a lone # as a blank hasn't been removed from Ren-C yet, but it should have been:

ren-c>> #
== _

When it's removed, I don't think it will affect much code, since there aren't likely a lot of uses in practice.

Red did not carry forward R3-Alpha's decision, and merely considers it an invalid ISSUE! at this time:

red>> #
*** Syntax Error: invalid issue! at "#"
*** Where: do
*** Stack: load

If nobody speaks out against it, I'm going to go ahead and do it. We can start with the policy that any number of # followed by a space indicates an end-of-line comment and see if that is enough.
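
The proposed starting policy ("any number of # followed by a space") can be sketched with a regular expression. This is only a model of the rule in Python, not scanner code; the names `HASH_COMMENT` and `hash_comment_start` are hypothetical, and treating end-of-line like a space is an added assumption.

```python
import re

# A run of one or more '#' that begins a token (line start or after
# whitespace) and is followed by whitespace. Accepting end-of-line in
# place of the trailing space is an assumption of this sketch.
HASH_COMMENT = re.compile(r"(?:^|(?<=\s))#+(?=\s|$)")


def hash_comment_start(line):
    """Return the index where a '#' comment would begin, or None."""
    match = HASH_COMMENT.search(line)
    return match.start() if match else None


for sample in ["foo: #bar # important",
               "######### This should be able to be a comment ##########",
               "keep #{DECAFBAD}",
               "#this is still an ISSUE!, not a comment"]:
    print(repr(sample), "->", hash_comment_start(sample))
```

Under this rule `#bar` and `#{DECAFBAD}` stay values, while a `#` run standing alone opens a comment.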

Brett · August 30, 2017, 7:38am

and ... any number of # followed by a newline
I take it this will be in addition to the semicolon for now
Thinking about how one performs search and replace on existing scripts to update to the new comment style - it's not mandatory obviously, but I wonder if Rebol needs a LOAD mode that can load all Rebol tokens from a file, including comments and whitespace formatting, problem then being how one can identify them...
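
To make the "load everything, including comments and whitespace" idea concrete, here is a small Python sketch of a lossless tokenizer that tags each piece with its kind, so a search-and-replace tool can rewrite only the comments. It is an illustration, not a proposed LOAD implementation; `lossless_tokens` and `restyle_comments` are hypothetical names, and comment detection is simplified to a `;` at the start of a line or after whitespace (strings are not handled).

```python
import re

# Alternatives are tried in order: whitespace, then comment, then code.
# '(?<!\S);' means a ';' that is not glued to a preceding non-space char.
TOKEN = re.compile(r"(?P<space>\s+)|(?P<comment>(?<!\S);.*)|(?P<code>\S+)")


def lossless_tokens(text):
    """Return (kind, text) pairs; joining the texts reproduces the input."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]


def restyle_comments(text, marker="//"):
    """Swap the leading ';' run of each comment for another marker."""
    out = []
    for kind, tok in lossless_tokens(text):
        if kind == "comment":
            tok = marker + tok.lstrip(";")
        out.append(tok)
    return "".join(out)


source = "foo: #bar ; important\n] ; end collect\n"
print(lossless_tokens(source))
print(restyle_comments(source))
```

Because every character lands in exactly one token, formatting survives the round trip and only the tagged comments change.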

rgchris · August 30, 2017, 2:42pm

As you might imagine, I think I'll be on the list of hold-out contrarians. In my opinion # clashes with the values that already use it, and visual distinction there is paramount.

hostilefork · August 30, 2017, 3:38pm

While semicolon does have "not used by other things" going for it, that's pretty much all it has. In a language where a:b is a URL! and has to be read separately from a: b or a :b, space significance is pretty important.

unless (we) #use [UTF-8] 💬 multiple use of chars is just life

The worst case scenario would look like

#some #bunch #of #issues # Hey look it's some issues.

In the scheme of things, I don't find that as hard to parse out as a lot of stuff I see every day in Rebol. As worst-case scenarios go, it's pretty tame. And going back to my initial example, I think it stands out more than:

some (code [that looks]) kind of Englishy ; a Rebol comment
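
The space-significance point made earlier in this post (a:b versus a: b versus a :b) can be sketched as a toy classifier. This is a deliberately rough Python model, not Rebol's actual scanning rules; `classify` and `kinds` are hypothetical helpers that only look at where the colon sits in each whitespace-separated token.

```python
def classify(token):
    """Very rough guess at a token's kind from the position of ':'."""
    if len(token) > 1 and token.endswith(":") and ":" not in token[:-1]:
        return "set-word"
    if len(token) > 1 and token.startswith(":") and ":" not in token[1:]:
        return "get-word"
    if ":" in token[1:-1]:
        return "url"
    return "word"


def kinds(source):
    """Classify every whitespace-separated token in the source."""
    return [classify(tok) for tok in source.split()]


for sample in ["a:b", "a: b", "a :b"]:
    print(repr(sample), "->", kinds(sample))
```

The same three characters give three different readings depending only on where the space goes, which is the sense in which whitespace is already significant.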

rgchris · August 30, 2017, 5:17pm

I'm not so sure, I think the worst case scenario is that every line looks like it has issues in it:

something: function [ ][ # start
    foo: #bar # important
    collect [ # collect
        keep #foo # foo part not #baz
        keep foo # bar part
        keep #{DECAFBAD} # requisite binary
    ] # end collect
    #and also debugging if you happen to forget a space...
] # end

Ok a bit contrived, but is a worst case scenario after all.

I think of semi-colons as ushering an aside:

something: function [ ][ ; start
    foo: #bar ; important
    collect [ ; collect
        keep #foo ; foo part not #baz
        keep foo ; bar part
        keep #{DECAFBAD} ; requisite binary
    ] ; end collect
    ;and also debugging if you happen to forget a space...
] ; end

Like whispering behind the back of your hand.

hostilefork · August 30, 2017, 6:54pm

Argh. Well a good point there. I think both of the above look pretty bad, and it makes me wish for //.

something: function [ ][ // start
    foo: #bar // important
    collect [ // collect
        keep #foo // foo part not #baz
        keep foo // bar part
        keep #{DECAFBAD} // requisite binary
    ] // end collect
    // would you *always* need a space? (URL!s embed //)
] // end

For comparison, double-semicolon:

something: function [ ][ ;; start
    foo: #bar ;; important
    collect [ ;; collect
        keep #foo ;; foo part not #baz
        keep foo ;; bar part
        keep #{DECAFBAD} ;; requisite binary
    ] ;; end collect
    ;;no space required
] ;;end

Given that many non-C-like languages use it, it can't be completely my C++ bias that thinks that's visually decent.

In fact, I think your example is compelling enough I'm going to switch back to // as my vote for alternate-end-of-line-comment, which it was historically (before I was editing some Travis YAMLs a bit and decided # was kind of pleasing there)

Technically speaking you don't need the space after the //, just before it, for URL! to work. Otherwise http://thiswouldlooklikeacomment.com. But maybe for safety requiring a space after is good too. It's good practice for legibility, anyway.
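
The two candidate rules for // (space before only, or space on both sides) can be compared with a short Python sketch. These patterns are a model of the rules as stated in this post, not scanner code; `SPACE_BEFORE`, `SPACE_BOTH`, and `comment_start` are hypothetical names.

```python
import re

# '//' at the start of a line or right after whitespace.
SPACE_BEFORE = re.compile(r"(?<!\S)//")
# Same, but also require whitespace (or end of line) after the '//'.
SPACE_BOTH = re.compile(r"(?<!\S)//(?=\s|$)")


def comment_start(line, pattern):
    """Return the index where the comment begins under a rule, or None."""
    match = pattern.search(line)
    return match.start() if match else None


samples = [
    "http://thiswouldlooklikeacomment.com",
    "keep foo // bar part",
    "//no space after",
    "//=//// BANNER ////=//",
]
for line in samples:
    print(repr(line), comment_start(line, SPACE_BEFORE), comment_start(line, SPACE_BOTH))
```

In this model the URL is safe under both rules, but the stricter rule stops recognizing comments with no space after the marker, including banner lines that begin with //=.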

//=//// THOUGH IN REBOL'S SOURCE I DO STUFF LIKE THIS ///////////////=//
//
// Because I think it makes for nice visual blocks of comment
//
//=//////////////////////////////////////////////////////////////////=//

hostilefork · September 6, 2017, 9:59pm

-- Because it came up that some languages use "--"
-- it is interesting to consider what that would look like
--
something: function [ ][ -- start
    foo: #bar -- important
    collect [ -- collect
        keep #foo -- foo part not #baz
        keep foo -- bar part
        keep #{DECAFBAD} -- requisite binary
    ] -- end collect
    --would spaces be necessary, (would --would be a word?)
] -- end

We apply "--" today as an enfix decrement operator, e.g. foobar: -- 5 as equivalent to foobar: foobar - 5. But it does make for a pleasingly light-yet-separating comment.

Today's approximation is ;--

;-- Because it came up that some languages use "--"
;-- it is interesting to consider what that would look like
;--
something: function [ ][ ;-- start
    foo: #bar ;-- important
    collect [ ;-- collect
        keep #foo ;-- foo part not #baz
        keep foo ;-- bar part
        keep #{DECAFBAD} ;-- requisite binary
    ] ;-- end collect
    ;--no spaces necessary
] ;-- end

Which I guess, all things considered, is not that terrible, but still noisy and three-characters-plus-space to start a comment is a bit heavy handed.

gchiu · September 16, 2017, 9:36am

Since -- as an enfix operator is likely to be used only slightly compared with -- as a comment marker, why not make the enfix operator something else like -= ?

hostilefork · September 16, 2017, 6:20pm

Well, as we know I'm an admirer of Haskell, and it seems that Ada was a well thought-out language in many ways too (both use -- for comments). And as we've seen, people do ;-- often to try and make up for semicolon's weakness.

I think the -- operator is not used often enough to rule out this usage, when commenting is so fundamental.

Let's hold this thought.

draegtun · September 17, 2017, 5:02pm

Yes. So -- is a no for me because I don't think any comment syntax should use valid word chars.

This means // is OK (as is \\) for commenting. However I'm still very happy with semi-colon. But it would be nice to have a true multi-line comment syntax. So perhaps?...

\\commented out//

\\
    multiline comment
//

BrianOtto · January 10, 2018, 10:38am

I really like the -- syntax (with a space required after it), but I am partial to it because I have been designing my own language for a while now, and it uses something similar (I use 6 dashes, but that's a whole other story). Anyway, I spent days trying out different combinations of characters, and dashes were one of the few characters that really make comments stand out, and they look nice too.

I think changing the enfix syntax, like @gchiu suggested, is a good compromise. Or what do you think about using 3 or 4 dashes to differentiate them? Yea, it's more to type, but I'm of the opinion that if you're writing comments then some effort should be involved, and so you should spend time doing it properly. Having to type more characters makes you think about this and nudges you in the direction of quality over typing something quick and useless. It also makes the comments stand out more from the actual code.

In addition, what do you think about a dashes + block syntax for multi-line comments?

---- [
    Because it came up that some languages use "--"
    it is interesting to consider what that would look like
]

something: function [ ] [
    ---- foo has something important to say,
    ---- see how this stands out from the actual code
    foo: #bar

    ---- collect the values
    collect [
        keep #foo         ---- foo part not #bar
        keep foo          ---- #bar part
        keep #{DECAFBAD}  ---- requisite binary
    ] ---- end collect
] ---- end

hostilefork · January 10, 2018, 3:54pm

It looks good. We definitely should think about this.

Still on the table. I think @rgchris had a good argument against #. I try // now and again, and I guess I haven't completely been sold on it, even though it's used elsewhere.

hostilefork · January 22, 2018, 10:42am

There's a lot of pushback on this from people who want to use -- in identifiers. @draegtun voiced concern above, and the usual peanut gallery of Rebol2 people don't like it either.

I guess the real thing to think about is that while it looks nice, it leads to a situation where you either disallow things like --foo and foo-- and --foo--, or you run the risk of making something that is very easily mistaken for a comment. (Or you mean to type a comment and wind up with a WORD! instead.) It seems unsafe and confusing to have it as a comment and not forbid those things.
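
The hazard described above is easy to see in a toy model where a standalone -- opens a comment but --foo remains an ordinary token. This Python sketch is only an illustration of that hypothetical rule; `read_with_dash_comments` is an invented name, and splitting on whitespace collapses the original spacing.

```python
def read_with_dash_comments(line):
    """Return (tokens, comment) where a standalone '--' starts a comment.

    Tokens such as '--foo', 'foo--' and '--foo--' are kept as ordinary
    words, which is exactly what makes them easy to confuse.
    """
    tokens = line.split()
    for i, tok in enumerate(tokens):
        if tok == "--":
            return tokens[:i], " ".join(tokens[i + 1:])
    return tokens, None


print(read_with_dash_comments("run --verbose"))   # '--verbose' is a word
print(read_with_dash_comments("run -- verbose"))  # 'verbose' is a comment
```

A single space changes whether `verbose` is part of the program or silently discarded.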

Yet if you look at things like command line switches in UNIX, they frequently look like --foo. That's a fairly legitimate use of a WORD! and dialecting to talk to command-line things.

On the other hand, if we look at //, here you have something that is lousy as a WORD! anyway (you can't put it in a PATH! easily). No one is clamoring for //foo// as a WORD! exemption, or protesting loudly for ///. It has clout in popular languages as a comment marker...perhaps not as "academic pedigreed" as those using --, but certainly known to more users.

Being able to lay out a command-line dialect is probably the Achilles Heel for dashes as comment for me. So I'll re-center my vote back to slashes. Not quite as "light" feeling, but the desirable qualities of dashes make them a commodity for non-comment purposes.

hostilefork · January 22, 2018, 12:54pm

~~~~ [
    @giuliolunati pointed out that there is also "~~"
    which may have some of the benefits of -- while
    being more "fringe" and not used much.
]

something: function [ ] [
    ~~~~ foo has something important to say,
    ~~~~ see how this stands out from the actual code
    foo: #bar

    ~~~~ collect the values
    collect [
        keep #foo         ~~~~ foo part not #bar
        keep foo          ~~~~ #bar part
        keep #{DECAFBAD}  ~~~~ requisite binary
    ] ~~~~ end collect
] ~~~~ end

It's awfully close to --, yet still the squiggliness makes it feel a little...messy. Un-confident, wishy-washy perhaps? We often associate this shape these days with spelling errors in what you're typing.

rebolek · January 22, 2018, 1:30pm

This debate perfectly illustrates my problem with Ren-C.

Just because @hostilefork does not like ; as a comment and calls it a "terrible abomination", there is a debate about what other character to use. This is not fixing some bug, nor is it an improvement to the language (although some of the previous fixes or improvements may be very questionable IMO); it just wants to replace the use of ; with some of the perfectly valid Rebol characters, thereby limiting Rebol's lexical space.

I know, this is not Rebol, it is a hostile fork of Rebol called Ren-C, but anyway. It is not a fix, not an improvement, it's just another example of @hostilefork's hate for all things Rebol.

BrianOtto · January 22, 2018, 7:18pm

I can't speak for the other updates that have been made, but in this case, I respectfully disagree. The suggestions discussed here and in the chat are definitely an improvement over the semi-colon. Code is read many more times than it is written, so anything that makes comments more readable and makes them stand out from the code is better. They should be aesthetically pleasing.

I don't think a lot of language designers think about comments and they tack on whatever way they're used to, from the last language they worked in. I think it is very worthwhile to explore this, even if it does mean that we end up using //, like most everyone else, due to the constraints imposed by other parts of the language.

BrianOtto · January 22, 2018, 7:21pm

That's too bad, I was really rooting for this. I am going to think about this some more. I have one other idea I'll share a little later today. Been a really busy morning for me, and can't get into it right now.