Previously I brought up the idea of "TAG! combinators". Here's what I said, moved from another thread where it was kind of a tangent:
[I saw Haskell...] uses EOF instead of END. END is literate, but one often wants to call variables things like "begin" or "end", or "start" and "end".
This makes me wonder if perhaps we should be a bit more creative in the use of datatypes. If you want to match a WORD! in a dialect, you have to use a tick mark. What if you had to use a tick mark to match TAG!s, and then an ordinary TAG! could have meaning as a rule...such as <end>?
parse "aaa" [data: copy to <end>]
parse "<div>stuff</div>" [x: between '<div> '</div>]
Anyway, that could open up a whole new category of combinators... tag combinators. Maybe <here> is another example, or perhaps <input> if you want to pass the original input position through to a function.
A unifying concept here could be that you'd use it for properties that you don't want to have collide with the names of variables. Consider for example if PARSE tracks the line number, you might want to say something like line: <line> in the middle of a rule.
If you want to match tags by their stringness, it's not like it's all that hard to just say "<div>" in the first place. But quoting is even briefer. Remember that being inert in typical evaluation is not enough in PARSE to mean it's not a rule... INTEGER!, BLOCK!, BLANK! (previously NONE!) and now LOGIC! all have to be quoted to mean their actual literal thing. And quotes are needed on things like WORD!, GROUP!, GET-WORD!, SET-WORD!...and much more.
So is it worth it to get another dialect part, by making you have to quote your tags if you want them to match literally? I kind of feel like it would be. Of course, the concept with UPARSE is that people could disagree and make entirely different answers...
(Note: a downside here is that since TAG!s are strings and not symbols, the comparison costs could be (slightly) higher. However, I've been thinking that to speed up string comparisons they might cache a symbol as part of the comparison process...and clear the symbol cache on each mutation. Then comparisons of strings to symbols could become very fast...so long as the string isn't changing. Wouldn't help if it were looked up in a map, but the optimized native version could do a fast check before hitting the map.)
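To make the caching idea in that note a bit more concrete, here's a toy model in Python (not Ren-C, and not how the interpreter actually stores strings). The class name CachedString, the slow_compares counter, and the use of sys.intern to stand in for symbols are all made up for illustration. The point is just the shape of it: remember the last symbol the string compared equal to, and throw that away on any mutation.

```python
import sys


class CachedString:
    """Hypothetical mutable string that caches the symbol it last matched."""

    def __init__(self, text):
        self._chars = list(text)
        self._symbol = None      # cached symbol, valid until the next mutation
        self.slow_compares = 0   # counts full character-by-character compares

    def append(self, more):
        self._chars.extend(more)
        self._symbol = None      # any mutation clears the cache

    def equals_symbol(self, symbol):
        # Fast path: identity check against the cached symbol.
        if self._symbol is symbol:
            return True
        # Slow path: full comparison, then cache the symbol on success.
        self.slow_compares += 1
        if "".join(self._chars) == symbol:
            self._symbol = symbol
            return True
        return False


END = sys.intern("end")          # interned string standing in for a symbol
tag_text = CachedString("end")
tag_text.equals_symbol(END)      # first call takes the slow path
tag_text.equals_symbol(END)      # repeat calls hit the cached identity check
```

As the note says, this only pays off while the string isn't changing; one append and you're back on the slow path for the next comparison.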
I've decided this is too good an idea to pass on.
I particularly like that it makes a new namespace for nouns. It's mean to take away words like "end" from "start/end" or "begin/end" that people might want to use for variables.
So TAG! will not match string content in PARSE. If you want to literally match a tag in a block, use QUOTED! (as you would for other types):
>> parse [<a> <a> <a>] [some '<a>]
== <a> ; remember, rules like SOME don't synthesize anything
>> parse [<a> <a> <a>] [copy some '<a>]
== [<a> <a> <a>]
There's no particular reason not to have it work in strings too for finding the molded form:
>> parse "<a>stuff</a>" [between '<a> '</a>]
== "stuff"
But in a string you can also just match it as a quoted string, which may be clearer:
>> uparse "<a>stuff</a>" [between "<a>" "</a>"]
== "stuff"
Isn't that block-result-is-the-result-of-UPARSE convention awesome?
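If the distinction between a plain TAG! (a parsing noun) and a quoted TAG! (a literal match) is hard to picture, here's a rough Python model of the dispatch. This is only a sketch under my own assumptions: Tag, Quoted, NOUNS and match are invented names, it only handles block-like input, and it has nothing to do with how UPARSE is really implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str        # Tag("a") models the TAG! value <a>


@dataclass(frozen=True)
class Quoted:
    value: object    # Quoted(Tag("a")) models the quoted form '<a>


def at_end(data, pos):
    # Models <end>: succeeds without consuming, only at the tail of the input.
    return pos if pos == len(data) else None


def here(data, pos):
    # Models <here>: always succeeds and leaves the position unchanged.
    return pos


# Hypothetical table of tag "nouns"; plain tags are looked up here.
NOUNS = {Tag("end"): at_end, Tag("here"): here}


def match(data, rules):
    """Return True if every rule matches in sequence from the head of data."""
    pos = 0
    for rule in rules:
        if isinstance(rule, Quoted):
            # Quoted values match one item literally.
            if pos < len(data) and data[pos] == rule.value:
                pos += 1
            else:
                return False
        elif isinstance(rule, Tag):
            # Plain tags never match content; they name a combinator.
            noun = NOUNS.get(rule)
            if noun is None:
                raise ValueError(f"unknown tag combinator <{rule.name}>")
            pos = noun(data, pos)
            if pos is None:
                return False
        else:
            raise TypeError(f"unsupported rule: {rule!r}")
    return True


a = Tag("a")
match([a, a, a], [Quoted(a), Quoted(a), Quoted(a), Tag("end")])
```

Note that in this model a plain Tag("a") in the rules is an error rather than a match, which is the whole trade: you give up unquoted literal matching and get a collision-free namespace in return.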
Remember that Combinators are Customizable
If you don't like this idea, you can change it...but I think the TAG!-as-parsing-NOUN concept is something we'll get mileage from.
I mention <line> and <file>. It's nice to have these kinds of things not competing.
And I think it likely is going to look better for the likes of <end>. I'm sympathetic that to <end> is a bit more typing than to end but it seems pretty good.
Power Users Can Override It
parse: specialize :parse [
    ;
    ; Ugly way of extending a MAP!, there should be nicer ways.
    ;
    combinators: append copy default-combinators reduce [
        'end :default-combinators.<end>
        '* :default-combinators.one
    ]
]
>> parse ['x "y" #z . . .] [word!, demo: *, issue!, to end]
>> demo
== "y"
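For anyone who wants the override pattern spelled out without the MAP! syntax, here's the same move modeled in Python. Again this is a sketch with invented names (DEFAULT_COMBINATORS, make_parse, and string keys standing in for TAG! and WORD! keys), not the real UPARSE machinery: copy the default table, add aliases that point at the existing combinators, and build a parser specialized on the copy.

```python
def at_end(data, pos):
    # Stand-in for the <end> combinator: succeeds only at the tail.
    return pos if pos == len(data) else None


def one(data, pos):
    # Stand-in for the ONE combinator: consumes exactly one item.
    return pos + 1 if pos < len(data) else None


# Hypothetical default table; the keys are just strings in this model.
DEFAULT_COMBINATORS = {"<end>": at_end, "one": one}


def make_parse(combinators):
    """Build a parse function bound to one combinator table."""
    def parse(data, rules):
        pos = 0
        for rule in rules:
            pos = combinators[rule](data, pos)
            if pos is None:
                return False
        return True
    return parse


# Copy the defaults, then alias the old spellings to the existing combinators.
custom = dict(DEFAULT_COMBINATORS)
custom["end"] = DEFAULT_COMBINATORS["<end>"]
custom["*"] = DEFAULT_COMBINATORS["one"]

my_parse = make_parse(custom)
my_parse("ab", ["*", "*", "end"])
```

Because the table is copied first, the defaults stay untouched and anyone else calling the stock parser doesn't see the aliases.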
But let's try starting to use it. Here are the changes to the tests to get an idea of how this looks. I've left the old END and HERE and SKIP in for now, but I think we should move in this direction.
I also think getting <line> and <column> are pretty important, so that should get worked on...