A long time ago, @Brett converted the circuitous native code for TRIM from R3-Alpha to PARSE-based usermode code.
Since we have that code--and some tests for it--I thought it would be a good idea to go ahead and try running it under UPARSE. This would be another way of testing UPARSE...as well as a way to see if the new features gave it any kind of leg up. We could also look for inspirations for new features...
New Features: <index> and MEASURE Combinators
There was a calculation of indentation done for the TRIM/AUTO feature. It uses PARSE* which is the version that doesn't require matching to the end of the input. (Though since it doesn't check the result and doesn't do any operations which would roll back, it doesn't make a difference.)
indent: _
if auto [
    parse* series [
        ; Don't count empty lines, (e.g. trim/auto {^/^/^/ asdf})
        remove [while LF]
        (indent: 0)
        s: <here>, some rule, e: <here>
        (indent: (index of e) - (index of s))
    ]
]
The first thought I had is that with TAG! combinators, though we lost the ability to match TAG!s without a quote (you now have to write something like [some '<tag>]), we have a nice noun-space to play with that doesn't interfere with variable name nouns. So what if <index> gave you the index position in the current series?
That makes it a bit nicer:
indent: _
if auto [
    parse* series [
        ; Don't count empty lines, (e.g. trim/auto {^/^/^/ asdf})
        remove [while LF]
        s: <index>, while rule, e: <index>, (indent: e - s)
    ]
]
I also changed the SOME to a WHILE, which always succeeds...and since <index> always succeeds there's no need to pre-emptively set the indent to 0.
But wouldn't this pattern make a nice combinator in and of itself? Something that can tell you how long a matched range is. Well, uparse fans, meet MEASURE!
indent: _
if auto [
    parse* series [
        ; Don't count empty lines, (e.g. trim/auto {^/^/^/ asdf})
        remove [while LF]
        indent: measure while rule
    ]
]
And look how easy the combinator is to write (it's one of those that can just use the default rollback):
measure: combinator [
    {Get the length of a matched portion of content}
    return: "Length in series units"
        [<opt> integer!]
    parser [action!]
    <local> s e
][
    ([# (remainder)]: parser input) else [return null] ; ignore result
    e: index of get remainder
    s: index of input
    if s > e [ ; could also return something like ~bad-seek~ isotope
        fail "Can't MEASURE region where rules did a SEEK before the INPUT"
    ]
    return e - s
]
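For readers who don't have a Ren-C build handy, here is a rough model of the same idea in Python. This is only a sketch under my own assumptions, not the UPARSE interface: a "parser" is modeled as a function from (text, pos) to a new position or None, and the names char_in, while_, measure and auto_indent are made up for the illustration. It also skips the leading newlines instead of removing them, and it assumes the whitespace rule means space or tab.

```python
from typing import Callable, Optional, Tuple

# Simplified model: a parser takes (text, pos) and returns the new position
# on success, or None on failure. Real UPARSE combinators also synthesize
# values; that part is deliberately left out of this sketch.
Parser = Callable[[str, int], Optional[int]]


def char_in(chars: str) -> Parser:
    """Match exactly one character that appears in `chars`."""
    def parse(text: str, pos: int) -> Optional[int]:
        if pos < len(text) and text[pos] in chars:
            return pos + 1
        return None
    return parse


def while_(parser: Parser) -> Parser:
    """Zero-or-more repetition. Always succeeds, like WHILE in the post."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            nxt = parser(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or no progress
                return pos
            pos = nxt
    return parse


def measure(parser: Parser) -> Callable[[str, int], Optional[Tuple[int, int]]]:
    """Run `parser` and report (matched length, new position), or None."""
    def parse(text: str, pos: int) -> Optional[Tuple[int, int]]:
        end = parser(text, pos)
        if end is None:
            return None  # the wrapped parser failed, so MEASURE fails too
        if end < pos:
            raise ValueError("can't measure a region that ends before it starts")
        return end - pos, end
    return parse


def auto_indent(text: str) -> int:
    """Indent of the first non-empty line (assumes space/tab whitespace)."""
    pos = while_(char_in("\n"))(text, 0)  # skip (not remove) leading LFs
    result = measure(while_(char_in(" \t")))(text, pos)
    assert result is not None  # while_ never fails, so neither does measure
    length, _ = result
    return length
```

Calling auto_indent on a string shaped like the {^/^/^/ asdf} example from the comment counts only the whitespace in front of asdf, because the newlines are consumed before measuring starts.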
That's A Pretty Good Start!
It seems to me that what the TRIM code needs is probably a bit better definition of the semantics. TRIM/AUTO is a bit strange:
>> utrim/auto " x^/ y^/ z^/"
== "x^/ y^/ z^/"
It indents relative to the first non-newline-line...but that creates an issue of what to do about the line that comes after it which is less indented. The rule for processing lines was:
line-start-rule: compose/deep [
    remove [((if indent [[opt repeat (indent)]] else ['while])) rule]
]
The indent not being a BLANK! implies TRIM/AUTO.
That's a /DEEP compose that does splicing (signified these days by ((...))). I rewrote the rule to be a bit clearer as:
line-start-rule: compose [
    remove (if indent '[opt repeat (indent) rule] else '[while rule])
]
That's more pleasing to me, as well as more efficient. It's a nice use of the quoted branches!
But back to the semantics: is this right? It could also slam the less indented lines to the left by moving the OPT.
line-start-rule: compose [
    remove (if indent '[repeat (indent) opt rule] else '[while rule])
]
That would make the y flush with the left:
>> utrim/auto " x^/ y^/ z^/"
== "x^/y^/ z^/"
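To make the difference between the two placements of OPT concrete, here is a small Python model of the two line-start behaviors. It is a sketch under assumptions, not the actual TRIM code: it treats the whitespace rule as space or tab, ignores everything else TRIM does (such as dropping leading empty lines or trimming line tails), and the function names and the sample input are hypothetical.

```python
WHITESPACE = " \t"  # assumption: stands in for the `rule` used in the post


def strip_all_or_nothing(line: str, indent: int) -> str:
    """Model of `opt repeat (indent) rule`.

    Either exactly `indent` whitespace characters are removed, or the
    line is left untouched because the REPEAT could not complete.
    """
    prefix = line[:indent]
    if len(prefix) == indent and all(c in WHITESPACE for c in prefix):
        return line[indent:]
    return line


def strip_up_to(line: str, indent: int) -> str:
    """Model of `repeat (indent) opt rule`.

    Removes as many leading whitespace characters as are present,
    up to a maximum of `indent`.
    """
    n = 0
    while n < indent and n < len(line) and line[n] in WHITESPACE:
        n += 1
    return line[n:]


def trim_auto(text: str, strip) -> str:
    """Apply one of the strip functions using the first non-empty line's indent."""
    lines = text.split("\n")
    first = next((ln for ln in lines if ln), "")
    indent = len(first) - len(first.lstrip(WHITESPACE))
    return "\n".join(strip(ln, indent) for ln in lines)


# Hypothetical input: first line indented 4, second 2, third 6
sample = "    x\n  y\n      z\n"
all_or_nothing = trim_auto(sample, strip_all_or_nothing)
up_to = trim_auto(sample, strip_up_to)
```

With this input the all-or-nothing version leaves the two spaces in front of y alone, while the up-to version removes them; both leave z indented by the two spaces it has beyond the first line's indent.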
Anyway... let's keep those UPARSE test cases coming! It's at a point now where UPARSE is more reliable than both the R3-Alpha-derived native PARSE (which I'm calling PARSE3) and Red's PARSE. So it's revealing the bugs and inconsistencies in those codebases, not vice versa.