Validating Subseries Data In PARSE... INPUT or INTO TAG!s?

hostilefork · October 4, 2021, 3:11pm

If you want to validate a block of input and return that block, how do you do it?

By default, UPARSE gives you the last rule match:

>> uparse [1 2 <three> "four"] [some integer! tag! text!] 
== "four"

You can bend that last-result behavior to your advantage by invoking a rule that returns the input. The tag! combinator <input> does exactly that!

>> uparse [1 2 <three> "four"] [some integer! tag! text! <input>] 
== [1 2 <three> "four"]

Pretty slick. But what if it's a nested block? Does <input> give you the INTO series, or the original series?

Right now it gives you the INTO series, i.e. what <input> returns is whatever input is currently being parsed:

>> uparse [zero [1 2 <three> "four"]] [
      word: word!
      validated: into block! [some integer! tag! text! <input>]
      ("some overall result")
 ]
== "some overall result"

>> validated
== [1 2 <three> "four"]

>> word
== zero

That's nice, but... you can also imagine being inside a nested rule like this and wanting to make a decision like return <input>, with the intent of accepting the original input to the parse.

Should <input> always return the overall parser input, and a separate rule like <into> give back the current sub-input?

At a combinator level, the currently processed argument is always called INPUT. So calling it <input> and returning the currently applicable input is consistent with the implementation.
- Well...it's actually only partially consistent, because the INPUT to each combinator is actually at the current position. So it's more like HERE.
  - I'm actually not that bothered by this.

INTO is not the only combinator we can conceive of that can go to a nested level, so calling the tag <into> might not be a good idea. It also doesn't have the ring of generality for returning the input at wherever you currently are...e.g. returning the main input if you haven't done an INTO.
- Shades of meaning are difficult here with other words, as <current> (for instance) is hard to distinguish from <here>...e.g. you'd think it would include the position.

In any case...I lean toward thinking that <input> reflecting the current input is the best answer. That suggests a separate term for the main input, like <main-input>, though <original> might be a better choice.
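
To make the three notions concrete (the current input, the current position in it, and the outermost input), here is a toy model in Python. It is only a sketch of the distinction being discussed, not how UPARSE or its combinators are implemented. The Frame class and its method names are invented for illustration, and original() models a tag that is only a proposal in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    """One nesting level of a parse: a series plus a position in it.

    Hypothetical model for illustration only; not the UPARSE implementation.
    """
    series: list
    index: int = 0
    parent: Optional["Frame"] = None

    def input(self) -> list:
        # Models <input> as described above: the whole series at this level.
        return self.series

    def here(self) -> list:
        # Models what a combinator actually receives: the series at the
        # current position, i.e. the not-yet-consumed remainder.
        return self.series[self.index:]

    def original(self) -> list:
        # Models the proposed <original>: walk out to the outermost frame.
        frame = self
        while frame.parent is not None:
            frame = frame.parent
        return frame.series

    def into(self) -> "Frame":
        # Descend into the nested list found at the current position.
        item = self.series[self.index]
        if not isinstance(item, list):
            raise TypeError("INTO needs a nested series at the current position")
        return Frame(series=item, index=0, parent=self)


# Mirror the nested example: a word, then a block to validate.
top = Frame(["zero", [1, 2, "<three>", "four"]])
top.index = 1          # pretend the word has been matched
sub = top.into()       # now parsing the nested block
sub.index = 2          # pretend the two integers have been matched

print(sub.input())     # the nested block as a whole
print(sub.here())      # the nested block from the current position
print(sub.original())  # the outermost input
```

In this model, input() and original() only differ once you have descended with into(), which is the same observation as above: a name like <into> wouldn't make sense at the top level, while <input> and <original> both do.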