Validating Subseries Data In PARSE... INPUT or INTO TAG!s?

If you want to validate a block of input and return that block, how do you do it?

By default, UPARSE gives you the last rule match:

>> uparse [1 2 <three> "four"] [some integer! tag! text!] 
== "four"

You can bend that last-result behavior to your advantage by invoking a rule that returns the input. The tag! combinator <input> does exactly that!

>> uparse [1 2 <three> "four"] [some integer! tag! text! <input>] 
== [1 2 <three> "four"]

Pretty slick. But what if it's a nested block? Does <input> give you the INTO series, or the original series?

Right now it gives you the INTO series, i.e. <input> returns whatever series is currently being parsed:

>> uparse [zero [1 2 <three> "four"]] [
      word: word!
      validated: into block! [some integer! tag! text! <input>]
      ("some overall result")
   ]
== "some overall result"

>> validated
== [1 2 <three> "four"]

>> word
== zero

That's nice, but you can also imagine being inside a nested rule like this one and wanting to make a decision like "return <input>", where the intent is to accept the original input to the whole parse.

Should <input> always return the overall parser input, and a separate rule like <into> give back the current sub-input?

  • At a combinator level, the currently processed argument is always called INPUT. So calling it <input> and returning the currently applicable input is consistent with the implementation.

    • It's actually only partially consistent, because the INPUT to each combinator is at the current position. So it's more like HERE.

      • I'm actually not that bothered by this
  • INTO is not the only combinator we can conceive of that descends to a nested level, so calling the tag <into> might not be a good idea. The name also lacks the ring of generality: the tag should return the input at whatever level you currently are, e.g. the main input if you haven't done an INTO.

    • Shades of meaning are difficult here with other words, as <current> (for instance) is hard to distinguish from <here>, i.e. you'd think it would include the position.

In any case, I lean toward thinking that having <input> reflect the current input is the best answer. That suggests a separate term for the main input, like <main-input>, though <original> might be a better choice.
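
To make the distinction concrete, here is a rough model of the two lookups in Python. This is not Ren-C and not how UPARSE is implemented; `Frame`, `into`, `current_input` and `original_input` are names invented for illustration, and `<original>` is only a proposed tag. The sketch assumes each nesting level remembers the level that entered it, so "current input" is a direct lookup and "original input" is a walk back to the outermost level.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Frame:
    """One nesting level of a parse (hypothetical model, not UPARSE internals)."""
    series: List[Any]                 # the series being parsed at this level
    parent: Optional["Frame"] = None  # the level that entered this one, if any


def into(frame: Frame, index: int) -> Frame:
    """Model of INTO: descend into the nested list at `index` of the current series."""
    nested = frame.series[index]
    if not isinstance(nested, list):
        raise TypeError("INTO needs a nested series")
    return Frame(nested, parent=frame)


def current_input(frame: Frame) -> List[Any]:
    """Model of <input>: the series at the current nesting level."""
    return frame.series


def original_input(frame: Frame) -> List[Any]:
    """Model of the proposed <original>: the series given to the outermost parse."""
    while frame.parent is not None:
        frame = frame.parent
    return frame.series


outer = Frame(["zero", [1, 2, "<three>", "four"]])
inner = into(outer, 1)

print(current_input(inner))    # [1, 2, '<three>', 'four']
print(original_input(inner))   # ['zero', [1, 2, '<three>', 'four']]
print(current_input(outer) is original_input(outer))  # True
```

At the top level the two lookups agree, which matches the idea that <input> gives back the main input if you haven't done an INTO. They only diverge once something descends into a nested series, and nothing in the model cares whether INTO or some other combinator did the descending.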