Parsing Giant Streams Without Consuming Tons of Memory: How?

hostilefork · September 2, 2021, 5:37am

An Additional Note On the Practicality Of Discarding Input

I painted a picture above of how SOME could become more participatory in helping PARSE avoid being responsible for keeping around all the past input in a parse stream for all time.

The rule I was examining looked like this:

parse p [
    some [some ["A" | "B"] newline (count: count + 1)]
]

What if your rule was instead:

parse p [
    some [
        some ["A" | "B"] newline
        (count: count + 1)
    ]
    |
    some-other-rule
]

The SOME may know it's never going back after a match it makes, for each line. And it may effectively communicate that through its input consumption (with the promise it will be responsible for giving back any input it decides it does not like).

But since there's an alternate rule coming up, the PARSE can't use that knowledge to let the information go that the SOME claims to be done with (or responsible for giving back). Because the "some other rule" could run if the SOME ultimately doesn't match...requiring backtrack all the way to the beginning state before the SOME.

This points out that | in a BLOCK! rule is basically a SEEK operation. It's a "seek the position before the last alternate grouping" rule. If the block rule is at a position and wants to know if it can throw out data it's streamed before the current point, it only needs to look forward and see if there are any | in the block ahead of it, and if so... it can't throw out the data.

But coming to the rescue is ||...which I have called the inline sequencing operator. If the BLOCK! combinator looks ahead and sees one of those, then it knows any | after it aren't relevant to whether it can throw out the past. The current rule will have to complete on the input it is given or things will fail.

All this may sound too tricky to be worth it, but this is a topic that interests me...and I have to stay interested, right?