Running into issues with PARSE

RayMPerry · September 13, 2021, 3:28am

Hi. I'm new to Ren-C and have been trying to get this simple parser to work.

(For context, MOCK_DATA.csv contains 1000 rows of "First Name, Last Name, Email, Date of Birth".)

The questions I have are:

How do I read this error?
Where should I be looking for valid words/syntax?

Thanks in advance.

hostilefork · September 13, 2021, 1:35pm

Hello, and welcome.

What you've hit is a bug in the trick used to make "old style" PARSE synthesize a value. You would typically write the rule with a variable to assign to, like this:

parse [1] [x: collect [keep integer!]]

But to make the COLLECT result the overall result in this particular stylistic case:

parse ... [collect [...]]

The trick was effectively transforming it to:

let temp
parse ... [temp: collect [...]]
temp

The problem is that if your rule fails, temp never gets set.

I've fixed the bug:

Make COLLECT trick work if rule fails

But Why Is The Rule Failing

Your rule is failing because it doesn't reach the end of the input. (This is different from Red, which doesn't check for reaching the end of input when you use COLLECT.) You'd need to say:

some [keep field-rule ["," | end]]

Otherwise it will expect a comma after every element--even the last.
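
To see why the original rule shape stops short, here is a minimal Python model of the two variants. This is not Ren-C code, and it makes assumptions: `split_fields` and its `allow_end` flag are hypothetical names invented for the sketch, and the field rule is assumed to mean "everything up to the next comma". It returns None when the rule would fail.

```python
def split_fields(line, allow_end):
    """Model of `some [keep field-rule ","]` (allow_end=False) versus
    `some [keep field-rule ["," | end]]` (allow_end=True).

    Returns the collected fields, or None if the parse fails.
    """
    fields = []
    pos = 0
    while pos < len(line):
        # field-rule (assumed): take everything up to the next comma
        stop = line.find(",", pos)
        if stop == -1:
            stop = len(line)
        field = line[pos:stop]
        if stop < len(line):
            # A comma follows the field: both rule shapes accept it
            pos = stop + 1
        elif allow_end:
            # No comma, but the `end` alternative matches here
            pos = stop
        else:
            # A comma was required after the last field, so this
            # iteration fails with input left over and the parse fails
            return None
        fields.append(field)
    # `some` needs at least one successful iteration
    return fields if fields else None


print(split_fields("Ada,Lovelace,ada@example.com", allow_end=False))  # None
print(split_fields("Ada,Lovelace,ada@example.com", allow_end=True))
```

The second call prints the three fields, because the last field is allowed to be followed by the end of input instead of a comma.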

Why Wasn't The Bug Found Earlier?

PARSE is being overhauled, and that idea (of a COLLECT being the return result of the parse overall) was never how it worked. Initially you had to name a variable, like:

parse ... [collect var [...rule...]]

We are evolving to a completely new model in which every rule can synthesize a result, and they are set with SET-WORD!:

parse ... [var: collect [...rule...]]

The longer-term goal is that the overall return result of the parse operation is whatever is synthesized by the rule. WHILE and SOME just return the result of their last successful iteration.

Here you can see that with the new UPARSE prototype:

>> uparse "aab" [while ["a" (10) | "b" (20)]]
== 20

Note literal rules evaluate to themselves, which can also be useful:

>> uparse "aab" [while ["a" | "b"]]
== "b"
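
The "every rule synthesizes a result" idea can be modeled outside Ren-C. What follows is a rough Python sketch under stated assumptions, not how UPARSE is implemented: each combinator is a function taking the input and a position, and returning a (result, new position) pair or None on failure. The names `lit`, `with_value`, `alt`, `loop` and `run` are invented for this sketch.

```python
def lit(text):
    """Match literal text; the synthesized result is the text itself."""
    def rule(data, pos):
        if data.startswith(text, pos):
            return text, pos + len(text)
        return None
    return rule

def with_value(inner, value):
    """Match `inner`, then synthesize `value`, like `"a" (10)`."""
    def rule(data, pos):
        matched = inner(data, pos)
        if matched is None:
            return None
        return value, matched[1]
    return rule

def alt(*rules):
    """Try each alternative in order; the first match wins."""
    def rule(data, pos):
        for r in rules:
            matched = r(data, pos)
            if matched is not None:
                return matched
        return None
    return rule

def loop(inner):
    """Repeat `inner` until it fails; yield the last successful result."""
    def rule(data, pos):
        result = None
        while True:
            matched = inner(data, pos)
            # Stop on failure, or if no input was consumed (avoids spinning)
            if matched is None or matched[1] == pos:
                return result, pos
            result, pos = matched
    return rule

def run(rule, data):
    """Succeed only if the rule consumes all of the input."""
    matched = rule(data, 0)
    if matched is None or matched[1] != len(data):
        return None
    return matched[0]

print(run(loop(alt(with_value(lit("a"), 10), with_value(lit("b"), 20))), "aab"))  # 20
print(run(loop(alt(lit("a"), lit("b"))), "aab"))  # b
```

In this model the loop keeps only the result of its last successful iteration, which mirrors the two results shown above.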

So what you saw was a small, very recent patch onto the old PARSE (being called PARSE3, for "Parse R3-Alpha") to try and make it act a little more like UPARSE. It was tested superficially but, as you discovered, it hadn't been tested on failure.

Best Place To Look For Information

You asked: "Where should I be looking for valid words/syntax?"

The PARSE story right now is that the combinator-based UPARSE design is the focus:

Introducing The Hackable Usermode PARSE ("UPARSE")

It's a work in progress and, being usermode, glacially slow... but it will be rewritten once the design is pinned down.

The tests folder has per-combinator tests, but the newer a feature is, the more likely it is to change:

https://github.com/metaeducation/ren-c/tree/master/tests/parse

Interestingness Abounds

Just a small point on how interesting things are... I mentioned that Red's parse won't check that you reached the end of the input:

red>> parse [1 2 3 <foo>] [collect [some keep integer!]]
== [1 2 3]

But that UPARSE would:

ren-c>> uparse [1 2 3 <foo>] [collect [some keep integer!]]
; null

Ren-C, however, has something called ELIDE. It's like COMMENT, but it vanishes fully:

ren-c>> 300 + 4 elide x: 1000 + 20
== 304

ren-c>> x
== 1020

There is an ELIDE combinator in PARSE as well. It means "match this, and fail if it doesn't match, but don't affect the accumulated product".

ren-c>> uparse [1 2 3 <foo>] [collect [some keep integer!] elide tag!]
== [1 2 3]

You could also be more generic and elide everything to the end (TAG!s are used in UPARSE for things like <end>, to keep words like END free for variables):

ren-c>> uparse [1 2 3 <foo>] [collect [some keep integer!] elide to <end>]
== [1 2 3]
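
The ELIDE idea can be modeled the same way in Python. Again, this is only a sketch under assumptions, not Ren-C and not UPARSE's implementation: `collect_ints`, `tag`, `elide`, `sequence` and `run` are hypothetical helpers, tags are modeled as strings starting with "<", and a sequence is assumed to yield the result of its last non-elided step.

```python
ELIDED = object()  # marker: "matched, but contributes no result"

def collect_ints(data, pos):
    """Model of `collect [some keep integer!]`: gather leading ints."""
    kept = []
    while pos < len(data) and isinstance(data[pos], int):
        kept.append(data[pos])
        pos += 1
    return (kept, pos) if kept else None

def tag(data, pos):
    """Model of `tag!`: match one string that looks like <...>."""
    if pos < len(data) and isinstance(data[pos], str) and data[pos].startswith("<"):
        return data[pos], pos + 1
    return None

def elide(inner):
    """Require `inner` to match, but discard what it synthesized."""
    def rule(data, pos):
        matched = inner(data, pos)
        if matched is None:
            return None
        return ELIDED, matched[1]
    return rule

def sequence(*rules):
    """Run rules in order; the result is the last non-elided one."""
    def rule(data, pos):
        result = None
        for r in rules:
            matched = r(data, pos)
            if matched is None:
                return None
            value, pos = matched
            if value is not ELIDED:
                result = value
        return result, pos
    return rule

def run(rule, data):
    """Succeed only if the rule consumes all of the input."""
    matched = rule(data, 0)
    if matched is None or matched[1] != len(data):
        return None
    return matched[0]

block = [1, 2, 3, "<foo>"]
print(run(collect_ints, block))                        # None
print(run(sequence(collect_ints, elide(tag)), block))  # [1, 2, 3]
```

Without the elided step the run fails because the tag is left unconsumed. With it, the tag is matched but the collected block remains the overall result.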
