Open Your Mind: A COLLECT in PARSE Meditation

The first time I used CALL, I griped that it was somewhat unusable (a complaint far from unique to me; basically everyone else who tried it had the same one). I would have thought that if I said call {r3 --do "print {Hello}"}, it would "do what I meant". Instead, it tried to find an executable file whose name (including spaces and quotes) was r3 --do "print {Hello}", and run it.

As crazy as that interpretation may seem on the surface, it actually came from a desire to not get involved in the details of a technical difference between Windows and POSIX. To make a long story short: Windows CreateProcess() takes the whole command line as a single string, and the splitting on spaces and the quote unescaping happen on the Windows side (in practice, in the launched program's startup code)...so something like r3 --do "print \"hi\"" gets picked apart and translated for you. POSIX does not do this; it expects you to pass in an already-processed array with the elements r3, --do, and print "hi".
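
To make the POSIX half of that concrete, here is a minimal C sketch. The helper name run_argv and the example_argv array are hypothetical and are not part of CALL or Ren-C; the only real APIs used are the standard fork(), execvp(), and waitpid(). The point it illustrates is that execvp() receives one finished argument per array element, with no quotes or backslashes left for anyone to interpret.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The array POSIX wants for the command line:  r3 --do "print \"hi\""
 * The backslashes below are only C string escapes; the third element
 * itself is just:  print "hi"                                        */
char *const example_argv[] = {"r3", "--do", "print \"hi\"", NULL};

/* Hypothetical helper (an assumption, not Ren-C code): run the program
 * named by argv[0] with a pre-split, NULL-terminated argv[] array.
 * Returns the child's exit status, or -1 if it could not be obtained. */
int run_argv(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork() failed */
    if (pid == 0) {
        execvp(argv[0], argv);     /* returns only if the exec failed */
        _exit(127);                /* conventional "could not run" status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_argv(example_argv) would ask the system to look up r3 on the PATH and hand it exactly three strings. Nothing in that path ever splits or unquotes text, which is why a TEXT! command has to be broken up before it gets this far.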

So there was this "scary" proposition of having to write a parser for POSIX to get this to work, and all the risks it entailed. How would you know you got it right, that you'd turned the \" into regular " and matched everything up right? CALL was already a daunting jungle of C code; how many more points of failure would you want? Shixin was rightfully fearful (well, skeptical I should say) of getting into that business in C.

But we have Ren-C, and we can call it from C.

Free Your Mind

Here's how we now break those parameters up, if we happen to be on POSIX and /SHELL is not used:

parse-command-to-argv*: function [
    {Helper for when POSIX gets a TEXT! and the /SHELL refinement not used}
    return: [block!]
    command [text!]
][
    quoted-shell-item-rule: [  ; Note: ANY because "" is legal as a quoted arg
        any [{\"} | not {"} skip]  ; escaped quotes and nonquotes
    ]
    unquoted-shell-item-rule: [some [not space skip]]

    parse command [
        collect result: [any [
            any space [
                {"} keep quoted-shell-item-rule {"}
                | keep unquoted-shell-item-rule
            ]
        ]
        any space end]
    ] else [
        fail "Could not parse command line into argv[] block."
    ]
    for-each item result [replace/all item {\"} {"}]
    return result
]
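
For a sense of scale, the same rules can also be written by hand in C. What follows is only an illustrative sketch and not code from the CALL extension; the names split_command, copy_unescaped, and free_argv are hypothetical. It is intended to follow the rules above (split on spaces, treat a "..." run as one item, let \" stand for a quote inside it, and fall back to an unquoted item when a quote is never closed), but it has not been checked against the PARSE version case by case.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array produced by split_command(). */
void free_argv(char **argv) {
    if (argv == NULL) return;
    for (size_t i = 0; argv[i] != NULL; i++) free(argv[i]);
    free(argv);
}

/* Copy src[0..len) into a fresh string, turning each \" into ". */
static char *copy_unescaped(const char *src, size_t len) {
    char *out = malloc(len + 1);
    size_t n = 0;
    if (out == NULL) return NULL;
    for (size_t i = 0; i < len; i++) {
        if (src[i] == '\\' && i + 1 < len && src[i + 1] == '"')
            i++;  /* skip the backslash; the quote is copied below */
        out[n++] = src[i];
    }
    out[n] = '\0';
    return out;
}

/* Split cmd into a NULL-terminated array of newly allocated strings.
 * Returns NULL only if an allocation fails. */
char **split_command(const char *cmd) {
    /* Every item consumes at least one character, so strlen(cmd) slots
     * plus one for the NULL terminator is always enough. */
    char **argv = calloc(strlen(cmd) + 1, sizeof *argv);
    size_t argc = 0;
    const char *p = cmd;
    if (argv == NULL) return NULL;
    for (;;) {
        while (*p == ' ') p++;           /* skip separating spaces */
        if (*p == '\0') break;           /* reached the end */
        const char *start = p;
        const char *stop = NULL;
        if (*p == '"') {
            /* Look for the closing quote, stepping over \" pairs. */
            const char *q = p + 1;
            while (*q != '\0' && *q != '"')
                q += (q[0] == '\\' && q[1] == '"') ? 2 : 1;
            if (*q == '"') { start = p + 1; stop = q; p = q + 1; }
        }
        if (stop == NULL) {
            /* Unquoted (or unterminated quote): run up to the next space. */
            while (*p != '\0' && *p != ' ') p++;
            stop = p;
        }
        argv[argc] = copy_unescaped(start, (size_t)(stop - start));
        if (argv[argc] == NULL) { free_argv(argv); return NULL; }
        argc++;
    }
    return argv;
}
```

Even in this small form there is pointer arithmetic, a sizing argument for the array, and cleanup on every failure path to get right, which is the kind of bookkeeping the PARSE rules leave to the interpreter.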

The C code in the POSIX extension implementing CALL just invokes this helper as rebValue("parse-command-to-argv*", command). The helper resides in the module and is only visible to it, but the C code finds it because when extension natives are loaded, they remember which module they were in, and this information factors into the binding.

It's a Zen moment, isn't it?

It may not be perfect (improvements welcome). And if it lets you down, you can always do what you used to...use CALL/SHELL and defer to the shell to do whatever-it-does with the text as its single argument. You'd need to anyway, if your call contained any ${ENVIRONMENT_VARIABLES} you wanted to substitute, or if you wanted to invoke "dir" or "echo" or other things that only exist in "sh" or "cmd.exe". (Of course if you run a program that way, you pay for the overhead of two processes invoked, vs. just one.)

But I think the main thing is just taking away that fear. Making the solution match the size of the problem--the so-called "essential complexity". That's the goal of this exercise, and we are getting ever closer to it.


Wow. This is quite amazing.
:smiley:
