Speed of UPARSE

hostilefork · March 29, 2024, 2:34pm

Several orders of magnitude slower. When I first tinkered with nativizing it, it measured about 700x slower on one arbitrary case:

Progress on Nativizing Parser Combinators

But an hour of tinkering at the time got it down to 250x slower.

Predictably, that tinkering quickly went out of date, so the early nativization is inactive. Things aren't set up to maintain nativized code in parallel with the usermode code, and doing so would be too much work to justify until UPARSE is fully stable.

The baseline will be even slower today... because binding is slower, and because there's more hooking in it which was added for the debugging demo.

If you watch the debugging video, it explains that the debugger works because each call to a combinated parser can be hooked... such that the hook is responsible for invoking the frame. It can inspect the frame beforehand, and examine the multi-return result after it does the invocation. (Or it can skip the invocation entirely, or duplicate the frame and invoke it twice, or mutate the frame before it runs it, etc.) Each combinator just directly calls its subparser, but the subparser has been ENCLOSE'd with a wrapper:

wrapper: func [
    "Enclosing function for hooking all combinators"
    return: [pack?]
    f [frame!]
][
    return either f.state.hook [
        run f.state.hook f
    ][
        run f
    ]
]

So that's overhead on each call to a parser that any other parser makes. Of course the pattern of enclose :combinator :wrapper itself could be partially nativized in a semi-generic way, as something like:

hookify :combinator 'f [f.state.hook]
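
To make the per-call cost concrete, here is a toy C model of that dispatch. It is only a sketch: Frame, State, run_frame and the example hooks are hypothetical stand-ins, not Ren-C's actual types or API. The point is just that every subparser call pays for the hook check, and that when a hook is present it is the hook (not the wrapper) that decides whether the frame runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not Ren-C's real types. */
typedef struct Frame Frame;
typedef int (*Parser)(Frame *f);  /* returns 1 on match, 0 on failure */
typedef int (*Hook)(Frame *f);    /* a hook is responsible for invoking f */

typedef struct {
    Hook hook;        /* NULL when no debugger is attached */
    int calls_seen;   /* bookkeeping used by the example hook */
} State;

struct Frame {
    State *state;
    Parser parser;
    const char *input;
};

/* Invoke the parser stored in the frame. */
static int run_frame(Frame *f) { return f->parser(f); }

/* Analogue of the usermode wrapper: defer to the hook if there is one. */
static int wrapper(Frame *f) {
    if (f->state->hook != NULL)
        return f->state->hook(f);  /* hook decides if and how f runs */
    return run_frame(f);
}

/* Example parser: matches when the input starts with 'a'. */
static int match_a(Frame *f) { return f->input[0] == 'a'; }

/* Example hook: inspect (here, count) and then invoke the frame. */
static int counting_hook(Frame *f) {
    f->state->calls_seen += 1;
    return run_frame(f);
}

/* Example hook: skip the invocation entirely and report failure. */
static int skipping_hook(Frame *f) { (void)f; return 0; }
```

Even with no hook installed, the null check and the extra call level are paid on every parser-to-parser call, which is the overhead being described.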

But these things are patterns which should inform how to design the system in a more general sense. I am trying to get some kind of story together for how dialects are debugged... as I've said, you choose whether you want "assembly level stepping" (e.g. debugging the Rebol instructions) or debugging at the higher level of the dialect. So I expect this hook to be a sunk cost of some kind.

Right now I'm doing other things, though (as well as non-Rebol-related life stuff, which is going to mean Rebol development will get a bit more sporadic than it was in the first couple of months of the year.)

Nativizing Plan

Ultimately, what I plan to do is make it so that all the combinators are in their own C files, where the usermode form is in a comment, something like:

//
// File: %src/core/parse/optional-combinator.c
//

/* <BEGIN USERMODE COMBINATOR>

'optional combinator [
    "If applying parser fails, succeed and return NULL; don't advance input"
    return: "PARSER's result if it succeeds, otherwise NULL"
        [any-value? pack?]
    parser [action?]
    <local> result'
][
    [^result' remainder]: parser input except e -> [
        remainder: input  ; succeed on parser fail but don't advance input
        return null
    ]
    return unmeta result'  ; return successful parser result
]

</END USERMODE COMBINATOR> */

//
//  optional-combinator: native/combinator [
//
//  "If applying parser fails, succeed and return NULL; don't advance input"
//
//      return: "PARSER's result if it succeeds, otherwise NULL"
//          [any-value? pack?]
//      parser [action?]
//  ]
//
DECLARE_NATIVE(optional_combinator)
{
    INCLUDE_PARAMS_OF_OPTIONAL_COMBINATOR;

    Value* remainder = ARG(remainder);  // output (combinator implicit)

    Value* input = ARG(input);  // combinator implicit
    Value* parser = ARG(parser);
    UNUSED(ARG(state));  // combinator implicit

    enum {
        ST_OPT_COMBINATOR_INITIAL_ENTRY = STATE_0,
        ST_OPT_COMBINATOR_RUNNING_PARSER
    };

    switch (STATE) {
      case ST_OPT_COMBINATOR_INITIAL_ENTRY :
        goto initial_entry;

      case ST_OPT_COMBINATOR_RUNNING_PARSER :
        goto parser_result_in_out;

      default : assert(false);
    }

  initial_entry: {  //////////////////////////////////////////////////////////

    Push_Parser_Sublevel(OUT, remainder, parser, input);

    STATE = ST_OPT_COMBINATOR_RUNNING_PARSER;
    return CATCH_CONTINUE_SUBLEVEL(SUBLEVEL);

} parser_result_in_out: {  ///////////////////////////////////////////////////

    if (not Is_Raised(OUT))  // parser succeeded...
        return OUT;  // so return its result

    Set_Var_May_Fail(remainder, SPECIFIED, input);  // convey no progress made
    return Init_Nulled(OUT);  // null result
}}
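
The shape of that native (a STATE switch, a continuation request, and a second entry once the subparser has finished) can be modeled in a few lines of plain C. This is a toy sketch under stated assumptions: ToyLevel, Bounce, toy_optional and toy_trampoline are hypothetical names, the "subparser" is hardcoded to match a single 'x', and none of it is Ren-C's real Level or trampoline API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; not Ren-C's actual trampoline. */
typedef enum { BOUNCE_DONE, BOUNCE_CONTINUE } Bounce;

enum { ST_INITIAL_ENTRY = 0, ST_RUNNING_PARSER };

typedef struct {
    int state;              /* which step to resume at */
    bool sub_failed;        /* filled in by the driver between steps */
    const char *input;      /* position before the subparser ran */
    const char *remainder;  /* position after (output) */
    bool out_is_null;       /* true if the combinator's result is null */
} ToyLevel;

/* Resumable "optional": never blocks, just says what should happen next. */
static Bounce toy_optional(ToyLevel *L) {
    switch (L->state) {
      case ST_INITIAL_ENTRY:
        L->state = ST_RUNNING_PARSER;
        return BOUNCE_CONTINUE;      /* ask the driver to run the subparser */

      case ST_RUNNING_PARSER:
        if (L->sub_failed) {
            L->remainder = L->input; /* convey no progress made */
            L->out_is_null = true;   /* succeed with a null result */
        }
        return BOUNCE_DONE;

      default:
        assert(false);
        return BOUNCE_DONE;
    }
}

/* Toy driver: runs the hardcoded subparser whenever asked to continue. */
static void toy_trampoline(ToyLevel *L) {
    while (toy_optional(L) == BOUNCE_CONTINUE) {
        if (L->input[0] == 'x') {
            L->sub_failed = false;
            L->remainder = L->input + 1;  /* subparser consumed one char */
        } else {
            L->sub_failed = true;
        }
    }
}
```

The combinator function returns to its caller instead of calling the subparser on the C stack, which is why it needs an explicit state to know where to pick up on re-entry.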

Keeping the usermode form and the native together in one file means the two can be maintained in parallel, and a debug mode could switch between them to make sure they produce identical results in the tests.
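
A minimal sketch of what such a cross-check could look like, in plain C. Everything here is assumed for illustration: ParseResult, optional_reference, optional_fast and optional_implementations_agree are hypothetical names, and both variants are ordinary C functions standing in for the usermode and native forms of the same combinator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy result: success flag, null-ness, and new position. */
typedef struct {
    bool ok;                /* did the combinator succeed? */
    bool is_null;           /* is the synthesized result null? */
    const char *remainder;  /* input position afterwards */
} ParseResult;

typedef ParseResult (*ParserFn)(const char *input);

/* "Reference" form, written to mirror the usermode logic step by step. */
static ParseResult optional_reference(ParserFn parser, const char *input) {
    ParseResult r = parser(input);
    if (!r.ok) {
        r.ok = true;          /* succeed on parser fail... */
        r.is_null = true;     /* ...with a null result... */
        r.remainder = input;  /* ...and don't advance the input */
    }
    return r;
}

/* "Fast" form, standing in for the nativized version. */
static ParseResult optional_fast(ParserFn parser, const char *input) {
    ParseResult r = parser(input);
    return (ParseResult){ true, r.ok ? r.is_null : true,
                          r.ok ? r.remainder : input };
}

/* Debug-mode check: run both forms and compare every observable field. */
static bool optional_implementations_agree(ParserFn parser, const char *input) {
    ParseResult a = optional_reference(parser, input);
    ParseResult b = optional_fast(parser, input);
    return a.ok == b.ok
        && a.is_null == b.is_null
        && a.remainder == b.remainder;
}

/* Example subparser: matches a single 'x'. */
static ParseResult match_x(const char *input) {
    if (input[0] == 'x')
        return (ParseResult){ true, false, input + 1 };
    return (ParseResult){ false, false, input };
}
```

In a real test run the comparison would have to cover whatever the combinator can observably produce (result, remainder, and any raised error), not just these three fields.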