Monitoring a single function with ENCLOSE and HIJACK

Have you ever wanted to follow the life and times of a single function, to see its input and output in action? With ENCLOSE and HIJACK, you can...

There's some C code that implements a native called SCAN-NET-HEADER, which I'd like to see ported to usermode. But I wanted some examples of what it did.

First let's make a copy of scan-net-header:

snh-copy: copy :scan-net-header

We want a copy because we plan to hijack existing instances of the function, and our tracer needs to pass each call through to the original implementation. Without a copy, that pass-through would itself get hijacked, and we'd have an infinite recursion.

(Note: making copies of a function is a relatively lightweight process; the bodies of non-natives are not deep-copied, for instance.)

Now let's make a function that takes the same parameters as our copy of SCAN-NET-HEADER, but can run some code before and after it. That's what ENCLOSE is for:

snh-traced: enclose 'snh-copy function [f [frame!]] [
    print "==Input to SCAN-NET-HEADER=="
    probe f

    print "==Calling SCAN-NET-HEADER=="
    result: do f

    print "==Result of SCAN-NET-HEADER=="
    probe result

    return result
]

Now, let's HIJACK the SCAN-NET-HEADER function with our new, compatible function (which depends only on the copy, not on SCAN-NET-HEADER itself). All existing calls will be routed through our tracer:

hijack 'scan-net-header 'snh-traced

One place this gets called is in HTTP requests, so why not the forum? Since you're probably typing this in a console prompt where there's no previous value, you can use the new ELIDE to say you're not interested in the giant BINARY! return result...

>> elide read

The argument is revealed to be a really huge blob of binary data. But the result comes out as:

==Result of SCAN-NET-HEADER==
[Server: "nginx" Date: "Tue, 19 Dec 2017 01:59:58 GMT" Content-Type: "text/html; charset=utf-8" Connection: "close" Vary: "Accept-Encoding" X-Frame-Options: "SAMEORIGIN" X-XSS-Protection: "1; mode=block" X-Content-Type-Options: "nosniff" X-Discourse-Route: "categories/index" Cache-Control: "no-store, must-revalidate, no-cache, private" X-Request-Id: "e504ca54-5429-4427-a744-4c9ce20d51e7" X-Runtime: "0.088665" X-Discourse-TrackView: "1" Strict-Transport-Security: "max-age=63072000"]

We can change our instrumentation of the calling parameters slightly, from probe f to probe to-string f/header, to see what that binary actually was. It turns out to be the raw headers followed by the entire source of the page:

==Input to SCAN-NET-HEADER==
Server: nginx^M
Date: Tue, 19 Dec 2017 02:04:51 GMT^M
Content-Type: text/html; charset=utf-8^M
Connection: close^M
Vary: Accept-Encoding^M
X-Frame-Options: SAMEORIGIN^M
X-XSS-Protection: 1; mode=block^M
X-Content-Type-Options: nosniff^M
X-Discourse-Route: categories/index^M
Cache-Control: no-store, must-revalidate, no-cache, private^M
X-Request-Id: 230f30b1-c04b-404c-9bb2-85cec586f573^M
X-Runtime: 0.067961^M
X-Discourse-TrackView: 1^M
Strict-Transport-Security: max-age=63072000^M
<!DOCTYPE html>
<html lang="en" class="desktop-view not-mobile-device  anon">
    <meta charset="utf-8">

So clearly it reads up through the lone carriage return that ends the header block, and stops processing key and value pairs there.
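To pin down that stopping rule, here is a rough sketch of the observed behavior. It is in Python rather than Ren-C, purely to make the logic concrete, and it is not the native's actual implementation. The name scan_net_header_sketch, the list-of-pairs return value, and the Latin-1 decoding are all assumptions for illustration. The real native returns a block of set-words and strings, and may treat duplicate or malformed headers differently.

```python
def scan_net_header_sketch(data):
    """Hypothetical sketch: collect 'Key: value' pairs up to the first blank line.

    `data` is the raw bytes handed to the scanner, i.e. header lines
    terminated by CR LF, then a blank line, then the body.
    """
    headers = []
    for raw_line in data.split(b"\n"):
        line = raw_line.rstrip(b"\r")  # drop the trailing carriage return
        if not line:
            break  # a lone CR (blank line) ends the header block
        key, sep, value = line.partition(b":")
        if not sep:
            break  # assumption: stop at anything that isn't "Key: value"
        # Assumption: decode as Latin-1 so no byte sequence can fail to decode.
        headers.append((key.strip().decode("latin-1"),
                        value.strip().decode("latin-1")))
    return headers


sample = (b"Server: nginx\r\n"
          b"Date: Tue, 19 Dec 2017 02:04:51 GMT\r\n"
          b"Connection: close\r\n"
          b"\r\n"
          b"<!DOCTYPE html>\n<html lang=\"en\">")

for key, value in scan_net_header_sketch(sample):
    print(key, "=>", value)
```

Splitting only on the first colon matters for values like the Date header, which contain colons of their own.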

How about that? In only a few minutes I was able to instrument and inspect a function. I could have put in arbitrary processing to limit the range of function calls in which I was interested, or logged and filtered the data instead of dumping it in-band in my program.

Does anyone want to reward all this hard work by writing a usermode SCAN-NET-HEADER for me, so that UTF-8 Everywhere can be finished faster? :)