Much Ado About A Tiny Email Test "Micro-Dialect"

hostilefork · October 11, 2024, 9:29pm

So there was a "micro dialect" for email address scanning. The idea was simple: you could intermix valid and invalid email addresses, so that groups of related emails stay together even though some in the group would fail and others would succeed.

It was something along these lines:

for-each [mode text] [
    + {email@example.com}
    + {firstname.lastname@example.com}
    - {email@example@example.com}
    + {email@subdomain.example.com}
    + {firstname+lastname@example.com}
    - {email@example.com}
    + {email@123.123.123.123}
    - {email@[123.123.123.123]}
    ...
][
    assert [mode: select [+ valid - invalid] mode]
    if (mode = 'valid) != test-scan-email (...) [
        fail ["Expected" @text "to be" an mode "email"]
    ]
]

But for this post, I threw in a couple of "hey that's neat" aspects, like:

>> an "valid"
== "a valid"

>> an "invalid"
== "an invalid"
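
For anyone who wants the gist of AN without a Ren-C console handy, here is a rough model in Python. It is only a sketch inferred from the two examples above: the `an` function name mirrors the Ren-C one, but the "starts with a vowel letter" rule is my assumption, and the real AN may well be smarter about it.

```python
VOWELS = set("aeiou")  # assumption: a plain vowel-letter check, nothing phonetic

def an(word: str) -> str:
    """Prefix word with "an" if it starts with a vowel letter, else with "a"."""
    article = "an" if word[:1].lower() in VOWELS else "a"
    return f"{article} {word}"

print(an("valid"))
print(an("invalid"))
```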

Enter Dashed Strings

I don't know that the + and - markers were ever the greatest, but they certainly lost their appeal with dashed strings.

    + -{email@example.com}-
    + -{firstname.lastname@example.com}-
    - -{email@example@example.com}-
    + -{email@subdomain.example.com}-

At first I figured I'd just pick an alternative. There's Y and N...

    Y -{email@example.com}-
    Y -{firstname.lastname@example.com}-
    N -{email@example@example.com}-
    Y -{email@subdomain.example.com}-

Those are pretty big letterforms that blur together some. Tilde (as used for trash) would differentiate them, and carries the connotation of "something wrong"...

    Y -{email@example.com}-
    Y -{firstname.lastname@example.com}-
    ~ -{email@example@example.com}-
    Y -{email@subdomain.example.com}-

...but it blurs here too much with the dash.

Really we can ask: why are we decorating the valid things, and not just the invalid things?

      -{email@example.com}-
      -{firstname.lastname@example.com}-
    # -{email@example@example.com}-
      -{email@subdomain.example.com}-

You could use N or * or # or any other nasty here, and it sort of stands out, though not as well as <bad> would.

      -{email@example.com}-
      -{firstname.lastname@example.com}-
<bad> -{email@example@example.com}-
      -{email@subdomain.example.com}-

But if you're going to break the regularity of the structure, you can't use (today's) FOR-EACH, since it takes a fixed number of items on each iteration.

If you want to regularize it a bit, you could use something like BLOCK! to mark the bad ones:

      -{email@example.com}-
      -{firstname.lastname@example.com}-
     [-{email@example@example.com}-]
      -{email@subdomain.example.com}-

If that didn't stand out enough, you could use a double-block:

      -{email@example.com}-
      -{firstname.lastname@example.com}-
    [[-{email@example@example.com}-]]
      -{email@subdomain.example.com}-

It's worth remembering such things are options in some cases, but I don't think that works very well here.

`<bad>` Seems Good, But Could It Be Easier?

It would be nice if there were some way to type the FOR-EACH variables, and denote their optionality.

@hiiamboris has done some things in this vein; see his type filter on FOR-EACH proposal. But I feel like skipping and checking should be separate intents: I can want to type check something without wanting to skip it.

A leading colon could imply optionality, as it does with refinements now:

for-each [:bad [tag!] text [text!]] [
        -{email@example.com}-
        -{firstname.lastname@example.com}-
  <bad> -{email@example@example.com}-
        -{email@subdomain.example.com}-
        ...
][
    ...
]

You can even use <bad> itself for the type check by quoting it, and enforce/document that more stringently. Also, for the sake of "how does that look in the generator model" I'll write it out that way:

for [:bad ['<bad>] text [text!]] each [
        -{email@example.com}-
        -{firstname.lastname@example.com}-
  <bad> -{email@example@example.com}-
        -{email@subdomain.example.com}-
        ...
][
    ...
]
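
Since none of this exists yet, here is a small Python model of the semantics I have in mind, not an implementation of anything in Ren-C. Everything in it is hypothetical: the `Tag` class stands in for TAG!, `for_each_optional` is a made-up name, and each variable is described by a `(name, check, optional)` triple where `check` is a predicate playing the role of the `[tag!]` or `['<bad>]` block.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    """Hypothetical stand-in for a TAG! value such as <bad>."""
    name: str

def for_each_optional(spec, items):
    """Yield one dict per record; optional slots that are absent bind to None."""
    pos = 0
    while pos < len(items):
        start = pos
        record = {}
        for name, check, optional in spec:
            if pos < len(items) and check(items[pos]):
                record[name] = items[pos]  # slot present: consume one item
                pos += 1
            elif optional:
                record[name] = None        # slot absent: consume nothing
            else:
                raise TypeError(f"bad value for {name!r} at index {pos}")
        if pos == start:
            # Only reachable if every slot is optional and none matched
            raise TypeError(f"no slot matched item at index {pos}")
        yield record

BAD = Tag("bad")
spec = [
    ("bad", lambda v: v == BAD, True),              # like :bad ['<bad>]
    ("text", lambda v: isinstance(v, str), False),  # like text [text!]
]
table = [
    "email@example.com",
    "firstname.lastname@example.com",
    BAD, "email@example@example.com",
    "email@subdomain.example.com",
]
# Pair each address with whether it is expected to be valid
results = [(r["text"], r["bad"] is None) for r in for_each_optional(spec, table)]
```

Note that the check and the optionality are separate fields in the triple, which is the point about keeping checking and skipping as separate intents: a required slot that fails its check raises an error instead of being silently skipped.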

Of course you can split this out to a table vs. having the tests inline like that.

Interpreting blocks as type checks seems pretty useful, but it takes blocks away from other applications in that position, such as destructuring.

Anyway, this is just some thinking inspired by a very small example.
