How Does Lisp Deal With Types?

hostilefork · December 17, 2020, 3:31am

So I thought to look at how Lisp handles types. (You'd think I'd have looked earlier.)

Overview: Common Lisp Cookbook: Types
Lengthier Survey: "Typed Lisp, A Primer"

Function Argument Type Checking (not there)

First thing to note, they don't have function argument type checking.

"function arguments in Lisp do not have declared data types, as they do in other programming languages. It is therefore up to the individual function to test whether each actual argument belongs to a type that the function can use."

While Rebol does have type checking, it's extremely limited. It's a frustrating design point...because it's so limited, and because it is poked into a strange compacted place. When people ask "why only 64 data types?" that gets at some of the strangeness of this feature.

When you try to build reflective abilities for the built-in type checking feature, you get into questions of how to expose these 64 bits. That's done via TYPESET!, which isn't very easy to process, and doesn't seem like the foundation of a broader answer. See "the TYPESET! representation problem"

Lisp Type Representation

They seem basically comfortable with the idea of using "WORD!" to name types, e.g. ensure 'integer 1. (Although their version of ENSURE is called THE, as a type annotation. the 'integer 1 That's shorter...same length as "non", and would free ENSURE for other uses. I can't tell if it makes total sense or not.)

"Types are not objects in Common Lisp. There is no object that corresponds to the type integer , for example. What we get from a function like type-of , and give as an argument to a function like typep , is not a type, but a type specifier. A type specifier is the name of a type."

They don't use the equality operator to check the result of "TYPE OF" against anything, because they get back additional "stuff, and I gather that "stuff" can vary from one implementation to another:

* (type-of 1234)
(INTEGER 0 4611686018427387903)

So it seems they lean toward using functions to ask if types match. They call these "predicates" and end them in P. If we were to do a similar thing, it might look like:

>> type? 10 'integer
== #[true]  ; knew to disregard stuff that `'integer = type of 1` wouldn't

Their notion of typeclasses is somewhat concretized. Their ANY-VALUE! is just t. So they do not try and express the notion of "anything" as an explicit union of all known types, the way that Rebol has. It's inherently not extensible if you say any-value!: make typeset! [integer! text! ...] and then add a new datatype later.

If you scroll down a bit you get to some examples:

;; The universal type “t”, has everything as its value.
(typep 'x 't) ;; ⇒ true
(typep 12 't) ;; ⇒ true

;; The empty type: nil
(typep 'x 'nil) ;; ⇒ false; nil has no values.

;; The type “null” contains the one value “nil”.
(typep nil 'null) ;; ⇒ true
(typep () 'null)  ;; ⇒ true

;; “(eql x)” is the singelton type consisting of only x.
(typep 3 '(eql 3)) ;; ⇒ true
(typep 4 '(eql 3)) ;; ⇒ false

;; “(member x₀ … xₙ)” denotes the enumerated type consisting of only the xᵢ.
(typep 3 '(member 3 x "c"))  ;; ⇒ true
(typep 'x '(member 3 x "c")) ;; ⇒ true
(typep 'y '(member 3 x "c")) ;; ⇒ false

;; “(satisfies p)” is the type of values that satisfy predicate p.
(typep 12 '(satisfies (lambda (x) (oddp x)))) ;; ⇒ false
(typep 12 '(satisfies evenp) )                ;; ⇒ true

;; Computation rule for comprehension types.
;; (typep x '(satisfies p)) ≈ (if (p x) t nil)

Why Isn't Everything Just a Function?

It seems like SATISFIES is the sort of generic way to hook up any function as a type check. Like you could do arbitrary calculation to check to see if a number was prime...or something like that.

So why isn't this the only kind of type check? Maybe something to do with the limits of what can be checked at compile time.

It seems that SBCL requires you to go through a symbol to use SATISFIES. Whether that's on purpose or not, that it forces you to name your predicate is helpful as documentation.

Stipulating Variables Match A Type

There is a feature of saying that a word is only assigned particular types (declaim)...and though it didn't work for me in "clisp" it did in "sbcl".

Were we to have such a feature, we'd presumably give it a better name, but:

>> declaim 'x text!

>> x: 10
** Error: you said that x had to be TEXT!

When you think about languages arising like TypeScript, being able to annotate a variable to say that it has a certain type is a popular feature. It's not clear how to do this inexpensively with complex type checks.

Does This Point To Any Answers?

The main thing to notice here is that the process isn't driven by "typesets", but rather by a kind of "type matching dialect"...that covers a few kinds of combining operations (including AND, OR), and if you can't get exactly what you want from the built-in bits, you can supply an open-ended function.

It goes in the direction I've been thinking about, where the core function application code doesn't really know what type checking is. So it's layers above (like FUNC/FUNCTION) that provide the type checking, and define the type checking language.

Nothing particularly new, just thought I'd look.

iArnold · December 18, 2020, 7:15am

Correct. In particular because one of the selling points of the language is that it comes with all kinds of types out of the box. Types you probably would have had to create yourself anyway.

And also observing the less obvious like get- set- sym- -path -word this counts up pretty quickly leaving little to no space for those other types we want to (be) put in.

I see in https://github.com/metaeducation/ren-c/blame/master/src/boot/types.r

; This table of fundamental types is intended to be limited (less than

; 64 entries). Yet there can be an arbitrary number of extension types.

; Those types use the REB_CUSTOM cell class, and give up their ->extra field

; of their cell instances to point to their specific datatype.

and the real limitation is in make-boot.r

/*
 * Current hard limit, higher types used for QUOTED!.  In code which
 * is using the 64 split to implement the literal trick, use REB_64
 * instead of just 64 to make places dependent on that trick findable.
 *
 * !!! If one were desperate for "special" types, things like 64/128/192
 * could be used, as there is no such thing as a "literal END", etc.
 */
#define REB_64 64

/*
 * While the VAL_TYPE() is a full byte, only 64 states can fit in the
 * payload of a TYPESET! at the moment.  Some rethinking would be
 * necessary if this number exceeds 64 (note some values beyond the
 * real DATATYPE! values set special signal bits in parameter typesets.)
 */
STATIC_ASSERT(REB_MAX_PLUS_MAX <= REB_64);

Correct?

okram · December 28, 2020, 2:32pm

How much type information is really needed once the code executes?
I think the papers on types as macros:

should be interesting.

hostilefork · December 28, 2020, 8:26pm

Hello @okram, welcome.

If you have interest and knowledge in type systems in "Lisp-inspired" languages, that would be valuable if you care to offer specific feedback...

...but we're a pretty long way from dependent types. :-/ Also...the language is interpreted, not compiled. We're not particularly focused on performance, more expressivity. (Catchphrase: "Minecraft of Programming")

While there are a bunch of interesting and nuanced details in the language to talk about, the "type system" barely exists. There's around 64 basic value types (word, set-word:, pa/th, {string}, #token, 304, $10.20, [bl 0 ck], (gr 0 up)...), plus arbitrary quoting levels ('''['a ''2 ''(a 'b ''c)]), and a NULL non-value.

Historical Rebol takes for granted that you can reflect the type of things. We now do that via REFLECT:

>> reflect [a b c] 'type
== block  ; (the representation of this answer is what's up for debate)

A version of REFLECT which quotes its left hand side is available as OF, which is neat:

>> type of [a b c]
== block

But the question we're stuck with is that answer. It has been assumed that whatever you get back, it would be suitable for plain equality comparison with...checked against the answer you got for something else. It doesn't seem that Lisp goes with that idea of a kind of "canon answer", and pushes you to use a function to check for type matching.

In terms of our type representation...we're reluctant to re-use an existing type (such as WORD, which is what the symbol block would be). The reason is that we'd like "datatype's type" to be able to stand on its own, such as when matching things against rules:

 >> parse [4 5 6] [some @integer] then [print "Matched some integers!"]
 Matched some integers!

We'd have to invent special keywords or signals otherwise:

 parse [4 5 6] [some typep 'integer] then [print "Not as nice"]

Using the @symbol category for types wouldn't mean it couldn't be used for other purposes, but it would make many dialects want to save it for when types were relevant.

But there are other questions, like is the type of all quoted things @quoted?

>> type of first ['''a]
== @quoted

>> type of first [''1]
== @quoted

Or do the types themselves come back as quoted?

>> type of first ['''a]
== '''@word

>> type of first [''1]
== ''@integer

Or...since the evaluator drops quoting levels, is it better to insulate the quotes by putting them inside an inert container like a block:

>> type of first ['''a]
== @['''word]

>> type of first [''1]
== @[''integer]

There are much deeper questions to get at with user-defined types, and how they might interoperate with generics...all of which are unknowns.

Given the vast underdevelopment in such areas, it takes something of a commitment to learning why the medium would be interesting! But always open to explaining that.

okram · December 29, 2020, 5:29pm

Sorry, I have no expertise in type systems for "Lisp-inspired" languages.

But from my naïve perspective it seems that detailed types such as "a function taking a function of two integers returning a boolean ..." or "an object providing attributes named 'attr1, 'attr2, and 'attr3 each a function taking an integer to text" do not provide any benefit to the execution inside the interpreter.
If that is so, then why not have such types at an earlier phase, a pre-execution source code transformation that does type checking and inference.

Couldn't reflect [a b c] 'type enter such a type inference? A question to clarify would be the relationship between the existing types and the "richer" type system. Should the richer type system be a conservative extension or should it be independent of the current types and maybe optional?