Planning Ahead for BigNum INTEGER!

I've long been an advocate for the idea that INTEGER! should be an arbitrary precision implementation. It doesn't make sense to me why a high-level language which is targeting future applications would constrain itself to 32-bit, or (in R3-Alpha's case) 64-bit INTEGER! values. I've been trying--for instance--to "futureproof" conversions of INTEGER! to BINARY!, in anticipation of the change.

Now, due to Red's desire to be involved in cryptocurrency, they are adding BIGNUM! as a distinct datatype from INTEGER!. Which is not the way I would do it. INTEGER! already has to detect math overflows in order to give an error when your math is out of range. I would prefer INTEGER! to start out using CPU register-size integers, and instead of erroring on overflows, promote the representation into a bignum...on demand.
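
To make the contrast concrete, here is a sketch in Python (purely illustrative--Python's own integers are already arbitrary precision, so the 64-bit boundary is simulated):

```python
INT64_MAX = (1 << 63) - 1

def add_erroring(a, b):
    """R3-Alpha style: out-of-range math is an error (positive overflow
    only, for brevity)."""
    result = a + b
    if result > INT64_MAX:
        raise OverflowError("math or number overflow")
    return result

def add_promoting(a, b):
    """Proposed style: the value silently promotes to a bignum
    representation on demand; no distinct BIGNUM! type is visible."""
    return a + b  # the representation choice is an internal detail

assert add_promoting(INT64_MAX, 1) == (1 << 63)  # promoted, no error
```

The point is that promotion is a representation detail: the user never sees a distinct type or an error.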

The Identity Problem

I suggested that merely promoting INTEGER! to a bignum internal representation on overflow need not hurt performance for most applications. But there's an issue of identity in bignum operations.

Today this is the behavior for integers:

 foo: func [x [integer!]] [add x 1]
 smallnum: 0
 foo smallnum
 print smallnum  ; prints 0, unaffected

So each integer cell has its own identity (as an ANY-IMMEDIATE!, "immediate value").

ADD or + do not go in and fiddle the bits in a way that is seen by "other instances of that integer", the way that APPEND reaches into the "data node" behind a series and affects all instances.

But with BigNums, making a new identity on each addition can be costly:

bignum: 123456789012345678901234567890
loop 1000000 [
    bignum: bignum + 1
]

How many bignum structures should that be "malloc()" ing? Naively done, that would be a million or more allocations, because each addition in Rebol is expected to produce a new-and-disconnected integer.

Getting around this by making ADD mutate the bignum--when ADD doesn't mutate other things--seems inconsistent. It's not a new problem: there were similar issues with ADD of VECTOR!. In Red too; see "Adding a number to a vector changes the vector itself".
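
Python lists show the same identity split--an in-place operation is visible through every reference to the shared data (like APPEND), while building a new value leaves other references alone. A loose analogy only, not Rebol semantics:

```python
a = [1, 2, 3]
b = a             # b refers to the same underlying data

a += [4]          # in-place: mutates the shared data, like APPEND
assert b == [1, 2, 3, 4]

a = a + [5]       # new allocation: a now points at a fresh list
assert b == [1, 2, 3, 4]        # other references are unaffected
assert a == [1, 2, 3, 4, 5]
```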

Proposal: Mutating ADD, non-mutating +

The thought I had would be that INTEGER!s would not be immediate by their nature, but that literals would be LOCK'd by default.

 >> x: 10
 >> add x 20
 ** Error, locked integer

 >> x: make integer! 10
 >> add x 20
 >> print x

Then, + would basically be using ADD with a COPY, and locking the result:

 +: enfix func [a b] [
     lock add copy a b
 ]

For non-bignums, this could be optimized internally so that if the input was a register-size integer--and the output also fit in one--no identity node would ever need to be created.

So someone can still do an inefficient bignum loop with +, if they really wanted to. But a savvy crypto-person would use ADD so they are modifying the existing bignum in place.

What this does is bolster Rebol's "mutability by default" story about operations like APPEND, extending it to ADD. It addresses the issue with VECTOR! by saying that + will create a new vector while ADD would mutate the existing one. And it avoids introducing a new BIGNUM! datatype.

Like other changes, this can be skinned with Rebol2/R3-Alpha compatibility to make add 1 2 work in old code that needs it to.

(Note: as a side-benefit, this could remove the need for 64-bit math emulation code on 32-bit platforms. Currently, in an I-believe-to-be-misguided-standardization-effort, Rebol has tried to give those using the interpreter a baseline confidence that INTEGER! has 64-bit representation...even if that's not the native integer type. All that code could go, replaced by bignum code that's needed anyway, if you have any cryptography.)


This is another one of those "things that have to get figured out before Beta/One".

We don't actually need the implementation to be bignum yet...

...moreover I don't think there's time for it. There's just too much to do. (It's looking like a fully insane promise to ship this year as it is...adding and vetting BigNums is one of those things that could take months on its own!)

The only thing we need for Beta/One is the operators to be consistent with how they'd work w.r.t. the numeric mutability a bignum-based system would require.

Remember, we don't want to do a loop like:

 bignum: 123456789012345678901234567890
 loop 1000000 [
     bignum: bignum + 1
 ]

You'd rather have one bignum that's getting incremented, not a million dynamically allocated bignums that have to be copied on each addition. Conceptually think of it like doing math operators on a big vector of numbers and not wanting to make a copy each time. Except it's a big vector of digits representing a single number.

Today ADD is not mutating:

 foo: func [x [integer!]] [add x 1]
 smallnum: 0
 foo smallnum
 print smallnum  ; prints 0, unaffected

But I'm proposing ADD be the prefix mutating operator on arbitrary-precision integers (and vectors, and anything else you might ADD to). So this could run and print 1, without raising an error.

However, I think the default for this code should be to raise an error on the ADD. It seems to me that this very strongly parallels the inadvertent mutation of source blocks. Except here you'd be mutating the 0 that's referenced by the memory cell between smallnum: and foo (!)

Note that I said "that's referenced by the memory cell" and not "that lives in the memory cell". This is a very good argument for having source integers be LOCK'd...not CONST. Because if there's any chance people wanted to be thinking about someone having a "reference" to that integer down the line and wanting to influence it through that reference, they had to be handed some kind of indirection from the get-go. But if it was permanently locked and fit in a cell, then it will always fit in a cell--no indirection needed.

But once you've gone ahead and made your COPY of that INTEGER! from a cell, you can do all the CONST and MUTABLE on it you like. You paid for the node, might as well use it.

After 8 months since I posted about it, PLUS still seems sound

Having PLUS as the prefix form of +, as a non-modifying variant of ADD, probably handles things. You don't see a tremendous number of ADD and SUBTRACT calls in source anyway, so changing them to PLUS and MINUS doesn't seem like that big a deal.

A naive version of PLUS would look like this:

plus: chain [adapt 'add [value1: copy value1] | :lock]
+: enfix :plus

That's naive because it's paying the cost we want to avoid for small numbers...namely creating an indirection node for an ephemeral mutable value, that's only going to get locked later. To get the performance we're used to on basic numeric operations on CPU-sized numbers, PLUS has to be smarter than this:

plus: chain [
    adapt 'add [
        if integer? :value1 [return plus-integer-optimized value1 value2]
        value1: copy value1
    ] | :lock
]

So something more like that, but in native code. It should only promote cell-sized integers to memory indirections via COPY if it absolutely has to.

(Again: today's memory indirections would not be BigNums, with all the issues they bring into this, but just 64-bit integers. The point is to get the interface right, so bignums would have the right foundation to work.)

This needs review and doing if it's going to be done.

What would be very helpful to me would be if someone could learn a bit more about Red's BIGNUM! and give me a summary of it, and raise any relevant challenges to my proposal here. I'd like to know about any hard down-the-road issues that are going to come up on integrating bignums into Rebol before I start. We need a table of names for everything and an understanding of which operators on numbers today would modify and which would not.

...afterthought: could this make us build fully standard C89?

One of the "won't build on older C compilers" aspects of R3-Alpha is its use of long long, in order to try and standardize to a 64-bit INTEGER! type. If that standard were removed (by having arbitrary precision on all platforms) then long long could be optional; using whatever maximum legal int there was.

We might envision a world where if all the // comments were merely auto-translated into /* ... */, the codebase would have only one issue keeping it from being fully C89 pedantic... the use of declarations after statements within a code block. Technically speaking, a simple auto-generator could probably take care of that too... just introduce a new scope each time it happens (!)

 void foo() {
     int x = 1;
     x = x + 1;
     int y = 2;
     y = y + 2;
 }

That gives:

test.c:4:10: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
      int y = 2;

But if it were auto-transform'd:

 void foo() {
     int x = 1;
     x = x + 1;
     { int y = 2;
     y = y + 2; }
 }

That would be fine! If you have the interest, @Mark-hi, you might look at @Brett's C lexicals and see if you could twist that to make a PARSE-based preprocessor that can transform the existing sources to build as C89 w.r.t. declaration after statement and comments. Then when bignum gets rid of the long long dependency, it'll be possible.

It's been a year since I've weighed in on the ideas surrounding BigNum arithmetic. But we still need it, and the idea that written-in-C Rebol extensions can always lean on INTEGER! via libRebol to get their fast BigNum math is very appealing. Look at how you have to write m = c^d mod n in the Crypt module's rsa.c:

bigint *RSA_private(const RSA_CTX *c, bigint *bi_msg)
{
#ifdef CONFIG_BIGINT_CRT
    return bi_crt(c->bi_ctx, bi_msg, c->dP, c->dQ, c->p, c->q, c->qInv);
#else
    BI_CTX *ctx = c->bi_ctx;
    ctx->mod_offset = BIGINT_M_OFFSET;
    return bi_mod_power(ctx, bi_msg, c->d);
#endif
}

If your bigints were just REBVAL* to INTEGER!, you could say:

 rebValue(c, "pow", d, "mod", n)

I picked a small sample as the tip of that iceberg, and if you look around in that file you can see just what a PITA the bi_xxx routines are when it comes to allocation, freeing, etc.

Arbitrary precision is the way of the world now, and the BigNums our TLS uses should be the same BigNums the language uses, or we are failing.

The main musing I've been having is that we may not want to tie ourselves to any one BigNum implementation. Some are lighter, some are heavier. We could make BIGNUM! an extension-provided type distinct from INTEGER! (as Red does) and force you into a conscious decision of which you are using... thus allowing you to compile with or without it. But I don't like this line of thinking.

An alternative would be to have an overflow/underflow hook for the native integer math. The default would just fail, so you could build a thin Rebol that only worked with platform ints. But if an extension registered a hook, then the overflow would pass control to the extension to re-engineer the cell to point to a dynamic structure of its choice. You could build with one-and-only-one such hook installed (that could be enforced by the uniqueness requirements of linking, and the "NoBigInt Extension" could be the extension that just supplies a dummy hook that fails).

(Demoting back to a platform-sized int would be at the discretion of the extension when handing back a result from any math operation.)

Anyway: just wanted to mention this interesting point about switching it around so that any C powered crypto escapes out to use Rebol INTEGER! to back its arithmetic operations, via calls to libRebol.


I recently had a look at Erlang, where I looked at tail recursion for factorial calculation.
Basically it had to do with calculating a factorial whose result is far too big for a normal int or even a float, giving about 1750 digits.
The code is as expected (in fac_func.erl):

 another_factorial(0) -> 1;
 another_factorial(N) -> N * another_factorial(N-1).

 1> c(fac_func).
 2> fac_func:another_factorial(200).
 3> fac_func:another_factorial(2000).

Is this what you have in mind?

Just for your information.
There is a big difference between needing the result of c^d mod n and obtaining said result by modding c^d by n. Nobody does the latter, for good reason: most of the work in calculating c^d is useless wasted effort. Instead, the calculation is done incrementally, multiplying by c and reducing it to a remainder every time the product exceeds n. So it is always a single operation (function) that takes three arguments, and never an expression with the two operators "power" and "modulo".
Still need the bignum math though, since in typical usage c, d, and n are all many, many digits.
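
For what it's worth, Python's three-argument pow is exactly that single three-argument operation; a small sketch with toy numbers (a real key has many hundreds of digits):

```python
# Tiny illustrative values -- real RSA moduli are far larger
c, d, n = 123456789, 65537, 1000000007

# The naive route materializes c**d, a number with over half a
# million decimal digits, before throwing almost all of it away...
slow = (c ** d) % n

# ...while three-argument pow reduces mod n at every step, so no
# intermediate ever grows beyond roughly the size of n squared
fast = pow(c, d, n)

assert slow == fast
```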

Just for your information.

Another Brian doing a pretty good job of 'xplainin':

For the record I am adding the link that I posted in the chat that describes the working of the bignum in Python.

In short this comes down to using base 2^30

The representation of 123456789101112131415 will look as follows:

 ob_size   3
 ob_digit  437976919 87719511 107

Algorithm of converting our representation back:

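A sketch of that conversion in Python (my own illustration, not the linked article's code): each ob_digit holds 30 bits, least-significant digit first.

```python
# CPython stores big integers as base-2**30 digits, least significant first
ob_digit = [437976919, 87719511, 107]

# Reassemble: digit[0] + digit[1]*2**30 + digit[2]*2**60
value = sum(d << (30 * i) for i, d in enumerate(ob_digit))

assert value == 123456789101112131415
```
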

In PHP a module bcmath is optionally available

BCMath Arbitrary Precision Mathematics

This extension is an interface to the GNU implementation, as a library, of the Basic Calculator utility by Philip Nelson; hence the name.

Good to look at. However, the BigNum implementation most relevant to build on is the one that is in mbedTLS:


What's advantageous about this library is that it's written in pure C, maintained for bugs, and is already used by the rest of mbedTLS's cryptography.

I've poked a bit into it to look at how it might be integrated...and it's awfully tempting to try and hack it up just a little bit so that it leverages some of the trickery in the guts of Ren-C. However, those temptations are probably misguided in the near term... and it's better to leave the files unmodified and save that optimization for later.

But I definitely think the writing is on the wall that trying to establish some common denominator of 64-bit integers on 32-bit platforms as well as 64-bit ones is basically a fool's errand. You either use integers the size of your pointers...or if that's not enough you go to bignums.


cc: @Mark-hi

I was thinking about what these "CPU register-size integers" should be. :-/

R3-Alpha tried to standardize on a platform-supported 64-bit integer, prior to formal C99 specification of the int64_t type. So it figured out based on long long and long and other factors what got you 64-bits on a practical set of compilers. (We still do that when C99 isn't available, but use a portable version of <stdint.h> written by other people...which uses the agreed upon names, and has even more platforms accounted for.)

But even if you're targeting "guaranteed" C99, things aren't as obvious as you might think:

First of all, the platform need not provide a data type that's exactly any particular size...only at least a certain size. If you want to line things up in memory one 32-bit thing after another, you might find that int32_t isn't defined -- yet your compiler is still standard C99. :-/

Moreover, it sounds like some platforms that say they're C99-ish don't offer 64-bit types.

Next we get into the notion of whether we're looking to prolong the range for which a bignum need not be made, or to figure that a number needing 33 bits might as well need 128 for all we care. This might make us want to use int_fast32_t, so that math is the fastest that can be provided while still doing 32-bit numeric calculations.

But... then there's the fact that INTEGER! has been a surrogate for a proper POINTER! in the FFI. This makes it seem like pointer-sized integers could be the smartest choice. But if you read the fine print, once again C99 is not guaranteed to provide intptr_t as pointer-sized integers!

However, there are lots of intptr_t used internally to the code; this is in order to try to lay out structures at nice alignments that don't skew pointer fields. So it's basically a dependency.

I think intptr_t is probably best

R3-Alpha tried to standardize on a 64-bit value based on the idea that it would keep users from worrying about the differences in their code based on 32-bit vs. 64-bit platforms.

With BigNums as the user-exposed abstraction for INTEGER!, it's actually a benefit not to standardize the internals to one bit size. Making sure the C code doesn't assume the size of the native non-bignum format helps ensure builds are being tested that can take advantage of different sizes.

By using an intptr_t type, that has the benefit of knowing your integer can represent a platform pointer without having to instantiate a bignum. Space-wise, it leaves a series slot on the table in the integer cell...32 "wasted" bits on 32-bit platforms and 64 "wasted" bits on 64-bit platforms. That might seem bad--but if it stays available, then it's big enough for a pointer or other data that could help in creative ways or optimizations. Leaving room for that isn't a bad thing...

Been a while since last post about this.

I found an implementation of bignums up to 1024 bits here:
Tiny Bignum C
This looks to me like it is tiny indeed, and it is relatively easy to understand how it works.
The GMP library was mentioned before; it contains many more functions, including floating-point arithmetic, and so on. How it handles growing integers is described on this
GMP Internals documentation page. (Erlang also uses GMP.)
(And a tutorial on using GMP)

Adding a new dependency for something that should be expected to be built into the next generation of programming languages is not the way to go for a small project.

Maintaining all interfaces/bindings for the new releases of every added library is not what I consider fun in programming.

You mentioned the bignum code already in the sourcecode; it is now here, and that could also be used. (Except these are all C functions; they are present, but how would one call them from Ren-C?)

When overflow occurs, the integer should switch to using this, so operators must know when to use the bignum arithmetic versus the standard routines.

How would this work given the way integers are handled today?

I have a bignum branch where I have a basic concept of how to interoperate with it going.

There's some issue with it building to Wasm. (We've never built the crypto extension for Wasm before, so I hadn't run into it.) It's on my agenda.

I'm trying to consider if it would be possible to do a build which has no BigNums and just fails on overflow. I've also wondered about letting people swap in different BigNum implementations. Really it's still kind of up in the air.

Overflow detection is non-trivial. Sometimes there are special functions in the compiler to help you do additions or multiplication with overflow detection...but this varies from one compiler to another and isn't part of the C standard.
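
The standard portable idiom is to test the operands *before* adding, since signed overflow in C is undefined behavior; here that check is modeled in Python with simulated 64-bit bounds (an illustration of the technique, not the Atronix wrapper code):

```python
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1

def add_would_overflow(a, b):
    """The same pre-check a portable C implementation must make
    *before* adding, because signed overflow in C is undefined."""
    if b > 0:
        return a > INT64_MAX - b
    return a < INT64_MIN - b

assert not add_would_overflow(INT64_MAX - 1, 1)
assert add_would_overflow(INT64_MAX, 1)       # would wrap positive
assert add_would_overflow(INT64_MIN, -1)      # would wrap negative
```

Compiler intrinsics like GCC/Clang's __builtin_add_overflow do this in one instruction, which is why wrappers that pick them when available are worth having.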

Atronix added some wrappers that would pick compiler functions when available:


One disadvantage there is that it makes you say whether you want to use 32-bit or 64-bit math. I was thinking of using intptr_t so that the 32-bit builds would switch to BigNum at the 32-bit boundary and 64-bit builds at the 64-bit boundary, just so we could say the system could compile on architectures that didn't have 64-bit integers (as not all C embedded platforms do).

It's still on my mind how to do it...but certainly I'm still committed that an arbitrary precision INTEGER! is non-negotiable for the language. Just a lot of things to think about...


Just did a


on my system, result 96 hits!

So, from the page on Python I posted before: in Python 2 the integer type was a regular integer, and when an overflow occurred it was promoted to a bignum type. In Python 3 this construction has been dropped, and all integers are of the bignum type.

We will have to think over which of these are best suited for Ren-C.

My view on this is that really most of the time plain integers are good enough for the values that they contain.

Well, bignum support sounds to me like an absolute must.

And given that GMP is already there, with thousands of hours invested in the project, why reinvent the wheel?

In the very initial versions of Arturo, I had 2 distinct types: one for "normal" integers and one for "big" integers. In the end, it proved to be too much fuss.

Given that we are all aiming for a (very) high-level language, having to struggle with 2 different types of things that are still integers seems to me unnecessarily low-level.

So, the way Arturo handles things right now: we just have the :integer type. And that's it.

Whether this :integer is a small integer or a big integer, or if x + 1 leads to an overflow, and all that, is handled internally, without the user having to bother at all whether a simple integer has 1 or 1001 digits. :slight_smile:

Source: My father is a mathematician. First thing he always does when trying a language is check the craziest Mersenne he can think of for primality. The test was convincing enough to make me totally eliminate the distinction between integers lol
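
As an aside, that kind of test is a nice bignum stress: the Lucas-Lehmer test for Mersenne primality (sketched below in Python) needs nothing but big-integer squaring and mod, and the numbers outgrow machine words almost immediately.

```python
def lucas_lehmer(p):
    """True if the Mersenne number 2**p - 1 is prime (p an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # squaring a p-bit number: bignum territory
    return s == 0

assert lucas_lehmer(127)      # 2**127 - 1 is prime (39 digits)
assert not lucas_lehmer(11)   # 2**11 - 1 = 2047 = 23 * 89
```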


Python's approach - everything being a bignum - seems like a total overkill to me.

Let me put it this way: what are the chances that you will be constantly using bignums, with all the overhead that it brings?

So, if the most likely scenario is using smaller (32- or 64-bit) integers, why not stick to that as the "default" type and switch internally to GMP big ints when needed?

The question comes down to whether you want to expose the ability to modify a bignum's bits in-place. If you offer that semantic, then the identity of the integer is important...even if it starts out small...because that identity has to be the same after you've done math to it. That necessitates an allocation.

If you don't want to offer that, then you're saying that no matter how big a bignum gets...even adding one to it will have to allocate something the GC has to track. If someone does a loop with x: x + 1 and they cross into bignum range...then oh well. Guess they'll get a lot of allocations.

If you don't like the sound of that (which I do not) then you have to think of ways to offer in-place variants. You can do this by making that the M.O. of a separate BIGNUM! type with its own roster of math functions...where you're expected to consciously manage it just like any other type that requires an allocation. Or you can rethink your INTEGER! so it deals with the concept in some nuanced way.

My schematic of solution is the latter one:

  • Source-level integers are immutable and choose a platform integer representation if possible, otherwise use a copy-on-write allocated bignum.
  • Things like add are mutating of their first argument. So add 1 2 would be illegal.
  • Things like + don't mutate their first argument, and return an immutable result. This allows them to make their returned result a platform integer if it can fit. So 1 + 2 would be legal.
  • copy can make a mutable integer out of an immutable one. So add (copy 1) 2 would be legal.
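
Those four bullets can be mocked up as follows--a Python model of the proposed semantics only, with hypothetical names (Int, the locked flag) that do not correspond to real Ren-C internals:

```python
class Int:
    """Model of the proposal: integers carry a 'locked' flag;
    ADD mutates and refuses locked targets, PLUS copies then locks."""
    def __init__(self, value, locked=True):
        self.value = value      # source-level literals arrive locked
        self.locked = locked

def copy(i):                    # COPY yields a mutable instance
    return Int(i.value, locked=False)

def add(i, n):                  # ADD mutates its first argument in place
    if i.locked:
        raise TypeError("locked integer")
    i.value += n
    return i

def plus(i, n):                 # PLUS is: lock add copy a b
    out = add(copy(i), n)
    out.locked = True
    return out

one = Int(1)
try:
    add(one, 2)                 # add 1 2 is illegal...
    assert False
except TypeError:
    pass
assert plus(one, 2).value == 3       # ...but 1 + 2 is legal
assert add(copy(one), 2).value == 3  # add (copy 1) 2 is legal too
```

Note how plus never exposes mutation: it copies, mutates the copy, and hands back a locked result.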

More details if you read from the top of the thread.

Regarding what you've mentioned (disclaimer: I have absolutely no idea how Rebol handles this), here's what I'm doing:

the vast majority of functions (add included) can take either two simple values/parameters (in the case of add, two :integer/:floating type variables) or a literal parameter plus one or more others (in this case, an :integer/:floating).

In the first case, it would work as usual. In the second case, it would work as an in-place operation:


; normal addition
a: b + c
a: add b c

; in-place addition
'b + c
add 'b c

So, in a few words, in this last case, the operation is much more sustainable... and allocation-free (even in the case of bignums).

P.S. I'm generally really fond of in-place operations/functions, for speed of course, plus... having a clear distinction (pretty much in a Ruby fashion: see ! functions) makes things easier. I mean... I absolutely hate it when a function modifies a parameter without me explicitly allowing it to, or vice versa. So, simple rules: you pass a literal? That's in-place. If you pass values, then no modification will take place, and the result will basically imply a new allocation or copy.

I'd have to understand what your notion of identity is, and how that identity can be transferred around.

If I say x: y, does add 'x 1 then change y as well? It doesn't seem it would, if you didn't a-priori have an allocation of a bignum (or at the very least, force an allocation on that assignment and sync it in both variables).

If it doesn't transfer, then you have the situation where your in-place mechanics are confined to a single variable name in a single scope, and becomes a new allocation when it leaves the scope.

It's good to consider degrees of freedom like this, and how the language parts can be used.

But an infix addition operator that modifies its left argument seems like an unwanted axis. And taking a WORD! to ADD seems to me stranger than just being willing to say that all ADDs mutate.

(If you want to use + as prefix, you would write + -< b c ... see SHOVE)

I will reply with an example.

Hint: ++ stands for append (in Arturo a symbol is nothing but the infix version of a prefix-word)

a) "normal" append with values.
b) "in-place" append
c) "in-place" append after new assignment

And here's some bignum handling (the last example shows an int64 overflow):