Planning Ahead for BigNum INTEGER!

hostilefork · February 4, 2020, 9:10pm

It's been a year since I've weighed in on the ideas surrounding BigNum arithmetic. But we still need it, and the idea that written-in-C Rebol extensions can always lean on INTEGER! via libRebol to get their fast BigNum math is very appealing. Look at how you have to write m = c^d mod n in the Crypt module's rsa.c:

bigint *RSA_private(const RSA_CTX *c, bigint *bi_msg)
{
#ifdef CONFIG_BIGINT_CRT
    return bi_crt(c->bi_ctx, bi_msg, c->dP, c->dQ, c->p, c->q, c->qInv);
#else
    BI_CTX *ctx = c->bi_ctx;
    ctx->mod_offset = BIGINT_M_OFFSET;
    return bi_mod_power(ctx, bi_msg, c->d);
#endif
}

If your bigints were just REBVAL* to INTEGER!, you could say:

 rebValue(c, "pow", d, "mod", n)

I picked a small sample as the tip of that iceberg, and if you look around in that file you can see just what a PITA the bi_xxx routines are when it comes to allocation, freeing, etc.

Arbitrary precision the way of the world now, and the BigNums our TLS uses should be the same BigNums the language uses, or we are failing.

The main musing I've been having is that we may not want to tie ourselves to any one BigNum implementation. Some are lighter, some are heavier. We could make BIGNUM! an extension-provided type distinct from INTEGER! (as Red does) and force you into a conscious decision of which you are using... thus allowing you to compile with or without it. But I don't like this line of thinking.

An alternative would be to have an overflow/underflow hook for the native integer math. The default would just FAIL...so you could build a thin Rebol that only worked with platform ints. But if an extension registered a hook, then the overflow would pass control to the extension to re-engineer the cell to point to a dynamic structure of its choice. You could build with one-and-only-one such hook installed (could be enforced by the uniqueness requirements of linking, and the "NoBigInt Extension" could be the extension that just gave the dummy that fails).

(Demoting back to a platform-sized int would be at the discretion of the extension when handing back a result from any math operation.)

Anyway: just wanted to mention this interesting point about switching it around so that any C powered crypto escapes out to use Rebol INTEGER! to back its arithmetic operations, via calls to libRebol.

iArnold · March 24, 2020, 8:39pm

I recently had a look at erlang where I had a look at the tail recursion for factorial calculation.
Basically it had to do with calculating the value that results in a number that is far too big for a normal INT or even a float, giving about 1750 digits.
The code is as expected: (in fac_func.erl)
-module(fac_func).
-export([another_factorial/1]).
another_factorial(0) -> 1;
another_factorial(N) -> N * another_factorial(N-1).
1> c(fac_func).
{ok,fac_func}
2> fac_func:another_factorial(200).
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
3> fac_func:another_factorial(2000).
331627509245063324117539338057632403828111720810578039457193543706038077905600822400273230859732592255402352941225834109258084817415293796131386633526343688905634058556163940605117252571870647856393544045405243957467037674108722970434684158343752431580877533645127487995436859247408032408946561507233250652797655757179671536718689359056112815871601717232657156110004214012420433842573712700175883547796899921283528996665853405579854903657366350133386550401172012152635488038268152152246920995206031564418565480675946497051552288205234899995726450814065536678969532101467622671332026831552205194494461618239275204026529722631502574752048296064750927394165856283531779574482876314596450373991327334177263608852490093506621610144459709412707821313732563831572302019949914958316470942774473870327985549674298608839376326824152478834387469595829257740574539837501585815468136294217949972399813599481016556563876034227312912250384709872909626622461971076605931550201895135583165357871492290916779049702247094611937607785165110684432255905648736266530377384650390788049524600712549402614566072254136302754913671583406097831074945282217490781347709693241556111339828051358600690594619965257310741177081519922564516778571458056602185654760952377463016679422488444485798349801548032620829890965857381751888619376692828279888453584639896594213952984465291092009103710046149449915828588050761867924946385180879874512891408019340074625920057098729578599643650655895612410231018690556060308783629110505601245908998383410799367902052076858669183477906558544700148692656924631933337612428097420067172846361939249698628468719993450393889367270487127172734561700354867477509102955523953547941107421913301356819541091941462766417542161587625262858089801222443890248677182054959415751991701271767571787495861619665931878855141835782092601482071777331735396034304969082070589958701381980813035590160762908388574561288217698136182483576739218303118414719133986892842344000779246691209766731651433494437473235636572048844478331854941693030124531676232745367879322847473824485092283139952509732505979127031047683601481191102229253372697693823670057565612400290576043852852902937606479533458179666123839605262549107186663869354766108455046198102084050635827676526589492393249519685954171672419329530683673495544004586359838161043059449826627530605423580755894108278880427825951089880635410567917950974017780688782869810219010900148352061688883720250310665922068601483649830532782088263536558043605686781284169217133047141176312175895777122637584753123517230990549829210134687304205898014418063875382664169897704237759406280877253702265426530580862379301422675821187143502918637636340300173251818262076039747369595202642632364145446851113427202150458383851010136941313034856221916631623892632765815355011276307825059969158824533457435437863683173730673296589355199694458236873508830278657700879749889992343555566240682834763784685183844973648873952475103224222110561201295829657191368108693825475764118886879346725191246192151144738836269591643672490071653428228152661247800463922544945170363723627940757784542091048305461656190622174286981602973324046520201992813854882681951007282869701070737500927666487502174775372742351508748246720274170031581122805896178122160747437947510950620938556674581252518376682157712807861499255876132352950422346387878954850885764466136290394127665978044202092281337987115900896264878942413210454925003566670632909441579372986743421470507213588932019580723064781498429522595589012754823971773325722910325760929790733299545056388362640474650245080809469116072632087494143973000704111418595530278827357654819182002449697761111346318195282761590964189790958117338627206088910432945244978535147014112442143055486089639578378347325323595763291438925288393986256273242862775563140463830389168421633113445636309571965978466338551492316196335675355138403425804162919837822266909521770153175338730284610841886554138329171951332117895728541662084823682817932512931237521541926970269703299477643823386483008871530373405666383868294088487730721762268849023084934661194260180272613802108005078215741006054848201347859578102770707780655512772540501674332396066253216415004808772403047611929032210154385353138685538486425570790795341176519571188683739880683895792743749683498142923292196309777090143936843655333359307820181312993455024206044563340578606962471961505603394899523321800434359967256623927196435402872055475012079854331970674797313126813523653744085662263206768837585132782896252333284341812977624697079543436003492343159239674763638912115285406657783646213911247447051255226342701239527018127045491648045932248108858674600952306793175967755581011679940005249806303763141344412269037034987355799916009259248075052485541568266281760815446308305406677412630124441864204108373119093130001154470560277773724378067188899770851056727276781247198832857695844217588895160467868204810010047816462358220838532488134270834079868486632162720208823308727819085378845469131556021728873121907393965209260229101477527080930865364979858554010577450279289814603688431821508637246216967872282169347370599286277112447690920902988320166830170273420259765671709863311216349502171264426827119650264054228231759630874475301847194095524263411498469508073390080000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
4>

Is this what you have in mind?

Mark-hi · March 24, 2020, 11:49pm

Just for your information.
There is a big difference between needing the result of c^d mod n and obtaining said result by modding c^d by n. Nobody does the latter, for good reason: most of the work in calculating c^d is useless wasted effort. Instead, the calculation is done incrementally, multiplying by c and reducing it to a remainder every time the product exceeds n. So it is always a single operation (function) that takes three arguments, and never an expression with the two operators "power" and "modulo".
Still need the bignum math though, since in typical usage c, d, and n are all many, many digits.

hostilefork · March 25, 2020, 10:15am

Just for your information.

Another Brian doing a pretty good job of 'xplainin;

iArnold · September 13, 2020, 8:11am

For the record I am adding the link that I posted in the chat that describes the working of the bignum in Python. https://rushter.com/blog/python-integer-implementation/

In short this comes down to using base 2^30

The representation of 123456789101112131415 will look as follows:
ob_size 3
ob_digit 437976919 87719511 107
Algorithm of converting our representation back:
(437976919∗2^(30∗0))+(87719511∗2^(30∗1))+(107∗2^(30∗2))

In PHP a module bcmath is optionally available

BCMath Arbitrary Precision Mathematics

This extension is an interface to the GNU implementation 
as a library of the Basic Calculator utility by Philip Nelson; hence the name.

hostilefork · September 14, 2020, 11:48pm

Good to look at. However, the BigNum implementation most relevant to build on is the one that is in mbedTLS:

/include/mbedtls/bignum.h
/include/mbedtls/bn_mul.h
/library/bignum.c

What's advantageous about this library is that it's written in pure C, maintained for bugs, and is also used in the mbedTLS cryptography.

I've poked a bit into it to look at how it might be integrated...and it's awfully tempting to try and hack it up just a little bit so that it leverages some of the trickery in the guts of Ren-C. However, those temptations are probably misguided in the near term... and it's better to leave the files unmodified and save that optimization for later.

But I definitely think the writing is on the wall that trying to establish some common denominator of 64-bit integers on 32-bit platforms as well as 64-bit ones is basically a fool's errand. You either use integers the size of your pointers...or if that's not enough you go to bignums.

hostilefork · September 19, 2020, 5:48pm

cc: @Mark-hi

I was thinking about what these "CPU register-size integers" should be. :-/

R3-Alpha tried to standardize on a platform-supported 64-bit integer, prior to formal C99 specification of the int64_t type. So it figured out based on long long and long and other factors what got you 64-bits on a practical set of compilers. (We still do that when C99 isn't available, but use a portable version of <stdint.h> written by other people...which uses the agreed upon names, and has even more platforms accounted for.)

But even if you're targeting "guaranteed" C99, things aren't as obvious as you might think:

https://en.cppreference.com/w/c/types/integer

First of all, the platform need not provide a data type that's exactly any particular size...only at least a certain size. If you want to line things up in memory one 32-bit thing after another, you might find that int32_t isn't defined -- yet your compiler is still standard C99. :-/

Moreover, it sounds like some platforms that say they're C99-ish don't offer 64-bit types.

Next we get into the notion of whether we're looking to prolong the range for which a bignum need not be made, or to figure a 33-bit type is big enough that it might as well be a 128-bit type for all we care. This might make us want to use int32_fast_t so that math is the fastest that can provided and still do 32-bit numeric calculations.

But... then there's the fact that INTEGER! has been a surrogate for a proper POINTER! in the FFI. This makes it seem like pointer-sized integers could be the smartest choice. But if you read the fine print, once again C99 is not guaranteed to provide intptr_t as pointer-sized integers!

However, there are lots of intptr_t used internally to the code; this is in order to try to lay out structures at nice alignments that don't skew pointer fields. So it's basically a dependency.

I think intptr_t is probably best

R3-Alpha tried to standardize on a 64-bit value based on the idea that it would keep users from worrying about the differences in their code based on 32-bit vs. 64-bit platforms.

With BigNums as the user-exposed abstraction for INTEGER!, then it's actually a benefit not to standardize the internals to one bit size. Making sure the C code doesn't assume the size of the native non-bignum format helps ensure builds are being tested that can take available of different sizes.

By using an intptr_t type, that has the benefit of knowing your integer can represent a platform pointer without having to instantiate a bignum. Space-wise, it leaves a series slot on the table in the integer cell...32 "wasted" bits on 32-bit platforms and 64 "wasted" bits on 64-bit platforms. That might seem bad, -but- if it stays available then it's big enough for a pointer or other data that could help it in other creative ways or optimizations. Leaving room for that isn't a bad thing...

iArnold · October 26, 2020, 9:07am

Been a while since last post about this.

I found an implementation of bignums up to 1024 bits here
Tiny Bignum C
and this looks to me like it is tiny indeed and it is relatively easy to understand how this works.
The GMP MP library was mentioned before and this implementation contains many more functions and also floating points arithmetic and on an on. How this handles growing integers is described on this
GMP MP Internals documentation page. (Erlang also uses the GMP MP).
(And a tutorial on using GMP )

Adding a new dependency for something that could be expected to be built in in the next generation of programming languages is not the way a small project is waiting for.

Maintaining all interfaces/bindings for the new releases of every added library is not what I consider fun in programming.

You mentioned the bignum already in the sourcecode it is now here which also could be used. (Except these are all C functions, they are present but how to call them from Ren-C.)

The integer when overflow occurs should use this, so operators must know when to use the bignum arithmetic versus the standard routines.

How would this work given the way integers are handled today?

hostilefork · October 27, 2020, 5:04am

I have a bignum branch where I have a basic concept of how to interoperate with it going.

There's some issue with it building to Wasm. (We've never built the crypto extension for Wasm before, so I hadn't run into it.) It's on my agenda.

I'm trying to consider if it would be possible to do a build which has no BigNums and just fails on overflow. I've also wondered about letting people swap in different BigNum implementations. Really it's still kind of up in the air.

Overflow detection is non-trivial. Sometimes there are special functions in the compiler to help you do additions or multiplication with overflow detection...but this varies from one compiler to another and isn't part of the C standard.

Atronix added some wrappers that would pick compiler functions when available:

src/include/sys-int-funcs.h

One disadvantage there is that it makes you say whether you want to use 32-bit or 64-bit math. I was thinking of using intptr_t so that the 32-bit builds would switch to BigNum at the 32-bit boundary and 64-bit builds at the 64-bit boundary, just so we could say the system could compile on architectures that didn't have 64-bit integers (as not all C embedded platforms do).

It's still on my mind how to do it...but certainly I'm still committed that an arbitrary precision INTEGER! is non-negotiable for the language. Just a lot of things to think about...

iArnold · October 28, 2020, 2:31pm

Just did a

locate libgmp.so

on my system, result 96 hits!

iArnold · November 1, 2020, 1:33pm

So from the page on Python I posted before, in Python2 the integer type was a regular integer and when an overflow occurred, it was promoted to a bignum type. In Python3 this construction has been dropped and here all integers are of the bignum type.

We will have to think over which of these are best suited for Ren-C.

My view on this is that really most of the time plain integers are good enough for the values that they contain.

drkameleon · November 3, 2020, 9:00am

Well, bignum support sounds to me like an absolute must.

And given that GMP is already there, with thousands of hours invested in the project, why reinvent the wheel?

In the very initial versions of Arturo, I had 2 distinct types: one for "normal" integers and one for "big" integers. In the end, it proved to be too much fuss.

Given that we are all aiming for a (very) high-level language, having to struggle with 2 different types of things that are still integers seems to me unnecessarily low-level.

So, the way Arturo handles things right now: we just have the :integer type. And that's it.

Whether this :integer is a small integer or a bit integer, or if x + 1 leads to an overflow, and all that, is handled internally, without the user having to bother at all, whether a simple integer has 1 or 1001 digits.

Source: My father is a mathematician. First thing he always does when trying a language is check the craziest Mersenne he can think of for primality. The test was convincing enough to make me totally eliminate the distinction between integers lol

drkameleon · November 3, 2020, 9:03am

Python's approach - everything being a bignum - seems like a total overkill to me.

Let me put it this way: what are the chances that you will be constantly using bignums, with all the overhead that it brings?

So, if the most likely scenario is using smaller (32- or 64-bit) integers, why not stick to that as the "default" type and switch internally to GMP big ints when needed?

hostilefork · November 3, 2020, 3:39pm

The question comes down to whether you want to expose the ability to modify a bignum's bits in-place. If you offer that semantic, then the identity of the integer is important...even if it starts out small...because that identity has to be the same after you've done math to it. That necessitates an allocation.

If you don't want to offer that, then you're saying that no matter how big a bignum gets...even adding one to it will have to allocate something the GC has to track. If someone does a loop with x: x + 1 and they cross into bignum range...then oh well. Guess they'll get a lot of allocations.

If you don't like the sound of that (which I do not) then you have to think of ways to offer in-place variants. You can do this by making that the M.O. of a separate BIGNUM! type with its own roster of math functions...where you're expected to consciously manage it just like any other type that requires an allocation. Or you can rethink your INTEGER! so it deals with the concept in some nuanced way.

My schematic of solution is the latter one:

Source-level integers are immutable and choose a platform integer representation if possible, otherwise use a copy-on-write allocated bignum.
Things like add are mutating of their first argument. So add 1 2 would be illegal.
Things like + don't mutate their first argument, and return an immutable result. This allows them to make their returned result a platform integer if it can fit. So 1 + 2 would be legal.
copy can make a mutable integer out of an immutable one. So add (copy 1) 2 would be legal.

More details if you read from the top of the thread.

drkameleon · November 3, 2020, 4:32pm

Regarding what you've mentioned (disclaimer: I have absolutely no idea how Rebol handles this), here's what I'm doing:

the vast majority of functions (add included) can take either 2 simple values/parameters (in the case, of add 2 :integer/:floating type variables) or a literal parameter + another one - or more (in this case, an :integer/:floating).

In the first case, it would work as usual. In the second case, it would work as an in-place operation:

e.g.

; normal addition
a: b + c
a: add b c

; in-place addition
'b + c
add 'b c

So, in a few words, in this last case, the operation is much more sustainable... and allocation-free (even in the case of bignums).

P.S. I'm generally really fond of in-place operations/functions, for speed of course, plus... having a clear distinction (pretty much in a Ruby fashion: see ! functions) makes things easiers. I mean... I absolutely hate it when a function modifies a parameter without me explicitly allowing it to, or vice versa. So, simple rules: you pass a literal? That's in-place. If you pass values, then no modification will take place, and the result will basically imply a new allocation or copy.

hostilefork · November 3, 2020, 4:51pm

I'd have to understand what your notion of identity is, and how that identity can be transferred around.

If I say x: y does add 'x 1 then change y as well? It doesn't seem it would if you didn't a-priori have an allocation of a bignum (or at the very least, force an allocation on that assignment and sync it in both variables).

If it doesn't transfer, then you have the situation where your in-place mechanics are confined to a single variable name in a single scope, and becomes a new allocation when it leaves the scope.

It's good to consider degrees of freedom like this, and how the language parts can be used.

But an infix addition operator that modifies its left argument seems like an unwanted axis. And taking a WORD! to ADD seems to me stranger than just being willing to say that all ADDs mutate.

(If you want to use + as prefix, you would write + -< b c ... see SHOVE)

drkameleon · November 3, 2020, 5:03pm

I will reply with an example.

Hint: ++ stands for append (in Arturo a symbol is nothing but the infix version of a prefix-word)

a) "normal" append with values.
b) "in-place" append
c) "in-place" append after new assignment

drkameleon · November 3, 2020, 5:10pm

And here's some bignum handling (the last example shows an int64 overflow):

hostilefork · November 3, 2020, 5:12pm

That's a series, which presumably have an allocation, which lets you do that. How would you do it with a 32-bit or 64-bit number?

So for the example to apply to integers, then it comes back to your question:

You wouldn't wait because the GMP offers you the allocation of an identity for the number.

(Though my intent is using mbedTLS BigNums, for smaller code size and because it's what our encryption already uses.)

iArnold · November 4, 2020, 7:27am

Right,

That is correct, and the way to handle stuff. But there is a dependency issue with this.

Completely agree on this too. As I said, most of the integers one encounters are staying well within the limit of the given bits.

Right. For things as MySQL support, I only need this for my webdevelopment, talking to the MySQL database my provider hosts for me. So I need to install the MySQL (or MariaDB) libs for that case only. But when I want BIGNUM's, I want this available in a general sense. Having a dependency on GMP makes everybody that want to use it have a copy of the gmp library used, and if you want to compile the newest Ren-C you will need the gmp source library as well. So we get yet another dependency on the other project. Everybody in Ren-C world will need one, and to be able to debug and help each other we need the SAME version of the sources used. Then we are also depending on the GMP project not to introduce bugs in the latest versions...

The dependency on the AGG library (or for that matter any other GUI library) for R2 has a similar problem making the open sourcing of Rebol back in the day an extra problem, where R2 closed source could just use the AGG library of choice and ship with that built-in.
I call this for myself the "moving target problem", knowing that this project is also a moving target.
I also now have 96 libgmp.so libs on my system.