EuCrypt Chapter 3: Miller-Rabin Implementation



December 28th, 2017 by Diana Coman

~ This is part 4 of the EuCrypt series. Start with Introducing Eucrypt. ~

Primality testing1 is a key part of any implementation of RSA2 and therefore a key part of EuCrypt as well. At first glance, there is a wide choice of primality tests that one can use, from naive direct divisions in search of factors to prime number generators and statistical primality tests. However, many of those are not currently very practical for EuCrypt as they simply take too long to run on the sort of very large numbers that RSA has to use to avoid the simplest brute-force attacks. As a result, the choice narrows down considerably to include mainly probabilistic primality tests: Fermat, Solovay-Strassen and Miller-Rabin. Of those, Miller-Rabin is simply best currently: it has the lowest error probability upper bound among the three and it is at most as expensive computationally as the others. Arguably a deterministic, polynomial-time algorithm such as AKS3 would be even better than Miller-Rabin, but unfortunately AKS is currently still too slow for EuCrypt's needs. Consequently, the chosen algorithm for primality testing in EuCrypt is Miller-Rabin mainly because of a lack of a working better alternative.

If you think that "everyone uses Miller-Rabin anyway", think again. While GnuPG and mostly everyone else using RSA is indeed likely to claim that they are also using Miller-Rabin for primality testing, you'd be well advised to check such claims very, very closely because claims are just cheap labels that stick to anything just the same. If you actually take the time to check those claims and then necessarily find yourself peeling down label after label trying to get to the actual thing that is there as opposed to what the labels claim it is, you might find that every new label further dilutes the original meaning. That's how Koch in his GnuPG sticks the "random" label on pseudo-random at best, that's how 4096 bit randomly-generated keys contain actually at most 4090 pseudo-randomly generated bits and so on until you might even find as I did last time that bits and parts of the implementation do nothing in fact. Don't take my word for it either: go and check for yourself, it's a very healthy habit that might save your very skin some day.

Despite being called a "primality test", Miller-Rabin (like all the other probabilistic primality tests) is more of a compositeness test: the algorithm can prove that a number is composite, but it can not actually prove that a number is indeed prime. Essentially, the given number is suspected of the crime of being composite (as opposed to the desired prime) and witnesses for its compositeness are sought. If one single witness for compositeness is found, then the given number is indeed composite. However, if no witness is found, Miller-Rabin can only reach a relatively weaker conclusion, namely that the given number is likely to be prime. How likely? That depends to a significant degree on the choice of candidate witnesses: how many candidate witnesses the algorithm was asked to investigate and how it actually chose them.

In its search for witnesses, Miller-Rabin relies on the important fact that most numbers between 1 and n-1 are reliable witnesses for n, if n is indeed a composite number. More precisely, at most 1/4 of those numbers are strong liars for n, meaning that at most 1/4 of them will fail to reveal the compositeness of n, when investigated by Miller-Rabin. As a result, the more witnesses investigated, the lower the chances of a composite number to pass for a prime one. Assuming that witnesses are indeed chosen randomly4, the algorithm's error probability is at most (1/4)^t, where t is the number of witnesses investigated. Obviously, each additional witness adds to the cost of running the algorithm and for this reason EuCrypt exposes this as a knob5 for you, the user, to set depending on your own needs. Use it and use it wisely!

The updated eucrypt/smg_rsa/include/knobs.h:

#ifndef SMG_RSA_KNOBS_H
#define SMG_RSA_KNOBS_H

#define ENTROPY_SOURCE "/dev/ttyUSB0"

/*
 * This is the number of witnesses checked by the Miller-Rabin (MR) algorithm for each candidate prime number.
 * The value of M_R_ITERATIONS directly affects the outer bound of MR which is calculated as 4^(-M_R_ITERATIONS)
 * S.MG's choice of 16 here means an outer bound of 4^(-16) = 0.0000000002,
    which is currently considered sufficient for Eulora's needs.
    If your needs are different, change this knob accordingly.
 * NB: if you use this to make keys for some serious use, an outer bound of 1e-10 is really not nearly good enough
    and therefore you'll probably want to *increase* the value of this knob.
 */
#define M_R_ITERATIONS 16

#endif /*SMG_RSA_KNOBS_H*/

The default EuCrypt value for M_R_ITERATIONS is 16 and that means 16 randomly chosen candidate witnesses that are checked. By contrast, GnuPG 1.4.10 at first glance appears to check 5 candidate witnesses (as per cipher/primegen.c call is_prime(ptest, 5, &count2)) and at a deeper investigation it turns out that it checks 1 fixed witness (magic number 2, because why not) and 4 pseudo-randomly chosen ones at best. The label that was 5 but acted more like 4 and the parameter that didn't quite stand for what you'd expect, isn't that precisely the sort of thing you want in your cryptographic tool? No? Then stop using code from the swamps, start using signed code and in any case always read the darned code before you use it because otherwise that's exactly what you will get, each and every time: something other than what it seems, something continuously and rather invisibly to you drifting further away from what you need.

Leaving aside GnuPG for now, let's dive straight in and implement Miller-Rabin using the MPI functions as if they actually worked well6. The algorithm is quite straight forward and the code aims to be as short and clear as possible, with comments to help you follow along. The function is called is_composite, to reflect the fact that Miller-Rabin really checks for compositeness, regardless of the fact that we might prefer it to be otherwise. The n parameter is the actual large number (hence, stored as an MPI) that is suspected of being composite. The nwitnesses parameter is the number of randomly chosen witnesses to check (this is called "security parameter" in some reference books, most notably in Handbook of Applied Cryptography by Menezes, van Oorschot and Vanstone, 1997). You can also think of this nwitnesses as "number of iterations" because each iteration is effectively the check of one candidate witness. Finally, the third parameter, entropy_source, is the handler of an already opened and properly configured source of true random bits (see Chapter 2 for how this is set up in EuCrypt). First, the added function signature in eucrypt/smg_rsa/include/smg_rsa.h:

/*********primegen.c*********/

/*
 * This is an implementation of the Miller-Rabin probabilistic primality test:
 *   checking the specified number of randomly-chosen candidate witnesses
 *    (i.e. with an outer bound of (1/4)^nwitnesses).
 * NB: a 1 result from this test means that the given n is indeed composite (non-prime)
    but a 0 result does not fully guarantee that n is prime!
    If this doesn't make sense to you, read more on probabilistic primality tests.
 * @param n the candidate prime number;
    the function will investigate whether this number is composite or *likely* to be prime.
    How likely? It depends on the number of witnesses checked, see next parameter.
 * @param nwitnesses this is the number of randomly chosen candidate witnesses to the compositeness of n
      that will be checked; the outer bound of the algorithm depends on this.
 * @param entropy_source the source of entropy (ready to read from) that will be used
    to choose candidate witnesses to the compositeness of n.
* @return 1 if at least one witness to the compositeness of n has been found
      (i.e. n is certainly composite);
      0 if no witness to the compositeness of n was found (i.e. it is likely that n is prime)
 * NB: the probability that n is *not* prime although this function returned 0 is
      less than (1/4)^nwitnesses, but it is NOT zero.
 */
int is_composite( MPI n, int nwitnesses, int entropy_source);

And the corresponding implementation, in a new file eucrypt/smg_rsa/primegen.c :

/* primegen.c - prime number generation and checks
 *
 * S.MG, 2017
 *
 */

#include
#include
#include

#include "smg_rsa.h"

/****************
 * is_composite
 * Returns 1 if it finds evidence that n is composite and 0 otherwise.
 * NB: this is a probabilistic test and its strength is directly linked to:
 *  - the number of witnesses AND
 *  - the quality of the entropy source given as arguments!
 */

int is_composite( MPI n, int nwitnesses, int entropy_source) {
  int evidence = 0;
  int nlimbs = mpi_get_nlimbs(n);       /* see MPI representation   */
  unsigned int nbits = mpi_get_nbits(n);        /* used bits        */
  unsigned int noctets = (nbits + 7) / 8; /* source works on octets */
  MPI nminus1 = mpi_alloc(nlimbs);      /* n-1 value is used a LOT  */
  MPI mpi2 = mpi_alloc_set_ui(2);         /* 2 as MPI               */
  MPI a = mpi_alloc(nlimbs);      /* candidate witness              */
  MPI y = mpi_alloc(nlimbs);      /* intermediate values            */
  MPI r = mpi_alloc(nlimbs);      /* n = 1 + 2^s * r                */
  int s;                          /* n = 1 + 2^s * r                */
  int j;                          /* counter for loops              */
  int nread;              /* number of random octets actually read  */

  /* precondition: n > 3 */
  assert( nbits > 2 );

  /* nminus1 = n - 1 as MPI                                         */
  mpi_sub_ui( nminus1, n, 1);

  /*
   * find r odd and s so that n = 1 + 2^s * r
   * n-1 = 2^s * r
   * s is given by the number of trailing zeros of binary n-1
   * r is then obtained as (n-1) / (2^s)
   */
  s = mpi_trailing_zeros( nminus1 );
  mpi_tdiv_q_2exp(r, nminus1, s);

  /*
   * Investigate randomly chosen candidate witnesses.
   * Stop when either:
      * the specified number of witnesses (nwitnesses) have
        been investigated OR
      * a witness for compositeness of n was found
   */
  while (nwitnesses > 0 && evidence == 0) {
    unsigned char *p = xmalloc(noctets);
    do {
      nread = get_random_octets_from(noctets, p, entropy_source);
    } while (nread != noctets);

    mpi_set_buffer(a, p, noctets, 0);
    /* ensure that a < n-1 by making a maximum nbits-1 long:
        * clear all bits above nbits-2 in a
        * keep value of bit nbits-2 in a as it was
    */
    if (mpi_test_bit(a, nbits - 2))
      mpi_set_highbit(a, nbits - 2);
    else
      mpi_clear_highbit(a, nbits - 2);

    /* ensure that 1 < a < n-1; if not, try another random number
     * NB: true random means a CAN end up as 0 or 1 here.
     */

    if (mpi_cmp(a, nminus1) < 0 && mpi_cmp_ui(a, 1) > 0) {
      /* calculate y = a^r mod n */
      mpi_powm(y, a, r, n);
      if (mpi_cmp_ui(y, 1) && mpi_cmp(y, nminus1)) {
        j = 1;
        while ( (j < s) && mpi_cmp(y, nminus1) && (evidence == 0) ) {
          /* calculate y = y^2 mod n */
          mpi_powm(y, y, mpi2, n);
          if (mpi_cmp_ui(y, 1) == 0)
            evidence = 1;
          j = j + 1;
        } /* end while */
        if (mpi_cmp(y, nminus1))
          evidence = 1;
      } /* end if y != 1 and y != n-1 */
      nwitnesses = nwitnesses - 1;
    } /* end if 1 < a < n-1 */
    xfree(p);
  } /* end while for investigating candidate witnesses */

  mpi_free( nminus1 );
  mpi_free( mpi2 );
  mpi_free( a );
  mpi_free( y );
  mpi_free( r );

  return evidence;
}

The variable evidence is initially 0 as Miller-Rabin does not yet have any evidence about n being composite. When and if evidence of compositeness is found for n, this variable will get updated to 1. If the whole algorithm finishes without updating this variable, it means that n is probably prime, with a maximum probability error of (1/4)^nwitnesses, as previously discussed. In any case, this variable holds the result of the Miller-Rabin test at any moment and its value is the one returned when the test finishes.

The nlimbs and nbits are basically measures of how long the MPI n actually is and they are initialised with values returned by the corresponding MPI functions. The nbits value is then converted to number of octets (in noctets) for the very simple reason that the source of randomness in EuCrypt anyways reads at the moment full octets rather than individual bits. This is of course a matter of choice as you could change the setting of the source to read bit by bit, but I can't quite see at the moment any significant advantage to that.

Having obtained this basic length-information on n, the function then goes on to declare and allocate memory for a set of MPI variables that it will need for the Miller-Rabing algorithm itself. Note that there is *no* use of the so-called "secure" memory thing from MPI for the unfortunate reason that the existing implementation of secure is very much an empty label: all it does is to set a flag so that theoretically the memory is not swapped, but there is no guarantee to either that or to the more useful fact that nobody else can read that memory. So given that there is in fact no secure memory implementation no matter how much it would be useful if there was one, EuCrypt takes instead the honest and practical approach of making it clear that it uses plain memory and nothing else. No label if no actual matching object to stick it on, as simple as that.

Once the needed variables are declared and initialised when appropriate, a precondition is checked: assert( nbits > 2); What's this? It's a sanity check essentially because Miller-Rabin is meant for checking large integers, not tiny ones. Moreover, due to the insanity of the underlying MPI which considers in its infinite stupidity that 0 for instance is represented on 0 bits, tiny values of less than 4 (hence, represented on less than 3 bits) will... block the whole thing. Let me point out for now just the very simple fact that the algorithm uses nbits-1 and nbits-2 meaning that nbits should better be at least 3 or otherwise it ends up trying to work with numbers represented on 0 bits and other such nonsense. So instead of risking working with nonsense, EuCrypt uses this assert call to abort the whole thing rather than propagate the nonsense even one instruction further.

Oh, if you wonder by any chance whether GnuPG bothers to even consider such corner cases, that's good for you. I'm sure it's all right if they don't because such cases "should never happen" and "nobody calls Miller-Rabin on small numbers" and all those wonderful castles of trust built on nothing but air. Come to think about it, I even enjoy blowing up such air-supported castles and what not, they make a most-satisfying poufffff! Do you enjoy living in them? POUFFFFF!

The rest of the function closely follows the Miller-Rabin algorithm and the comments in the code hopefully make it easier to understand what each line does even when using the MPI calls. Note that candidate witnesses are chosen indeed randomly by using the specified source and making sure that the call truly returned the requested number of random octets. However, the actual number of random bits in any candidate random number will be by necessity nbits-1 because of the constraint that the random candidate should be less than n-1. This constraint is enforced by simply clearing any bits above nbits-2 (bit numbering starts at 0 so last bit is n-1 rather than n) but keeping at the same time the value of bit on position nbits-2. If that was 1 then mpi_set_highbit(a, nbits-2) is called. If that was 0 then mpi_clear_highbit(a, nbits-2) is called instead.

Note that those 2 mpi functions (mpi_set_highbit and mpi_clear_highbit) are supposedly similar in that they clear all bits above the position indicated and otherwise set to 1 or, respectively, to 0 the bit on the given position. However, the actual code reveals that they are not entirely similar: mpi_set_highbit allocates more memory if the position given is above the current size of the mpi; mpi_clear_highbit doesn't allocate memory in this case. This means effectively that mpi_set_highbit returns an mpi of specified length, while mpi_clear_highbit returns always an mpi of length smaller than the specified bit position. At first glance this might seem to make some sense but the reality is worse than that: mpi_clear_highbit sometimes trims the leading 0 bits of a number and sometimes... doesn't! Possibly for this reason of rather dubious behaviour, GnuPG's Miller-Rabin avoids using mpi_clear_highbit entirely and dances around instead with double calls to mpi_set_highbit instead, on both branches of the if. Since I'm doing coding here rather than voodooing or naked-dances-in-the-code, I fixed instead mpi_clear_highbit to at least reliably trim leading 0s at all times and I'm using it where it is needed. The memory allocation issue is not relevant for my code here anyway because there is already enough memory allocated for the MPI at the beginning of this function.

A core aspect of any Miller-Rabin implementation is the way in which candidate witnesses are chosen. EuCrypt chooses them entirely at random in the interval [2, n-2]. Different options were considered, most notably that of choosing a smaller, potentially more relevant interval, for instance by increasing the lower bound of this interval to at least 256 (2^8). Moreover, according to Eric Bach's paper7, an appropriate upper bound for witnesses would be 2*log(n)*log(n). However, Bach's result relies on the Extended Riemann Hypothesis (ERH) which hasn't been proved so far. So although those bounds are very appealing, EuCrypt sticks for now with using randomly chosen witnesses over the whole interval.

To sum it all up, the main changes brought by today's vpatch are the following:

  1. Addition of primegen.c as part of smg_rsa. This includes the actual implementation of the Miller-Rabin algorithm with the choices discussed above. It uses the MPI implementation introduced in Chapter 1 and corrected along the way.
  2. Changes to MPI, namely further identification and, to the extent needed for current needs of EuCrypt, the killing of additional cockroaches that were identified in the MPI code as a result of investigating the functions that are needed for Miller-Rabin. Most notably: fixing mpi_clear_highbit so that it always trims leading 0 if any, as opposed to current functionality where it sometimes trims them and sometimes not; identifying the fact that MPI currently considers that 0 is represented on 0 bits. Note that this last issue is flagged up and made obvious through updated tests but it is not changed mainly because following all its potential implications through MPI at this stage would eat up so much time as to make it cheaper to just implement from scratch something solid at least. So for the time being at least, the decision made is to honestly admit and clearly highlight this existing fault of MPI.
  3. New tests for MPI highlighting the new issues uncovered.
  4. New tests for smg_rsa focusing on Miller-Rabin: the testing program allows the user to specify the number of runs for each data point and reports a test as failed if at least one run returned a result different from the one expected; test data for Miller-Rabin is chosen to include a few mersenne primes, carmichael composites and a phuctor-found non-prime public exponent of someone's RSA key. Feel free to add to them anything you consider relevant and then run those tests!
  5. A small change to the function that fetches random octets from the FG: the errno is now set to 0 prior to every call that reads from the USB port and its value is then checked in addition to the return value of the read function. This change is needed in order to avoid the unfortunate case when no bits are read but there is apparently no underlying error. Previous version of my function would still abort in such case but current version would instead keep trying as this is a more useful approach for EuCrypt.

The updated get_random_octets_from function in eucrypt/smg_rsa/truerandom.c:

int get_random_octets_from(int noctets, unsigned char *out, int from) {

  int nread;
  int total = 0;

  while (total < noctets) { errno = 0; nread = read(from, out+total, noctets-total); //on interrupt received just try again if (nread == -1 && errno == EINTR) continue; //on error condition abort if (errno != 0 && (nread == -1 || nread == 0)) { printf("Error reading from entropy source %s after %d read: %sn", ENTROPY_SOURCE, total, strerror(errno)); return total; //total read so far } if (nread > 0)
      total = total + nread;
  }
  return total; //return number of octets read
}

The new test_is_composite function in eucrypt/smg_rsa/tests/tests.c:

void test_is_composite(int nruns, char *hex_number, int expected) {
  int i;
  int output;
  int count_ok = 0;
  int source = open_entropy_source(ENTROPY_SOURCE);
  MPI p = mpi_alloc(0);

  mpi_fromstr(p, hex_number);
  printf("TEST is_composite on MPI(hex) ");
  mpi_print(stdout, p, 1);
  for (i=0; i < nruns; i++) {
    printf(".");
    fflush(stdout);
    output = is_composite(p, M_R_ITERATIONS, source);
    if (output == expected)
      count_ok = count_ok + 1;
  }
  printf("done, with %d out of %d correct runs for expected=%d: %sn", count_ok, nruns, expected, count_ok==nruns? "PASS"$
  mpi_free(p);
  close(source);
}

The updated main in eucrypt/smg_rsa/tests/tests.c:

int main(int ac, char **av)
{
  int nruns;
  int id;

  if (ac<2) {
    printf("Usage: %s number_of_runs [testID]n", av[0]);
    return -1;
  }
  nruns = atoi(av[1]);

  if (ac < 3) id = 0; else id = atoi(av[2]); if (id == 0 || id == 1) { printf("Timing entropy source...n"); time_entropy_source(nruns,4096); } if (id == 0 || id == 2) { /* a few primes (decimal): 65537, 116447, 411949103, 20943302231 */ test_is_composite(nruns, "0x10001", 0); test_is_composite(nruns, "0x1C6DF", 0); test_is_composite(nruns, "0x188DD82F", 0); test_is_composite(nruns, "0x4E0516E57", 0); /* a few mersenne primes (decimal): 2^13 - 1 = 8191, 2^17 - 1 = 131071, 2^31 - 1 = 2147483647 */ test_is_composite(nruns, "0x1FFF", 0); test_is_composite(nruns, "0x1FFFF", 0); test_is_composite(nruns, "0x7FFFFFFF", 0); /* a few carmichael numbers, in decimal: 561, 60977817398996785 */ test_is_composite(nruns, "0x231", 1); test_is_composite(nruns, "0xD8A300793EEF31", 1); /* an even number */ test_is_composite(nruns, "0x15A9E672864B1E", 1); /* a phuctor-found non-prime public exponent: 170141183460469231731687303715884105731 */ test_is_composite(nruns, "0x80000000000000000000000000000003", 1); } if (id > 2)
    printf("Current test ids: 0 for all, 1 for entropy source test only, 2 for is_composite test only.");

  return 0;
}

Finally, the .vpatch itself and my signature (you'll need all previous .vpatches of EuCrypt to press this one):

In the next chapter we take one step further towards having RSA, so stay tuned, drink only the good stuff and make sure you won't go... POUFFF!


  1. Primality testing means answering the question: is this number prime or not? 

  2. If this is not clear to you, it's best to just review RSA itself. In a nutshell: the whole working of RSA as cryptographic tool relies on properties of prime numbers; both secret and private RSA keys are essentially made out of prime numbers. Don't eat only this nutshell though - better read the original paper by Rivest, Shamir and Adleman: A Method for Obtaining Digital Signatures and Public-Key Cryptosystems

  3. Agrawal, Kayal and Saxena, Primes is in P, Ann. of Math, Vol. 2, pp. 781-793. 

  4. Which is *not* the case in GnuPG where first witness is actually...fixed and the rest are anyway chosen pseudo-randomly, go and read that code. By contrast, EuCrypt actually chooses them randomly, using a true random number generator, the FG. 

  5. Like all other knobs, this can be found in include/knobs.h 

  6. No, they don't, surprise, surprise. So I'll fix what I use and otherwise at least highlight what other problems I find, as I find them, such is this wonderful world we live in. 

  7. Bach, Eric, 1990. Explicit Bounds for Primality Testing and Related Problems. Mathematics of Computation, 55, pp. 355-380. 

Comments feed: RSS 2.0

One Response to “EuCrypt Chapter 3: Miller-Rabin Implementation”

  1. […] many n for which the smallest witness is greater than ln(n)^(1/(3*ln(ln(ln(n))))). And finally, Miller-Rabin, as discussed previously has very good chances of finding a witness if it really looks …, seeing how three quarters of the numbers between 1 and n-1 are reliable witnesses anyway. So if […]

Leave a Reply