Joye M.Cryptographic hardware and embedded systems.2005
.pdf
86 S. Kwon et al.
When |
then the number of nonzero entries of |
is one |
||
because |
|
Therefore we need one AND gate and no XOR gate to |
||
compute |
with |
Since the addition |
|
in (35) |
needs one XOR gate for each |
the total gate complexity of our |
|||
multiplier is |
AND gates plus at most |
|
XOR gates. The critical |
|
path delay can also be evaluated easily. It is clear from (35) and (36) that the
critical path delay is at most |
We compare our sequential |
||
multiplier with other multipliers of the same kinds in Table 2. In the table, |
|||
denotes the number of nonzero entries in the matrix |
It is well known [6] |
||
that |
if is odd and |
if is even. In our case of |
|
with |
it is easy, from (17) and (18), to see that |
has a more |
|
strong bound |
Thus the bounds |
|
in [3] is same |
to |
|
Consequently the |
|
circuit in [3] and our multiplier have more or less the same hardware complexity.
4.3Gaussian Normal Basis of Type 2 and 4 for ECC
Let |
be |
a prime such |
that |
|
i.e. |
either 2 |
is a primitive root |
or |
and |
|
Then the |
element |
|
|
where |
is a primitive pth root of unity in |
|
forms a normal |
||
basis |
|
in |
which we call a Gaussian normal basis of |
|||
type 2 (or a type II ONB). A multiplication matrix |
of |
has the following |
||||
property; |
if and only if |
|
for any choice of ± sign. |
|||
This is obvious from the basic properties of Gaussian normal basis in Section 3.
Since |
divides |
is a unique value |
satisfying |
|
|
That is, |
and the 0th row of |
is (0,1,0, ··· , 0). If |
|
then |
|
and thus ith |
row of |
contains exactly two |
nonzero entries. Therefore for the case of a type II optimal normal basis, we need
AND gates and |
|
XOR gates. Also the critical path delay is |
||
while that of [3] is |
Let us give a more explicit example for |
|||
the case |
|
|
|
|
Example 1. Let |
be a primitive 11th root of of unity in |
and let |
||
be a type II optimal normal element in |
The |
computations |
||
of |
are easily done |
from the following |
table. For each block |
|
TEAM LinG
Efficient Linear Array for Multiplication in |
Using a Normal Basis |
87 |
regarding K and
entry with
and
has the value
and
respectively, where
is a unique multiplicative subgroup of order 2 in 
From the above table, it can be found that |
|
and |
||
For example, the computation of |
can be done as follows. See the block |
|||
and find |
is in |
and |
is in |
Thus we have |
In fact, for the case of type II ONB, there is a more regular expression called a palindromic representation which enables us to find the multiplication table more easily. However for the general treatments of all Gaussian normal
bases of type for arbitrary |
we are following this rule. Note that for all other |
||
type II ONB where |
the multiplication table can be derived exactly the |
||
same manner. From (37), the corresponding matrix |
for |
is |
|
and using (24),(25),(28),(29), we find that the multiplication
of
and
is written as follows.
From this, one has the shift register arrangement of C = AB using a type II ONB in
for
and it is shown in Figure 3. Note that the underlined entries are the first terms to be computed. Also note that the (shifted) diagonal entries have the common terms of 
As is mentioned before, there exists only one field |
for which a type |
||
II ONB exists in |
among the recommended five fields |
||
|
by NIST. Though the circuits of multiplication using a |
||
type II ONB are presented in many places |
[1,3,10,11], the authors could not |
||
find an explicit example of a circuit design |
using a Gaussian normal basis of |
||
TEAM LinG
88 S. Kwon et al.
Fig. 3. A new multiplication circuit using a type II ONB in |
for |
type
Since there are two fields
for which a Gaussian normal basis of type 4 exists, it is worthwhile to study the multiplication and the corresponding circuit for this case. For the clarity of exposition, we will explain a Gaussian normal basis of type
in
for
Note that the general case can be dealt in the same manner as in the following example.
Example |
2. Let |
with |
where a Gauss period |
|
of type (7, 4) exists in |
In this case, the unique cyclic subgroup of order |
|||
4 in |
is |
|
Let be a primitive 29th |
|
root of unity in |
Thus letting |
a normal element |
is written |
|
as |
|
and |
is a normal basis in |
|
The computations of |
are done from the following table. For each |
|||
block regarding K and |
entry with |
and |
has the |
|
value |
and |
respectively. |
|
|
From the above table, we find
and
For example, see the block
for the expression of
The entries of
are 5,20,26,11. Now see the blocks of
and find 
Thus we get
From (40), the multiplication matrix
is written as
TEAM LinG
Efficient Linear Array for Multiplication in |
Using a Normal Basis |
89 |
and again using the relations (24),(25),(28),(29), we get the following multiplication result
In the following table,
is defined as 
For example, we have 
The corresponding shift register arrangement of C = AB using a Gaussian normal basis of type 4 in
for
is shown in Figure 4. Also note that the underlined entries are the first terms to be computed and the (shifted) diagonal entries have the common terms of
The critical path delay of the circuit using a type 4 Gaussian normal basis is only
and can be effectively realized for the case
and
also.
Fig. 4. A new multiplication circuit using a Gaussian normal basis of type 4 in
for
TEAM LinG
90 S. Kwon et al.
5Conclusions
In this paper, we proposed a low complexity sequential normal basis multiplier over
for odd
using a Gaussian normal basis of type
Since, in many cryptographic applications,
should be an odd integer or a prime, our assumption on
is not at all restrictive for a practical purpose. We presented a general method of constructing a circuit arrangement of the multiplier and showed explicit examples for the cases of type 2 and 4 Gaussian normal bases. Among the five binary fields,
with
recommended by NIST [16] for ECC, our examples cover the cases
since
has a type II ONB and
have a Gaussian normal basis of type 4. Our general method can also be applied to other fields
and
since they have a Gaussian normal basis of type 6 and 10, respectively. Compared with previously proposed architectures of the same kinds, our multiplier has a superior or comparable area complexity and delay time. Thus it is well suited for many applications such as VLSI implementation of elliptic curve cryptographic protocols.
References
1.G.B. Agnew, R.C. Mullin, I. Onyszchuk, and S.A. Vanstone, “An implementation for a fast public key cryptosystem,” J. Cryptology, vol. 3, pp. 63–79, 1991.
2.G.B. Agnew, R.C. Mullin, and S.A. Vanstone, “Fast exponentiation in 
Eurocrypt 88, Lecture Notes in Computer Science, vol. 330, pp. 251–255, 1988.
3.A. Reyhani-Masoleh and M.A. Hasan, “Low complexity sequential normal basis
multipliers over
16th IEEE Symposium on Computer Arithmetic, vol. 16, pp. 188–195, 2003.
4.A. Reyhani-Masoleh and M.A. Hasan, “A new construction of Massey-Omura parallel multiplier over
IEEE Trans. Computers, vol. 51, pp. 511–520, 2002.
5.A. Reyhani-Masoleh and M.A. Hasan, “Efficient multiplication beyond optimal normal bases,” IEEE Trans. Computers, vol. 52, pp. 428–439, 2003.
6.A.J. Menezes, I.F. Blake, S. Gao, R.C. Mullin, S.A. Vanstone, and T. Yaghoobian,
Applications of Finite Fields, Kluwer Academic Publisher, 1993.
7.J.L. Massy and J.K. Omura, “Computational method and apparatus for finite field arithmetic,” US Patent No. 4587627, 1986.
8.C. Paar, P. Fleischmann, and P. Roelse, “Efficient multiplier architectures for Galois fields
IEEE Trans. Computers, vol. 47, pp. 162–170, 1998.
9.E.R. Berlekamp, “Bit-serial Reed-Solomon encoders,” IEEE Trans. Inform. Theory, vol. 28, pp. 869–874, 1982.
10.B. Sunar and Ç.K. KOç, “An efficient optimal normal basis type II multiplier,” IEEE Trans. Computers, vol. 50, pp. 83–87, 2001.
11.H. Wu, M.A. Hasan, I.F. Blake, and S. Gao, “Finite field multiplier using redundant representation,” IEEE Trans. Computers, vol. 51, pp. 1306–1316, 2002.
12.S. Gao, J. von zur Gathen, and D. Panario, “Orders and cryptographical applications,” Math. Comp., vol. 67, pp. 343–352, 1998.
13.J. von zur Gathen and I. Shparlinski, “Orders of Gauss periods in finite fields,”
ISAAC 95, Lecture Notes in Computer Science, vol. 1004, pp. 208–215, 1995.
TEAM LinG
Efficient Linear Array for Multiplication in |
Using a Normal Basis |
91 |
14.S. Gao and S. Vanstone, “On orders of optimal normal basis generators,” Math. Comp., vol. 64, pp. 1227–1233, 1995.
15.S. Feisel, J. von zur Gathen, M. Shokrollahi, “Normal bases via general Gauss periods,” Math. Comp., vol. 68, pp. 271–290, 1999.
16.NIST, “Digital Signature Standard,” FIPS Publication, 186-2, February, 2000.
TEAM LinG
Low-Power Elliptic Curve Cryptography Using
Scaled Modular Arithmetic
E. Öztürk1, B. Sunar1, and
2
1 Department of Electrical & Computer Engineering, Worcester Polytechnic
Institute, Worcester MA, 01609, USA,
erdinc, sunar@wpi.edu
2 Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey
TR-34956
erkays@sabanciuniv.edu
Abstract. We introduce new modulus scaling techniques for transforming a class of primes into special forms which enables efficient arithmetic. The scaling technique may be used to improve multiplication and inversion in finite fields. We present an efficient inversion algorithm that utilizes the structure of scaled modulus. Our inversion algorithm exhibits superior performance to the Euclidean algorithm and lends itself to efficient hardware implementation due to its simplicity. Using the scaled modulus technique and our specialized inversion algorithm we develop an elliptic curve processor architecture. The resulting architecture successfully utilizes redundant representation of elements in
and provides a low-power, high speed, and small footprint specialized elliptic curve implementation.
1Introduction
Modular arithmetic has a variety of applications in cryptography. Many publickey algorithms heavily depend on modular arithmetic. Among these RSA encryption and digital signature schemes, discrete logarithm problem (DLP) based schemes such as the Diffie-Helman key agreement [4] and El-Gamal encryption and signature schemes [8], and elliptic curve cryptography [6,7] play an important role in authentication and encryption protocols. The implementation of RSA based schemes requires the arithmetic of integers modulo a large integer, that is in the form of a product of two large primes
On the other hand, implementations of Diffie-Helman and El-Gamal schemes are based on the arithmetic of integers modulo a large prime
While ECDSA is built on complex algebraic structures, the underlying arithmetic operations are either modular operations with respect to a large prime modulus
case) or polynomial arithmetic modulo a high degree irreducible polynomial defined over the finite field GF(2)
case). Special moduli for
arithmetic were also proposed [2,10]. Low Hamming-weight irreducible polynomials such as trinomials and pentanomials became a popular choice [10,1] for both hardware and software implementations of ECDSA over
Particularly, trinomials
M. Joye and J.-J. Quisquater (Eds.): CHES 2004, LNCS 3156, pp. 92–106, 2004. |
|
© International Association for Cryptologic Research 2004 |
TEAM LinG |
|
Low-Power Elliptic Curve Cryptography |
93 |
|
of the form |
allow efficient reduction. For many bit-lengths such polyno- |
||
mials do not exist; therefore less efficient trinomials, i.e. |
with |
|
|
or pentanomials, i.e. |
are used instead. Hence, in many |
||
cases the performance suffers degradation due to extra additions and alignment adjustments.
In this paper we utilize integer moduli of special form, which is reminiscent of low-Hamming weight polynomials. Although the idea of using a low-Hamming weight integer modulus is not new [3], its application to Elliptic Curve Cryptography was limited to only elliptic curves defined over Optimal Extension Fields (i.e.
with mid-size
of special form), or non-optimal primes such as those utilized by the NIST curves. In this work we achieve moduli of Mersenne form by introducing a modulus scaling technique. This allows us to develop a fast inversion algorithm that lends itself to efficient inversion hardware. For proof of concept we implemented a specialized elliptic curve processor. Besides using scaled arithmetic and the special inversion algorithm, we introduced several innovations at the hardware level such as a fast comparator for redundant arithmetic and shared arithmetic core for power optimization. The resulting architecture requires extremely low power at very small footprint and provides reasonable execution speed.
2Previous Work
A straightforward method to implement integer and polynomial modular multiplications is to first compute the product of the two operands,
and then to reduce the product using the modulus,
Traditionally, the reduction step is implemented by a division operation, which is significantly more demanding than the initial multiplication. To alleviate the reduction problem in integer modular multiplications, Crandall proposed [3] using special primes, primes of the form
where
is a small integer constant. By using special primes, modular reduction turns into a multiplication operation by the small constant
that, in many cases, may be performed by a series of less expensive shift and add operations:
It should be noticed that
is not fully reduced. Depending on the length of
a few more reductions are needed. The best possible choice for a special prime
is a Mersenne prime, |
with |
fixed to a word-boundary. In this case, |
||
the reduction operation becomes a simple modular addition |
|
|||
Similarly primes of the form |
may simplify reduction into a modular sub- |
|||
traction |
|
Unfortunately, Mersenne primes and primes of the |
||
form |
are scarce. For degrees up to 1000 no primes of form |
and only |
||
the two Mersenne primes |
and |
exist. Moreover, these primes are |
||
too large for ECDSA which utilizes bit-lengths in the range 160 – 350. Hence, a
TEAM LinG
94 E. Öztürk, B. Sunar, and 
more practical choice is to use primes of the form
For a constant larger than
and a degree
that is not aligned to a word boundary, some extra shifts and additions may be needed. To relax the restrictions, Solinas [11] introduced a generalization for special primes. His technique is based on signed bit recoding. While increasing the number of possible special primes, additional low-level operations are needed. The special modulus reduction technique introduced by Crandall [3] restricts the constant
in
to a small constant that fits into a single word.
3Modulus Scaling Techniques
The idea of modulus scaling was introduced by Walter [13]. In this work, the modulus was scaled to obtain a certain representation in the higher order bits, which helped the estimation of the quotient in Barrett’s reduction technique. The method works by scaling to the prime modulus to obtain a new modulus,
Reducing an integer
using the new modulus
will produce a result that is congruent to the residue obtained by reducing
modulo
This follows from the fact that reduction is a repetitive subtraction of the modulus. Subtracting
is equivalent to
times subtracting
and thus
When a scaled modulus is used, residues will be in the range 
The number is not fully reduced and essentially we are using a redundant representation where an integer is represented using
more bits than necessary. Consequently, it will be necessary that the final result is reduced by
to obtain a fully reduced representation. Here we wish to use scaling to produce moduli of special form. If a random pattern appears in a modulus, it will not be possible to use the low-weight optimizations discussed in Section 2. However, by finding a suitable small constant
it may be possible to scale the prime
to obtain a new modulus of special form, that is either of low-weight or in a form that allows efficient recoding. To keep the redundancy minimal, the scaling factor must be small compared to the original modulus. Assuming a random modulus, such a small factor might be hard or even impossible to find. We concentrate again on primes of special forms. We present two heuristics that form a basis for efficient on-the-fly scaling:
Heuristic 1 If the base B representation of an integer contains a series of repeating digits, scaling the integer with the largest possible digit, produces a string of repeating zero digits in the scaled and recoded integer.
The justification of the heuristic is quite simple. Assume the representation of the modulus in base B contains a repeating digit of arbitrary value D. We use the constant scaling factor
to compute
When a string of repeating D-digits is multiplied with the scaling factor, and written in base B we obtain the following
The bar over the least significant digit denotes a negative valued digit.
TEAM LinG
Low-Power Elliptic Curve Cryptography |
95 |
The presented scaling technique is simple, efficient, and only requires the modulus to have repeating digits. Since the scaling factor is fixed and only depends on the length of the repeating pattern – not its value –, a modulus with multiple repeating digits can be scaled properly at the cost of increasing the length of the modulus by a single digit. We present another heuristics for scaling, this technique is more efficient but more restrictive on the modulus.
Heuristic 2 Given a modulus containing repeating D-digits in base B representation, if B – 1 is divisible by the repeating digit, then the modulus can be efficiently scaled by the factor 
As earlier the heuristic is verified by multiplying a string of repeating digits with the scaling factor and then by recoding.
We provide two examples for the heuristics in Appendix A. We have compiled a list of primes that when scaled with a small factor produce moduli of the form
in Table 4 (see Appendix A). These primes provide a wide range of perfect choices for the implementation of cryptographic schemes.
4Scaled Modular Inversion
In this section we consider the application of scaled arithmetic to implement more efficient inversion operations. An efficient way of calculating multiplicative inverses is to use binary extended Euclidean based algorithms. The Montgomery inversion algorithm proposed by Kaliski [5] is one of the most efficient inversion algorithms for random primes. Montgomery inversion, however, is not suitable when used with scaled primes since it does not exploit our special moduli. Furthermore, it can be used only when Montgomery arithmetic is employed. Therefore, what we need is an algorithm that takes advantage of the proposed special moduli. Thomas et al. [12] proposed the Algorithm X for Mersenne primes of the form
(see Appendix B).
Due to its simplicity Algorithm X is likely to yield an efficient hardware implementation. Another advantage of Algorithm X is the fact that the carry-free arithmetic can be employed. The main problem with other binary extended Euclidean algorithms is that they usually have a step involving comparison of two integers. The comparison in Algorithm X is much simpler and may be implemented easily using carry-free arithmetic.
The algorithm can be modified to support the other types of special moduli as
well. For instance, changing Step 4 of the algorithm to |
|
will make the algorithm work for special moduli of the form |
with almost |
no penalty in the implementation. The only problem with a special modulus,
is the fact that it is not prime (but multiple of a prime,
and therefore
TEAM LinG
