Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Joye M.Cryptographic hardware and embedded systems.2005

.pdf
Скачиваний:
39
Добавлен:
23.08.2013
Размер:
19.28 Mб
Скачать

86 S. Kwon et al.

When

then the number of nonzero entries of

is one

because

 

Therefore we need one AND gate and no XOR gate to

compute

with

Since the addition

 

in (35)

needs one XOR gate for each

the total gate complexity of our

multiplier is

AND gates plus at most

 

XOR gates. The critical

path delay can also be evaluated easily. It is clear from (35) and (36) that the

critical path delay is at most

We compare our sequential

multiplier with other multipliers of the same kinds in Table 2. In the table,

denotes the number of nonzero entries in the matrix

It is well known [6]

that

if is odd and

if is even. In our case of

with

it is easy, from (17) and (18), to see that

has a more

strong bound

Thus the bounds

 

in [3] is same

to

 

Consequently the

circuit in [3] and our multiplier have more or less the same hardware complexity.

4.3Gaussian Normal Basis of Type 2 and 4 for ECC

Let

be

a prime such

that

 

i.e.

either 2

is a primitive root

or

and

 

Then the

element

 

where

is a primitive pth root of unity in

 

forms a normal

basis

 

in

which we call a Gaussian normal basis of

type 2 (or a type II ONB). A multiplication matrix

of

has the following

property;

if and only if

 

for any choice of ± sign.

This is obvious from the basic properties of Gaussian normal basis in Section 3.

Since

divides

is a unique value

satisfying

 

That is,

and the 0th row of

is (0,1,0, ··· , 0). If

then

 

and thus ith

row of

contains exactly two

nonzero entries. Therefore for the case of a type II optimal normal basis, we need

AND gates and

 

XOR gates. Also the critical path delay is

while that of [3] is

Let us give a more explicit example for

the case

 

 

 

 

Example 1. Let

be a primitive 11th root of of unity in

and let

be a type II optimal normal element in

The

computations

of

are easily done

from the following

table. For each block

TEAM LinG

Efficient Linear Array for Multiplication in

Using a Normal Basis

87

regarding K and entry with and has the value and respectively, where is a unique multiplicative subgroup of order 2 in

From the above table, it can be found that

 

and

For example, the computation of

can be done as follows. See the block

and find

is in

and

is in

Thus we have

In fact, for the case of type II ONB, there is a more regular expression called a palindromic representation which enables us to find the multiplication table more easily. However for the general treatments of all Gaussian normal

bases of type for arbitrary

we are following this rule. Note that for all other

type II ONB where

the multiplication table can be derived exactly the

same manner. From (37), the corresponding matrix

for

is

and using (24),(25),(28),(29), we find that the multiplication of and is written as follows.

From this, one has the shift register arrangement of C = AB using a type II ONB in for and it is shown in Figure 3. Note that the underlined entries are the first terms to be computed. Also note that the (shifted) diagonal entries have the common terms of

As is mentioned before, there exists only one field

for which a type

II ONB exists in

among the recommended five fields

 

by NIST. Though the circuits of multiplication using a

type II ONB are presented in many places

[1,3,10,11], the authors could not

find an explicit example of a circuit design

using a Gaussian normal basis of

TEAM LinG

88 S. Kwon et al.

Fig. 3. A new multiplication circuit using a type II ONB in

for

type Since there are two fields for which a Gaussian normal basis of type 4 exists, it is worthwhile to study the multiplication and the corresponding circuit for this case. For the clarity of exposition, we will explain a Gaussian normal basis of type in for Note that the general case can be dealt in the same manner as in the following example.

Example

2. Let

with

where a Gauss period

of type (7, 4) exists in

In this case, the unique cyclic subgroup of order

4 in

is

 

Let be a primitive 29th

root of unity in

Thus letting

a normal element

is written

as

 

and

is a normal basis in

The computations of

are done from the following table. For each

block regarding K and

entry with

and

has the

value

and

respectively.

 

 

From the above table, we find and

For example, see the block for the expression of The entries of are 5,20,26,11. Now see the blocks of and find Thus we get From (40), the multiplication matrix is written as

TEAM LinG

Efficient Linear Array for Multiplication in

Using a Normal Basis

89

and again using the relations (24),(25),(28),(29), we get the following multiplication result In the following table, is defined as For example, we have

The corresponding shift register arrangement of C = AB using a Gaussian normal basis of type 4 in for is shown in Figure 4. Also note that the underlined entries are the first terms to be computed and the (shifted) diagonal entries have the common terms of The critical path delay of the circuit using a type 4 Gaussian normal basis is only and can be effectively realized for the case and also.

Fig. 4. A new multiplication circuit using a Gaussian normal basis of type 4 in for

TEAM LinG

90 S. Kwon et al.

5Conclusions

In this paper, we proposed a low complexity sequential normal basis multiplier over for odd using a Gaussian normal basis of type Since, in many cryptographic applications, should be an odd integer or a prime, our assumption on is not at all restrictive for a practical purpose. We presented a general method of constructing a circuit arrangement of the multiplier and showed explicit examples for the cases of type 2 and 4 Gaussian normal bases. Among the five binary fields, with recommended by NIST [16] for ECC, our examples cover the cases since has a type II ONB and have a Gaussian normal basis of type 4. Our general method can also be applied to other fields and since they have a Gaussian normal basis of type 6 and 10, respectively. Compared with previously proposed architectures of the same kinds, our multiplier has a superior or comparable area complexity and delay time. Thus it is well suited for many applications such as VLSI implementation of elliptic curve cryptographic protocols.

References

1.G.B. Agnew, R.C. Mullin, I. Onyszchuk, and S.A. Vanstone, “An implementation for a fast public key cryptosystem,” J. Cryptology, vol. 3, pp. 63–79, 1991.

2.G.B. Agnew, R.C. Mullin, and S.A. Vanstone, “Fast exponentiation in

Eurocrypt 88, Lecture Notes in Computer Science, vol. 330, pp. 251–255, 1988.

3.A. Reyhani-Masoleh and M.A. Hasan, “Low complexity sequential normal basis

multipliers over 16th IEEE Symposium on Computer Arithmetic, vol. 16, pp. 188–195, 2003.

4.A. Reyhani-Masoleh and M.A. Hasan, “A new construction of Massey-Omura parallel multiplier over IEEE Trans. Computers, vol. 51, pp. 511–520, 2002.

5.A. Reyhani-Masoleh and M.A. Hasan, “Efficient multiplication beyond optimal normal bases,” IEEE Trans. Computers, vol. 52, pp. 428–439, 2003.

6.A.J. Menezes, I.F. Blake, S. Gao, R.C. Mullin, S.A. Vanstone, and T. Yaghoobian,

Applications of Finite Fields, Kluwer Academic Publisher, 1993.

7.J.L. Massy and J.K. Omura, “Computational method and apparatus for finite field arithmetic,” US Patent No. 4587627, 1986.

8.C. Paar, P. Fleischmann, and P. Roelse, “Efficient multiplier architectures for Galois fields IEEE Trans. Computers, vol. 47, pp. 162–170, 1998.

9.E.R. Berlekamp, “Bit-serial Reed-Solomon encoders,” IEEE Trans. Inform. Theory, vol. 28, pp. 869–874, 1982.

10.B. Sunar and Ç.K. KOç, “An efficient optimal normal basis type II multiplier,” IEEE Trans. Computers, vol. 50, pp. 83–87, 2001.

11.H. Wu, M.A. Hasan, I.F. Blake, and S. Gao, “Finite field multiplier using redundant representation,” IEEE Trans. Computers, vol. 51, pp. 1306–1316, 2002.

12.S. Gao, J. von zur Gathen, and D. Panario, “Orders and cryptographical applications,” Math. Comp., vol. 67, pp. 343–352, 1998.

13.J. von zur Gathen and I. Shparlinski, “Orders of Gauss periods in finite fields,”

ISAAC 95, Lecture Notes in Computer Science, vol. 1004, pp. 208–215, 1995.

TEAM LinG

Efficient Linear Array for Multiplication in

Using a Normal Basis

91

14.S. Gao and S. Vanstone, “On orders of optimal normal basis generators,” Math. Comp., vol. 64, pp. 1227–1233, 1995.

15.S. Feisel, J. von zur Gathen, M. Shokrollahi, “Normal bases via general Gauss periods,” Math. Comp., vol. 68, pp. 271–290, 1999.

16.NIST, “Digital Signature Standard,” FIPS Publication, 186-2, February, 2000.

TEAM LinG

Low-Power Elliptic Curve Cryptography Using

Scaled Modular Arithmetic

E. Öztürk1, B. Sunar1, and 2

1 Department of Electrical & Computer Engineering, Worcester Polytechnic

Institute, Worcester MA, 01609, USA,

erdinc, sunar@wpi.edu

2 Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey

TR-34956

erkays@sabanciuniv.edu

Abstract. We introduce new modulus scaling techniques for transforming a class of primes into special forms which enables efficient arithmetic. The scaling technique may be used to improve multiplication and inversion in finite fields. We present an efficient inversion algorithm that utilizes the structure of scaled modulus. Our inversion algorithm exhibits superior performance to the Euclidean algorithm and lends itself to efficient hardware implementation due to its simplicity. Using the scaled modulus technique and our specialized inversion algorithm we develop an elliptic curve processor architecture. The resulting architecture successfully utilizes redundant representation of elements in and provides a low-power, high speed, and small footprint specialized elliptic curve implementation.

1Introduction

Modular arithmetic has a variety of applications in cryptography. Many publickey algorithms heavily depend on modular arithmetic. Among these RSA encryption and digital signature schemes, discrete logarithm problem (DLP) based schemes such as the Diffie-Helman key agreement [4] and El-Gamal encryption and signature schemes [8], and elliptic curve cryptography [6,7] play an important role in authentication and encryption protocols. The implementation of RSA based schemes requires the arithmetic of integers modulo a large integer, that is in the form of a product of two large primes On the other hand, implementations of Diffie-Helman and El-Gamal schemes are based on the arithmetic of integers modulo a large prime While ECDSA is built on complex algebraic structures, the underlying arithmetic operations are either modular operations with respect to a large prime modulus case) or polynomial arithmetic modulo a high degree irreducible polynomial defined over the finite field GF(2) case). Special moduli for arithmetic were also proposed [2,10]. Low Hamming-weight irreducible polynomials such as trinomials and pentanomials became a popular choice [10,1] for both hardware and software implementations of ECDSA over Particularly, trinomials

M. Joye and J.-J. Quisquater (Eds.): CHES 2004, LNCS 3156, pp. 92–106, 2004.

 

© International Association for Cryptologic Research 2004

TEAM LinG

 

Low-Power Elliptic Curve Cryptography

93

of the form

allow efficient reduction. For many bit-lengths such polyno-

mials do not exist; therefore less efficient trinomials, i.e.

with

 

or pentanomials, i.e.

are used instead. Hence, in many

cases the performance suffers degradation due to extra additions and alignment adjustments.

In this paper we utilize integer moduli of special form, which is reminiscent of low-Hamming weight polynomials. Although the idea of using a low-Hamming weight integer modulus is not new [3], its application to Elliptic Curve Cryptography was limited to only elliptic curves defined over Optimal Extension Fields (i.e. with mid-size of special form), or non-optimal primes such as those utilized by the NIST curves. In this work we achieve moduli of Mersenne form by introducing a modulus scaling technique. This allows us to develop a fast inversion algorithm that lends itself to efficient inversion hardware. For proof of concept we implemented a specialized elliptic curve processor. Besides using scaled arithmetic and the special inversion algorithm, we introduced several innovations at the hardware level such as a fast comparator for redundant arithmetic and shared arithmetic core for power optimization. The resulting architecture requires extremely low power at very small footprint and provides reasonable execution speed.

2Previous Work

A straightforward method to implement integer and polynomial modular multiplications is to first compute the product of the two operands, and then to reduce the product using the modulus, Traditionally, the reduction step is implemented by a division operation, which is significantly more demanding than the initial multiplication. To alleviate the reduction problem in integer modular multiplications, Crandall proposed [3] using special primes, primes of the form where is a small integer constant. By using special primes, modular reduction turns into a multiplication operation by the small constant that, in many cases, may be performed by a series of less expensive shift and add operations:

It should be noticed that is not fully reduced. Depending on the length of a few more reductions are needed. The best possible choice for a special prime

is a Mersenne prime,

with

fixed to a word-boundary. In this case,

the reduction operation becomes a simple modular addition

 

Similarly primes of the form

may simplify reduction into a modular sub-

traction

 

Unfortunately, Mersenne primes and primes of the

form

are scarce. For degrees up to 1000 no primes of form

and only

the two Mersenne primes

and

exist. Moreover, these primes are

too large for ECDSA which utilizes bit-lengths in the range 160 – 350. Hence, a

TEAM LinG

94 E. Öztürk, B. Sunar, and

more practical choice is to use primes of the form For a constant larger than and a degree that is not aligned to a word boundary, some extra shifts and additions may be needed. To relax the restrictions, Solinas [11] introduced a generalization for special primes. His technique is based on signed bit recoding. While increasing the number of possible special primes, additional low-level operations are needed. The special modulus reduction technique introduced by Crandall [3] restricts the constant in to a small constant that fits into a single word.

3Modulus Scaling Techniques

The idea of modulus scaling was introduced by Walter [13]. In this work, the modulus was scaled to obtain a certain representation in the higher order bits, which helped the estimation of the quotient in Barrett’s reduction technique. The method works by scaling to the prime modulus to obtain a new modulus, Reducing an integer using the new modulus will produce a result that is congruent to the residue obtained by reducing modulo This follows from the fact that reduction is a repetitive subtraction of the modulus. Subtracting is equivalent to times subtracting and thus When a scaled modulus is used, residues will be in the range The number is not fully reduced and essentially we are using a redundant representation where an integer is represented using more bits than necessary. Consequently, it will be necessary that the final result is reduced by to obtain a fully reduced representation. Here we wish to use scaling to produce moduli of special form. If a random pattern appears in a modulus, it will not be possible to use the low-weight optimizations discussed in Section 2. However, by finding a suitable small constant it may be possible to scale the prime to obtain a new modulus of special form, that is either of low-weight or in a form that allows efficient recoding. To keep the redundancy minimal, the scaling factor must be small compared to the original modulus. Assuming a random modulus, such a small factor might be hard or even impossible to find. We concentrate again on primes of special forms. We present two heuristics that form a basis for efficient on-the-fly scaling:

Heuristic 1 If the base B representation of an integer contains a series of repeating digits, scaling the integer with the largest possible digit, produces a string of repeating zero digits in the scaled and recoded integer.

The justification of the heuristic is quite simple. Assume the representation of the modulus in base B contains a repeating digit of arbitrary value D. We use the constant scaling factor to compute When a string of repeating D-digits is multiplied with the scaling factor, and written in base B we obtain the following

The bar over the least significant digit denotes a negative valued digit.

TEAM LinG

Low-Power Elliptic Curve Cryptography

95

The presented scaling technique is simple, efficient, and only requires the modulus to have repeating digits. Since the scaling factor is fixed and only depends on the length of the repeating pattern – not its value –, a modulus with multiple repeating digits can be scaled properly at the cost of increasing the length of the modulus by a single digit. We present another heuristics for scaling, this technique is more efficient but more restrictive on the modulus.

Heuristic 2 Given a modulus containing repeating D-digits in base B representation, if B – 1 is divisible by the repeating digit, then the modulus can be efficiently scaled by the factor

As earlier the heuristic is verified by multiplying a string of repeating digits with the scaling factor and then by recoding.

We provide two examples for the heuristics in Appendix A. We have compiled a list of primes that when scaled with a small factor produce moduli of the form in Table 4 (see Appendix A). These primes provide a wide range of perfect choices for the implementation of cryptographic schemes.

4Scaled Modular Inversion

In this section we consider the application of scaled arithmetic to implement more efficient inversion operations. An efficient way of calculating multiplicative inverses is to use binary extended Euclidean based algorithms. The Montgomery inversion algorithm proposed by Kaliski [5] is one of the most efficient inversion algorithms for random primes. Montgomery inversion, however, is not suitable when used with scaled primes since it does not exploit our special moduli. Furthermore, it can be used only when Montgomery arithmetic is employed. Therefore, what we need is an algorithm that takes advantage of the proposed special moduli. Thomas et al. [12] proposed the Algorithm X for Mersenne primes of the form (see Appendix B).

Due to its simplicity Algorithm X is likely to yield an efficient hardware implementation. Another advantage of Algorithm X is the fact that the carry-free arithmetic can be employed. The main problem with other binary extended Euclidean algorithms is that they usually have a step involving comparison of two integers. The comparison in Algorithm X is much simpler and may be implemented easily using carry-free arithmetic.

The algorithm can be modified to support the other types of special moduli as

well. For instance, changing Step 4 of the algorithm to

 

will make the algorithm work for special moduli of the form

with almost

no penalty in the implementation. The only problem with a special modulus, is the fact that it is not prime (but multiple of a prime, and therefore

TEAM LinG