Добавил:

Andrey Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Санкт-Петербургский государственный электротехнический университет "ЛЭТИ"

Предмет:

Электротехника

Файл:

Joye M.Cryptographic hardware and embedded systems.2005

.pdf

Скачиваний:

Добавлен:

23.08.2013

Размер:

19.28 Mб

Скачать

☆

<<< < Предыдущая 1 2 3 4 5 6 7 8 9 1011 / 4811 12 13 14 15 16 17 18 19 20 21 22 23 24 25 > Следующая >>>

86 S. Kwon et al.

When	then the number of nonzero entries of			is one
because		Therefore we need one AND gate and no XOR gate to
compute	with	Since the addition		in (35)
needs one XOR gate for each			the total gate complexity of our
multiplier is	AND gates plus at most			XOR gates. The critical

path delay can also be evaluated easily. It is clear from (35) and (36) that the

critical path delay is at most		We compare our sequential
multiplier with other multipliers of the same kinds in Table 2. In the table,
denotes the number of nonzero entries in the matrix		It is well known [6]
that	if is odd and	if is even. In our case of
with	it is easy, from (17) and (18), to see that		has a more
strong bound	Thus the bounds		in [3] is same
to		Consequently the

circuit in [3] and our multiplier have more or less the same hardware complexity.

4.3Gaussian Normal Basis of Type 2 and 4 for ECC

Let	be	a prime such	that		i.e.	either 2
is a primitive root		or	and		Then the	element
	where	is a primitive pth root of unity in			forms a normal
basis		in	which we call a Gaussian normal basis of
type 2 (or a type II ONB). A multiplication matrix				of	has the following
property;	if and only if			for any choice of ± sign.

This is obvious from the basic properties of Gaussian normal basis in Section 3.

Since	divides	is a unique value		satisfying
	That is,	and the 0th row of		is (0,1,0, ··· , 0). If
then		and thus ith	row of	contains exactly two

nonzero entries. Therefore for the case of a type II optimal normal basis, we need

AND gates and		XOR gates. Also the critical path delay is
while that of [3] is		Let us give a more explicit example for
the case
Example 1. Let	be a primitive 11th root of of unity in			and let
be a type II optimal normal element in			The	computations
of	are easily done	from the following	table. For each block

TEAM LinG

Efficient Linear Array for Multiplication in

Using a Normal Basis

regarding K and entry with and has the value and respectively, where is a unique multiplicative subgroup of order 2 in

From the above table, it can be found that				and
For example, the computation of		can be done as follows. See the block
and find	is in	and	is in	Thus we have

In fact, for the case of type II ONB, there is a more regular expression called a palindromic representation which enables us to find the multiplication table more easily. However for the general treatments of all Gaussian normal


bases of type for arbitrary	we are following this rule. Note that for all other
type II ONB where	the multiplication table can be derived exactly the
same manner. From (37), the corresponding matrix		for	is

and using (24),(25),(28),(29), we find that the multiplication of and is written as follows.

From this, one has the shift register arrangement of C = AB using a type II ONB in for and it is shown in Figure 3. Note that the underlined entries are the first terms to be computed. Also note that the (shifted) diagonal entries have the common terms of


As is mentioned before, there exists only one field			for which a type
II ONB exists in	among the recommended five fields
	by NIST. Though the circuits of multiplication using a
type II ONB are presented in many places		[1,3,10,11], the authors could not
find an explicit example of a circuit design		using a Gaussian normal basis of

TEAM LinG

88 S. Kwon et al.

Fig. 3. A new multiplication circuit using a type II ONB in

for

type Since there are two fields for which a Gaussian normal basis of type 4 exists, it is worthwhile to study the multiplication and the corresponding circuit for this case. For the clarity of exposition, we will explain a Gaussian normal basis of type in for Note that the general case can be dealt in the same manner as in the following example.

Example	2. Let	with	where a Gauss period
of type (7, 4) exists in		In this case, the unique cyclic subgroup of order
4 in	is		Let be a primitive 29th
root of unity in		Thus letting	a normal element	is written
as		and	is a normal basis in
The computations of		are done from the following table. For each
block regarding K and		entry with	and	has the
value	and	respectively.

From the above table, we find and

For example, see the block for the expression of The entries of are 5,20,26,11. Now see the blocks of and find Thus we get From (40), the multiplication matrix is written as

TEAM LinG

Efficient Linear Array for Multiplication in

Using a Normal Basis

and again using the relations (24),(25),(28),(29), we get the following multiplication result In the following table, is defined as For example, we have

The corresponding shift register arrangement of C = AB using a Gaussian normal basis of type 4 in for is shown in Figure 4. Also note that the underlined entries are the first terms to be computed and the (shifted) diagonal entries have the common terms of The critical path delay of the circuit using a type 4 Gaussian normal basis is only and can be effectively realized for the case and also.

Fig. 4. A new multiplication circuit using a Gaussian normal basis of type 4 in for

TEAM LinG

90 S. Kwon et al.

5Conclusions

In this paper, we proposed a low complexity sequential normal basis multiplier over for odd using a Gaussian normal basis of type Since, in many cryptographic applications, should be an odd integer or a prime, our assumption on is not at all restrictive for a practical purpose. We presented a general method of constructing a circuit arrangement of the multiplier and showed explicit examples for the cases of type 2 and 4 Gaussian normal bases. Among the five binary fields, with recommended by NIST [16] for ECC, our examples cover the cases since has a type II ONB and have a Gaussian normal basis of type 4. Our general method can also be applied to other fields and since they have a Gaussian normal basis of type 6 and 10, respectively. Compared with previously proposed architectures of the same kinds, our multiplier has a superior or comparable area complexity and delay time. Thus it is well suited for many applications such as VLSI implementation of elliptic curve cryptographic protocols.

References

1.G.B. Agnew, R.C. Mullin, I. Onyszchuk, and S.A. Vanstone, “An implementation for a fast public key cryptosystem,” J. Cryptology, vol. 3, pp. 63–79, 1991.

2.G.B. Agnew, R.C. Mullin, and S.A. Vanstone, “Fast exponentiation in

Eurocrypt 88, Lecture Notes in Computer Science, vol. 330, pp. 251–255, 1988.

3.A. Reyhani-Masoleh and M.A. Hasan, “Low complexity sequential normal basis

multipliers over 16th IEEE Symposium on Computer Arithmetic, vol. 16, pp. 188–195, 2003.

4.A. Reyhani-Masoleh and M.A. Hasan, “A new construction of Massey-Omura parallel multiplier over IEEE Trans. Computers, vol. 51, pp. 511–520, 2002.

5.A. Reyhani-Masoleh and M.A. Hasan, “Efficient multiplication beyond optimal normal bases,” IEEE Trans. Computers, vol. 52, pp. 428–439, 2003.

6.A.J. Menezes, I.F. Blake, S. Gao, R.C. Mullin, S.A. Vanstone, and T. Yaghoobian,

Applications of Finite Fields, Kluwer Academic Publisher, 1993.

7.J.L. Massy and J.K. Omura, “Computational method and apparatus for finite field arithmetic,” US Patent No. 4587627, 1986.

8.C. Paar, P. Fleischmann, and P. Roelse, “Efficient multiplier architectures for Galois fields IEEE Trans. Computers, vol. 47, pp. 162–170, 1998.

9.E.R. Berlekamp, “Bit-serial Reed-Solomon encoders,” IEEE Trans. Inform. Theory, vol. 28, pp. 869–874, 1982.

10.B. Sunar and Ç.K. KOç, “An efficient optimal normal basis type II multiplier,” IEEE Trans. Computers, vol. 50, pp. 83–87, 2001.

11.H. Wu, M.A. Hasan, I.F. Blake, and S. Gao, “Finite field multiplier using redundant representation,” IEEE Trans. Computers, vol. 51, pp. 1306–1316, 2002.

12.S. Gao, J. von zur Gathen, and D. Panario, “Orders and cryptographical applications,” Math. Comp., vol. 67, pp. 343–352, 1998.

13.J. von zur Gathen and I. Shparlinski, “Orders of Gauss periods in finite fields,”

ISAAC 95, Lecture Notes in Computer Science, vol. 1004, pp. 208–215, 1995.

TEAM LinG

Efficient Linear Array for Multiplication in

Using a Normal Basis

14.S. Gao and S. Vanstone, “On orders of optimal normal basis generators,” Math. Comp., vol. 64, pp. 1227–1233, 1995.

15.S. Feisel, J. von zur Gathen, M. Shokrollahi, “Normal bases via general Gauss periods,” Math. Comp., vol. 68, pp. 271–290, 1999.

16.NIST, “Digital Signature Standard,” FIPS Publication, 186-2, February, 2000.

TEAM LinG

Low-Power Elliptic Curve Cryptography Using

Scaled Modular Arithmetic

E. Öztürk1, B. Sunar1, and 2

1 Department of Electrical & Computer Engineering, Worcester Polytechnic

Institute, Worcester MA, 01609, USA,

erdinc, sunar@wpi.edu

2 Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey

TR-34956

erkays@sabanciuniv.edu

Abstract. We introduce new modulus scaling techniques for transforming a class of primes into special forms which enables efficient arithmetic. The scaling technique may be used to improve multiplication and inversion in finite fields. We present an efficient inversion algorithm that utilizes the structure of scaled modulus. Our inversion algorithm exhibits superior performance to the Euclidean algorithm and lends itself to efficient hardware implementation due to its simplicity. Using the scaled modulus technique and our specialized inversion algorithm we develop an elliptic curve processor architecture. The resulting architecture successfully utilizes redundant representation of elements in and provides a low-power, high speed, and small footprint specialized elliptic curve implementation.

1Introduction

Modular arithmetic has a variety of applications in cryptography. Many publickey algorithms heavily depend on modular arithmetic. Among these RSA encryption and digital signature schemes, discrete logarithm problem (DLP) based schemes such as the Diffie-Helman key agreement [4] and El-Gamal encryption and signature schemes [8], and elliptic curve cryptography [6,7] play an important role in authentication and encryption protocols. The implementation of RSA based schemes requires the arithmetic of integers modulo a large integer, that is in the form of a product of two large primes On the other hand, implementations of Diffie-Helman and El-Gamal schemes are based on the arithmetic of integers modulo a large prime While ECDSA is built on complex algebraic structures, the underlying arithmetic operations are either modular operations with respect to a large prime modulus case) or polynomial arithmetic modulo a high degree irreducible polynomial defined over the finite field GF(2) case). Special moduli for arithmetic were also proposed [2,10]. Low Hamming-weight irreducible polynomials such as trinomials and pentanomials became a popular choice [10,1] for both hardware and software implementations of ECDSA over Particularly, trinomials

M. Joye and J.-J. Quisquater (Eds.): CHES 2004, LNCS 3156, pp. 92–106, 2004.
© International Association for Cryptologic Research 2004	TEAM LinG


	Low-Power Elliptic Curve Cryptography		93
of the form	allow efficient reduction. For many bit-lengths such polyno-
mials do not exist; therefore less efficient trinomials, i.e.		with
or pentanomials, i.e.	are used instead. Hence, in many

cases the performance suffers degradation due to extra additions and alignment adjustments.

In this paper we utilize integer moduli of special form, which is reminiscent of low-Hamming weight polynomials. Although the idea of using a low-Hamming weight integer modulus is not new [3], its application to Elliptic Curve Cryptography was limited to only elliptic curves defined over Optimal Extension Fields (i.e. with mid-size of special form), or non-optimal primes such as those utilized by the NIST curves. In this work we achieve moduli of Mersenne form by introducing a modulus scaling technique. This allows us to develop a fast inversion algorithm that lends itself to efficient inversion hardware. For proof of concept we implemented a specialized elliptic curve processor. Besides using scaled arithmetic and the special inversion algorithm, we introduced several innovations at the hardware level such as a fast comparator for redundant arithmetic and shared arithmetic core for power optimization. The resulting architecture requires extremely low power at very small footprint and provides reasonable execution speed.

2Previous Work

A straightforward method to implement integer and polynomial modular multiplications is to first compute the product of the two operands, and then to reduce the product using the modulus, Traditionally, the reduction step is implemented by a division operation, which is significantly more demanding than the initial multiplication. To alleviate the reduction problem in integer modular multiplications, Crandall proposed [3] using special primes, primes of the form where is a small integer constant. By using special primes, modular reduction turns into a multiplication operation by the small constant that, in many cases, may be performed by a series of less expensive shift and add operations:

It should be noticed that is not fully reduced. Depending on the length of a few more reductions are needed. The best possible choice for a special prime


is a Mersenne prime,		with	fixed to a word-boundary. In this case,
the reduction operation becomes a simple modular addition
Similarly primes of the form		may simplify reduction into a modular sub-
traction		Unfortunately, Mersenne primes and primes of the
form	are scarce. For degrees up to 1000 no primes of form			and only
the two Mersenne primes		and	exist. Moreover, these primes are

too large for ECDSA which utilizes bit-lengths in the range 160 – 350. Hence, a

TEAM LinG

94 E. Öztürk, B. Sunar, and

more practical choice is to use primes of the form For a constant larger than and a degree that is not aligned to a word boundary, some extra shifts and additions may be needed. To relax the restrictions, Solinas [11] introduced a generalization for special primes. His technique is based on signed bit recoding. While increasing the number of possible special primes, additional low-level operations are needed. The special modulus reduction technique introduced by Crandall [3] restricts the constant in to a small constant that fits into a single word.

3Modulus Scaling Techniques

The idea of modulus scaling was introduced by Walter [13]. In this work, the modulus was scaled to obtain a certain representation in the higher order bits, which helped the estimation of the quotient in Barrett’s reduction technique. The method works by scaling to the prime modulus to obtain a new modulus, Reducing an integer using the new modulus will produce a result that is congruent to the residue obtained by reducing modulo This follows from the fact that reduction is a repetitive subtraction of the modulus. Subtracting is equivalent to times subtracting and thus When a scaled modulus is used, residues will be in the range The number is not fully reduced and essentially we are using a redundant representation where an integer is represented using more bits than necessary. Consequently, it will be necessary that the final result is reduced by to obtain a fully reduced representation. Here we wish to use scaling to produce moduli of special form. If a random pattern appears in a modulus, it will not be possible to use the low-weight optimizations discussed in Section 2. However, by finding a suitable small constant it may be possible to scale the prime to obtain a new modulus of special form, that is either of low-weight or in a form that allows efficient recoding. To keep the redundancy minimal, the scaling factor must be small compared to the original modulus. Assuming a random modulus, such a small factor might be hard or even impossible to find. We concentrate again on primes of special forms. We present two heuristics that form a basis for efficient on-the-fly scaling:

Heuristic 1 If the base B representation of an integer contains a series of repeating digits, scaling the integer with the largest possible digit, produces a string of repeating zero digits in the scaled and recoded integer.

The justification of the heuristic is quite simple. Assume the representation of the modulus in base B contains a repeating digit of arbitrary value D. We use the constant scaling factor to compute When a string of repeating D-digits is multiplied with the scaling factor, and written in base B we obtain the following

The bar over the least significant digit denotes a negative valued digit.

TEAM LinG

Low-Power Elliptic Curve Cryptography

The presented scaling technique is simple, efficient, and only requires the modulus to have repeating digits. Since the scaling factor is fixed and only depends on the length of the repeating pattern – not its value –, a modulus with multiple repeating digits can be scaled properly at the cost of increasing the length of the modulus by a single digit. We present another heuristics for scaling, this technique is more efficient but more restrictive on the modulus.

Heuristic 2 Given a modulus containing repeating D-digits in base B representation, if B – 1 is divisible by the repeating digit, then the modulus can be efficiently scaled by the factor

As earlier the heuristic is verified by multiplying a string of repeating digits with the scaling factor and then by recoding.

We provide two examples for the heuristics in Appendix A. We have compiled a list of primes that when scaled with a small factor produce moduli of the form in Table 4 (see Appendix A). These primes provide a wide range of perfect choices for the implementation of cryptographic schemes.

4Scaled Modular Inversion

In this section we consider the application of scaled arithmetic to implement more efficient inversion operations. An efficient way of calculating multiplicative inverses is to use binary extended Euclidean based algorithms. The Montgomery inversion algorithm proposed by Kaliski [5] is one of the most efficient inversion algorithms for random primes. Montgomery inversion, however, is not suitable when used with scaled primes since it does not exploit our special moduli. Furthermore, it can be used only when Montgomery arithmetic is employed. Therefore, what we need is an algorithm that takes advantage of the proposed special moduli. Thomas et al. [12] proposed the Algorithm X for Mersenne primes of the form (see Appendix B).

Due to its simplicity Algorithm X is likely to yield an efficient hardware implementation. Another advantage of Algorithm X is the fact that the carry-free arithmetic can be employed. The main problem with other binary extended Euclidean algorithms is that they usually have a step involving comparison of two integers. The comparison in Algorithm X is much simpler and may be implemented easily using carry-free arithmetic.

The algorithm can be modified to support the other types of special moduli as

well. For instance, changing Step 4 of the algorithm to
will make the algorithm work for special moduli of the form	with almost

no penalty in the implementation. The only problem with a special modulus, is the fact that it is not prime (but multiple of a prime, and therefore

TEAM LinG

<<< < Предыдущая 1 2 3 4 5 6 7 8 9 1011 / 4811 12 13 14 15 16 17 18 19 20 21 22 23 24 25 > Следующая >>>

Соседние файлы в предмете Электротехника

#
23.08.2013342.39 Кб5Jones D.M.The new C standard.An economic and cultural commentary.Sentence 766.pdf
#
23.08.2013394.15 Кб168Jones D.PCB design tutorial.2004.pdf
#
23.08.20131.75 Mб12Jones N.D.Partial evaluation and automatic program generation.1999.pdf
#
23.08.20135.36 Mб6Jones W.Beginning DirectX 9.2004.pdf
#
23.08.2013407.57 Кб19Jordan M.Computational aspects of motor control and motor learning.pdf
#
23.08.201319.28 Mб39Joye M.Cryptographic hardware and embedded systems.2005.pdf
#
23.08.201372.72 Кб12K-NET bus.pdf
#
23.08.2013117.29 Кб30Kahan W.Lecture notes on IEEE standard 754 for binary floating-point arithmetic.1997.pdf
#
23.08.20131.08 Mб20Kahn P.Principles of typography for user interface design.1997.pdf
#
23.08.2013452.99 Кб22Kaliski B.S.On the security of the RC5 encryption algorithm.1998.pdf
#
23.08.20134.85 Mб45Kammer D.Bluetooth application developer's guide.The short range interconnect solution.2002.pdf