FINITE FIELD MULTIPLICATION IN LAGRANGE
REPRESENTATION USING FAST FOURRIER TRANSFORM
Christophe Negre
´
Equipe DALI, LP2A, Universit
´
e de Perpignan
avenue P. Alduy, 66 000 Perpignan, France
Keywords:
Finite field arithmetic, FFT, multiplication.
Abstract:
The multiplication in F
p
n
can be performed using a polynomial version of Montgomery multiplication (Mont-
gomery, 1985). In (Bajard et al., 2003) Bajard et al. improved this method by using a Lagrange represen-
tation: the elements of F
p
n
are represented by their values at a fixed set of points. The costly operations in
this new algorithm are the two changes of Lagrange representation which require 2r
2
operations in F
p
with
n r 2
log
2
(n)
. In this paper we present a new method to perform the change of Lagrange represen-
tation. This method uses Fast Fourier Transform and has a cost equal to 3rlog
2
(r) operations in F
p
with
r = 2
log
2
(n)
.
INTRODUCTION
Finite field arithmetic is used in many applications,
for example in cryptography (Koblitz, 1987; Miller,
1986) and in error correcting code (Berlekamp, 1982).
Getting efficient finite field arithmetic is an important
challenge to make these applications as fast as pos-
sible. Specially in ECC essentially two field opera-
tions are required: multiplication and addition, and
the multiplication is the most costly. Consequently,
many works have been done to get an efficient mul-
tiplication in finite field F
2
n
(Ning and Yin, 2001;
Sunar and Koc, 1999), F
p
and also for F
p
n
(Bailey
and Paar, 1998).
Our interest here concerns finite field extensions
F
p
n
. Elements in these field can be seen as polyno-
mial A(X) F
p
[X] of degree n 1. The multiplica-
tion consist in a multiplication of polynomial follow-
ing by a reduction modulo an irreducible polynomial
N(X) F
p
[X].
A method to perform arithmetic in F
p
n
was pro-
posed by Bajard et al. in (Bajard et al., 2003). This
method is based on a generalization of the polynomial
version of the Montgomery Algorithm (Montgomery,
1985). The arithmetic is done modulo friable poly-
nomials Ψ and Ψ
with degree r n; by friable we
means they are product of polynomial of degree one.
The arithmetic modulo such polynomials is very effi-
cient when the elements are expressed in the Lagrange
representation (Bajard et al., 2003) LR
Ψ
and LR
Ψ
.
In this situtation the most costly operation is the
change of representation from LR
Ψ
to LR
Ψ
which
consists of a product of an r × r constant matrix with
the vector LR
Ψ
(A). This requires roughly r
2
multi-
plications and r
2
additions. Here we will improve this
method for well chosen Ψ and Ψ
and by using Fast
Fourier Transform. The change of representation will
require 4r log(r) multiplications and 8r log(r) addi-
tions .
This article is organized as follows: in the first
section we will give some background on Lagrange
representation of polynomials. In the section 2 we
explain how the Lagrange algorithm (Bajard et al.,
2003) of Bajard et al. works. The main contribution
of the paper is in section 3 in which we present our
new method to perform the change of Lagrange rep-
resentation. At the end we evalutate the complexity of
our algorithm and compare it to the original method
of Bajard et al. and we conclude by a brief conclu-
sion.
1 LAGRANGE
REPRESENTATION
The Lagrange representation consists to represent a
polynomial by its evalutation at n points. In an arith-
320
Negre C. (2006).
FINITE FIELD MULTIPLICATION IN LAGRANGE REPRESENTATION USING FAST FOURRIER TRANSFORM.
In Proceedings of the International Conference on Security and Cryptography, pages 320-324
DOI: 10.5220/0002096503200324
Copyright
c
SciTePress
metic point of view, this is related to the Chinese Re-
mainder Theorem which asserts that the following ap-
plication is an isomorphism.
F
p
[X]/(Ψ) F
p
[X]/(X e
1
) × · · · × F
p
[X]/(X e
k
)
A 7→ (A mod (X e
1
), . . . , A mod (X e
k
)) ,
We remark that the computation of A mod (X
e
i
) is simply the computation of A(e
i
). In other words
the image of A(X) by the isomorphism (1) is nothing
else that the multi-points evaluation of A at the roots
of Ψ =
Q
n
i=1
(X e
i
).
Definition 1 (Lagrange representation) Let A
F
p
[X] with deg A < n, and Ψ =
Q
r
i=1
(X e
i
),
where e
i
F
p
for 1 i r and e
i
6= e
j
for i 6= j. If
a
i
= A(e
i
) for 1 i r, the Lagrange representa-
tion (LR) of A(X) modulo Ψ is defined by
LR
Ψ
(A(X)) = (a
1
, . . . , a
r
). (1)
The advantage of the LR representation to perform
operations modulo Ψ is a consequence of the Chinese
remainder theorem. Specially the arithmetic modulo
Ψ in classical polynomial representation can be costly
if Ψ has a high degree, in LR representation this arith-
metic is decomposed into n independent arithmetic
units, each consists of arithmetic modulo a very sim-
ple polynomial (X e
i
). But arithmetic modulo
(X e
i
) is the arithmetic modulo p since the prod-
uct of two degree zero polynomials is just the product
modulo p of the two constant coefficients.
2 MONGOMERY
MULTIPLICATION IN
LANGRANGE
REPRESENTATION
Montgomery in (Montgomery, 1985) proposed an al-
gorithm to perform integer modular multiplications.
The polynomial version of the Montgomery algo-
rithm can be used to perform polynomial modular
arithmetic in F
p
[X]. Here we will present only the
generalized version of this algorithm.
We consider an irreducible polynomial N F
p
[X]
of degree n and two polynomials Ψ, Ψ
F
p
[X] such
that
gcd(Ψ, Ψ
) = gcd(Ψ, N) = gcd(Ψ
, N) = 1,
and deg Ψ, deg Ψ
n. In this situation, the gener-
alized polynomial version of Montgomery Algorithm
computes ABΨ
1
mod N .
The advantage of Montgomery multiplication is to
avoid euclidean division to compute AB modulo N.
This division is replaced by exactly 5 multiplications
Algorithm 1 Generalized Montgomery Multiplica-
tion.
Require: A, B F
p
[X], with deg A, deg B n1;
a monic irreducible polynomial N F
p
[X], with
deg N = n; Ψ, Ψ
, with deg Ψ = deg Ψ
= r
n, and gcd(Ψ, Ψ
) = gcd(Ψ, N) = 1
Ensure: ABΨ
1
mod N
1: Q A × B × N
1
mod Ψ
2: R (A × B + Q × N ) × Ψ
1
mod Ψ
modulo Ψ or Ψ
(2 in step 1 and 3 in step 2). But this
makes sense only if the arithmetic operation modulo
Ψ and Ψ
can be done efficiently. In their original
paper Bajard et al. proposed to use Ψ and Ψ
which
split totally in F
p
[X], i.e.,
Ψ =
r
Y
i=1
(X e
i
) and Ψ
=
r
Y
i=1
(X e
i
).
In section 1 we noticed that the arithmetic modulo
polynomials Ψ and Ψ
can be done efficiently using
Lagrange representation. The use Lagrange represen-
tation LR
Ψ
and LR
Ψ
provides some complications
in the Generalized Montgomery Algorithm: between
step 1 and step 2 we have to perform some conver-
sions from Lagrange representation of Q relatively to
Ψ to Lagrange representation of Q relatively to Ψ
.
Similarly after the step 2 we should compute the La-
grange representation of R relatively to Ψ.
If we note Change
Rep
ΨΨ
the sub-routine
which computes the Lagrange representation of an el-
ement A relatively to Ψ
from its Lagrange represen-
tation relatively to Ψ, the generalized Montgomery’s
Algorithm 1 becomes
Algorithm 2 LR Modular Multiplication.
Require: A, B F
p
[X], with deg A, deg B k1;
a monic irreducible polynomial N F
p
[X], with
deg N = n; Ψ, Ψ
, with deg Ψ = deg Ψ
= r,
and gcd(Ψ, Ψ
) = gcd(Ψ, N) = 1
1: LR
Ψ
(Q) LR
Ψ
(A) × LR
Ψ
(B) × LR
Ψ
(N)
1
2: LR
Ψ
(Q) Change
Rep
Ψ,Ψ
(LR
Ψ
(Q))
3: LR
Ψ
(R) (LR
Ψ
(A) × LR
Ψ
(B)
LR
P si
(Q) × LR
Ψ
(N)) × LR
Ψ
(Ψ)
1
4: LR
Ψ
(R) Change
Rep
Ψ
,Ψ
(LR
Ψ
(R)
In (Bajard et al., 2003) Bajard et al. the change
of Lagrange representation is done with a product
matrix-vector · LR
Ψ
(Q) where the matrix =
[ω
i,j
]
i,j=1,...,r
is a r × r matrix and has the follow-
ing coefficients
ω
i,j
=
r
Y
=1
e
i
e
e
j
e
FINITE FIELD MULTIPLICATION IN LAGRANGE REPRESENTATION USING FAST FOURRIER TRANSFORM
321
This matrix-vector product is a costly operation: it
requires r
2
multiplications and r(r 1) additions.
In this paper we study a different strategy to per-
form the change of Lagrange representation. Our
strategy was to express the change of representation
in term of Discrete Fourrier Transform. This becomes
interisting when this DFT can be computed with the
FFT, since the FFT is really efficient.
3 CHANGE OF LAGRANGE
REPRESENTATION WITH FFT
For now, we will take the prime integer p such that
2
k+1
|p 1 with k 1. In this case there exists
an element α F
p
such that the 2
k+1
elements α
i
for i = 0, . . . , 2
k+1
1 are the 2
k
distinct roots of
X
2
k+1
1.
Let r = 2
k
, in this situation we choose the two
polynomials Ψ and Ψ
as follows
Ψ =
Q
r
i=0
(X α
2i
) = X
r
1
and Ψ
=
Q
r
i=0
(X α
2i+1
) = X
r
+ 1.
We are going to express the change between two
Lagrange representations LR
Ψ
and LR
Ψ
in term of
to Discrete Fourier Transform. But before we recall
some basic fact on DFT and FFT.
3.1 Background On Dft and FFT
Let β = α
2
, the evaluation of a polynomial A(X)
F
p
[X] at the elements β
i
for i = 0, . . . , r corresponds
to the Discrete Fourier Transformation of A
DF T
r
(A) = (ˆa
1
, ˆa
2
, . . . , ˆa
r
)
= (A(1), A(β), A(β
2
), . . . , A(β
r1
))
The DF T
r
of A can be computed by using the
F F T
r
Algorithm. This algorithm is based on the fol-
lowing identities
ˆa
i
= A
e
((β
2
)
i
) + β
i
A
o
((β
2
)
i
)
ˆa
i+(r/2)
= A
e
((β
2
)
i
) β
i
A
o
((β
2
)
i
),
(2)
where the polynomial A
e
(X) is the even part of A
and A
o
(X) is the odd part of A
A
e
(X) =
r/2
X
i=0
a
2i
X
i
, A
o
(X) =
r/2
X
i=0
a
2i+1
X
i
.
The F F T
r
recursively computes F F T
r/2
(A
e
)
and F F T
r/2
(A
o
) and then deduces F F T
r
(A) =
(A(β
0
)A(β
2
), . . . , A(β
(r1)
)) using equations (2).
For a complete description of the FFT algorithm
see (von zur Gathen and Gerhard, 1999)
In our situation the FFT is a powerful algorithm
which enables us
to compute the Lagrange representation relatively
to Ψ of a polynomial A(X).
to compute the polynomial from its LR representa-
tions relatively to Ψ since the reverse operation of
the FFT is F F T
1
r
=
1
r
F F T
r
.
3.2 The Change
Rep Routines
Let us see how use the FFT in the change of repre-
sentation between LR
Ψ
and LR
Ψ
. We know only
how to reconstruct a polynomial A(X) from its LR
Ψ
representation
A(X) = F F T
1
r
(LR
Ψ
(A)).
But if now we compute
e
A(X) = A(αX) and if after
that we compute F F T
r
(
e
A), we obtain the r = 2
k
elements
e
A(β
i
) = A(αα
2i
) = A(α
2i+1
),
for i = 0, . . . , r. This means that LR
Ψ
(A) =
F F T
r
(
e
A)). Consequently the change of Lagrange
representation Change
Rep
ΨΨ
can be done with
the following algorithm.
Algorithm 3 Change Rep
ΨΨ
.
1: A(X) F F T
1
(LR
Ψ
(A))
2:
e
A(X) A(αX)
3: LR
Ψ
(A) F F T (
e
A)
The reverse of the change of representation is done
using the reverse process. First we get back to
e
A(X)
by computing F F T
1
r
(LR
Ψ
(A), then we compute
to A(X) with A(X) =
e
A(α
1
X), and finally we get
LR
Ψ
(A) by computing LR
Ψ
(A) = F F T
r
(A(X)).
Algorithm 4 Change Rep
Ψ
Ψ
.
1:
e
A(X) F F T
1
(LR
Ψ
(A))
2: A(X)
e
A(α
1
X)
3: LR
Ψ
(A) F F T (A(X))
It is natural to wonder if it is possible to merge
the three step of the Algorithm 3 and 4 in a FFT-like
recursive Algorithm since the main computation are
done with F F T
r
and F F T
1
r
. Specially the com-
putation of the polynomial representation of A(X)
and
e
A(X) seems to be superfluous, it thus could be
avoided. For now we could not obtain such recursive
algorithm, but it will be interesting in the future to get
such recursive method to improve the performance of
the Algorithm 3 and 4.
SECRYPT 2006 - INTERNATIONAL CONFERENCE ON SECURITY AND CRYPTOGRAPHY
322
4 COMPLEXITY COMPARISON
In this section we evalutate the complexity of the
modified form of Lagrange multiplication of Bajard et
al.. First we focus on the theoretic complexity by
evaluating the number of field operations F
p
(addi-
tions and multiplications). In the table 4 we give
the cost of each step of the Algorithm 2 used with
Change
Rep of section 3. For an explicit evaluation
of the cost of the FFT we refer to (von zur Gathen and
Gerhard, 1999).
Multiplication Additions
Step 1 2r 0
Step 2 rlog
2
(r) 2rlog
2
(r)
Step 3 3r r
Step 4 (2rlog
2
(r) + r 1) 4rlog
2
(r)
The global cost of the algorithm is thus equal to
(4rlog
2
(r)+7r2) multiplications and (8rlog
2
(r)+
r) additions. We get a clearly improvement compared
to original LR multiplication, since the complexity
was equal to (2r(r 1) + r) addtions and 2r
2
+ 5r
multiplications in F
p
.
Hardware implementation. For hardware imple-
mentations, the FFT have the nice property to be
parallelizable. Specially, we can compute F F T
r
(A)
with r/2 multipliers and r adders in parallel. The de-
lay of the architecture to perform one F F T
r
is then
equal to log
2
(r)T
M
+log
2
(r)T
A
. For a precise expla-
nation of this fact we refer to (Johnson et al., 2000) .
But the other computations in the Algorithm 2 re-
quire also at most only r multipliers in parallel and r
adders in parallel. This is clear for step 1 and step 3,
for step 2 and 4, we need r adders and multipliers for
the FFT parts, and also r multipliers for the compu-
tation of
e
Q and R(X). We can use at each time the
same r adders and multipliers.
Consequently the Algorithm 2 can be implemented
in hardware with an architecture using r multipliers
and r adders in parallel.
Let us evaluate the delay of such architecture. The
total delay is equal to the sum of the delay of each
step of Algorithm 2. If we note T
M
the time for a
multiplication in F
p
, the step 1 has a delay of 2T
M
and the step 2 has a delay of 3T
M
+ T
A
. In step 2
and 4 the delay is equal to the delay of two F F T
r
plus the delay of two multiplications ,i.e. (one T
M
for the multiplications by r
1
and a second for the
computation of
e
A), each step has a delay of (log
2
(r)+
2)T
M
+ (log
2
(r)T
A
.
Finally, the global delay is equal to (log
2
(r) +
5)T
M
+ (log
2
(r) + 1)T
A
.
5 CONCLUSION
In this paper we have presented a modified version of
the Algorithm of Bajard et al. (Bajard et al., 2003)
computes the product of two elements of F
p
n
. We
have modified only a part of the algorihtm: precisely
we modified the changes of representation in a way
to use FFT algorithm. We thus obtain an algorithm
which have a sub-quadratic complexity: a multipli-
cations requires (4rlog
2
(r) + 7r 2) multiplica-
tions and (8rlog
2
(r) + r) additions in F
p
instead of
(2r(r1)+r) addtions and 2r
2
+5r multiplications in
the original work of Bajard et al. (Bajard et al., 2003).
We are greatefull to N. Louvet for helpfull com-
ments on a preliminary version of this paper.
REFERENCES
Bailey, D. and Paar, C. (1998). Optimal Extension Fields
for Fast Arithmetic in Public-Key Algorithms. Lecture
Notes in Computer Science, 1462:472.
Bajard, J.-C., Imbert, L., Negre, C., and Plantard, T. (2003).
Efficient Multiplication in GF(p
k
) for Elliptic Curve
Cryptography. In ARITH 16, 16th IEEE Symposium
on Computer Arithmetic June 15-18, 2003 Santiago
de Compostela, SPAIN.
Berlekamp, E. (1982). Bit-serial Reed-Solomon encoder.
IEEE Transaction on Information Theory, IT-28(6).
Johnson, J., Kumhom, P., and Nagvajara, P. (2000). De-
sign, optimization, and implementation of a universal
fft processor. In 13th IEEE International ASIC/SOC
Conference, Washington, DC.
Koblitz, N. (1987). Elliptic curve cryptosystems. Mathe-
matics of Computation, 48(177):203–209.
Miller, V. (1986). Use of elliptic curves in cryptography.
Advances in Cryptology, proceeding’s of CRYPTO’85,
218:417–426.
Montgomery, P. (1985). Modular multiplication without
trial division. Mathematic of computation, 44(170).
Ning, P. and Yin, Y. (2001). Efficient Software Implemen-
tation for Finite Field Multiplication in Normal Basis.
Lecture Notes in Computer Science, 2229:177.
Sunar, B. and Koc, C. (1999). Mastrovito Multiplier for All
Trinomials. IEEE Transaction on Computers.
von zur Gathen, J. and Gerhard, J. (1999). Modern com-
puter algebra. Cambridge University Press, New
York, NY, USA.
FINITE FIELD MULTIPLICATION IN LAGRANGE REPRESENTATION USING FAST FOURRIER TRANSFORM
323