An Iterative Montgomery Modular Multiplication Algorithm With Low Area-Time Product

Bibliographic Details
Published in	IEEE Transactions on Computers, Vol. 72, no. 1, pp. 236-249
Main Authors	Zhang, Bo; Cheng, Zeming; Pedram, Massoud
Format	Journal Article
Language	English
Published	New York: IEEE, 01.01.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Adders; Algorithms; Cryptosystem; Delays; Elliptic curve cryptography; Encoding; Hardware; Iterative methods; Large integer arithmetic; Logic gates; Modular equipment; Modular multiplication; Montgomery modular multiplication; Parallel processing

More Information
Summary:	This paper presents a highly efficient iterative Montgomery modular multiplication algorithm, wherein the computations of quotient and intermediate result in each iteration are done in parallel. This parallelism breaks the data dependency and thus reduces the computation latency. Moreover, this paper replaces required multiplications and additions in each iteration with compressions and encoding, thereby achieving a computation latency of order $d+6$, where $d=\left\lceil N/m \right\rceil +2$ is the number of iterations, $N$ denotes the bitwidth of modulus $M$, and $m$ is the number of bits of the multiplier that are processed in each iteration of the algorithm. Hardware realization of the proposed Montgomery modular multiplication on a Xilinx Virtex-7 FPGA device shows $>41\%$ computation latency saving and $>31\%$ area saving when $N=1,024$ and $m=8$, compared with the best of previous state-of-art references. These savings amount to more than 63% reduction in terms of the area-latency product metric.
ISSN:	0018-9340 1557-9956
DOI:	10.1109/TC.2022.3154164
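
Illustrative sketch (not part of the catalog record): the summary's iteration count $d=\left\lceil N/m \right\rceil +2$ can be tried out with a small software model. The code below is the textbook radix-$2^m$ Montgomery multiplication, which computes the quotient and the intermediate result sequentially; it is not the paper's parallel compression-and-encoding datapath, and it says nothing about hardware latency or area. The function names `iteration_count` and `montgomery_multiply` are hypothetical, as is the choice $R = 2^{m \cdot d}$ for the Montgomery constant. Python 3.8 or later is assumed for the modular inverse via `pow`.

```python
def iteration_count(n_bits: int, m: int) -> int:
    """d = ceil(N / m) + 2, the iteration count given in the summary."""
    return -(-n_bits // m) + 2  # integer ceiling division, then + 2


def montgomery_multiply(a: int, b: int, modulus: int, n_bits: int, m: int) -> int:
    """Textbook radix-2^m Montgomery product: a * b * R^-1 mod modulus.

    Assumption (not from the paper): R = 2^(m * d) with d = iteration_count().
    Requires an odd modulus and 0 <= a, b < modulus.
    """
    if modulus % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    d = iteration_count(n_bits, m)
    radix = 1 << m
    mask = radix - 1
    # Precomputed constant -modulus^-1 mod 2^m (Python 3.8+ modular inverse).
    m_prime = (-pow(modulus, -1, radix)) % radix
    s = 0
    for i in range(d):
        b_i = (b >> (i * m)) & mask          # next m bits of the multiplier
        s += a * b_i                         # accumulate the partial product
        q = ((s & mask) * m_prime) & mask    # quotient digit for this iteration
        s = (s + q * modulus) >> m           # exact division: low m bits are zero
    # The loop keeps s below a + modulus, so one subtraction is enough.
    return s - modulus if s >= modulus else s


if __name__ == "__main__":
    N, m = 1024, 8                  # the operating point quoted in the summary
    d = iteration_count(N, m)
    print("iterations d =", d, "| d + 6 =", d + 6)
    M = (1 << N) - 105              # arbitrary odd 1024-bit modulus for the demo
    a, b = 3 ** 600 % M, 7 ** 350 % M
    R = 1 << (m * d)
    result = montgomery_multiply(a, b, M, N, m)
    print("consistent:", result * R % M == a * b % M)
```

For $N=1,024$ and $m=8$ this gives $d = 128 + 2 = 130$ iterations, so the latency figure $d+6$ from the summary evaluates to 136.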