An Iterative Montgomery Modular Multiplication Algorithm With Low Area-Time Product
Published in | IEEE Transactions on Computers Vol. 72; no. 1; pp. 236-249 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | New York; IEEE; 01.01.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Summary: | This paper presents a highly efficient iterative Montgomery modular multiplication algorithm, wherein the computations of quotient and intermediate result in each iteration are done in parallel. This parallelism breaks the data dependency and thus reduces the computation latency. Moreover, this paper replaces required multiplications and additions in each iteration with compressions and encoding, thereby achieving a computation latency of order $d+6$, where $d=\left\lceil N/m \right\rceil +2$ is the number of iterations, $N$ denotes the bitwidth of modulus $M$, and $m$ is the number of bits of the multiplier that are processed in each iteration of the algorithm. Hardware realization of the proposed Montgomery modular multiplication on a Xilinx Virtex-7 FPGA device shows $>41\%$ computation latency saving and $>31\%$ area saving when $N=1{,}024$ and $m=8$, compared with the best of previous state-of-art references. These savings amount to more than 63% reduction in terms of the area-latency product metric. |
---|---|
ISSN: | 0018-9340 1557-9956 |
DOI: | 10.1109/TC.2022.3154164 |
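
To make the quantities in the summary concrete, the following is a minimal Python sketch of the textbook radix-$2^m$ iterative Montgomery multiplication, run for $d=\lceil N/m \rceil+2$ iterations as stated in the abstract. It is not the paper's design: the parallel quotient computation and the compression/encoding steps are not reproduced, and in this loop the quotient digit `q` still depends on the freshly updated `s`, which is the data dependency the abstract says the proposed algorithm breaks. The function names, the choice $R=2^{md}$, and the final conditional subtraction are assumptions made for illustration.

```python
def iteration_count(n_bits: int, m: int) -> int:
    """d = ceil(N/m) + 2, the iteration count quoted in the abstract."""
    return -(-n_bits // m) + 2  # integer ceiling division, then +2


def montgomery_multiply(a: int, b: int, modulus: int, m: int) -> int:
    """Return a * b * R^(-1) mod modulus, with R = 2^(m*d) (assumed here).

    Textbook digit-serial loop; requires an odd modulus and 0 <= a, b < modulus.
    """
    if modulus % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    d = iteration_count(modulus.bit_length(), m)
    mask = (1 << m) - 1
    # m_prime = -modulus^(-1) mod 2^m (three-argument pow with -1 needs Python 3.8+)
    m_prime = (-pow(modulus, -1, 1 << m)) & mask
    s = 0
    for i in range(d):
        a_i = (a >> (m * i)) & mask        # next m-bit digit of the multiplier
        s += a_i * b                       # partial-product accumulation
        q = ((s & mask) * m_prime) & mask  # quotient digit (depends on updated s)
        s = (s + q * modulus) >> m         # low m bits are zero, so the shift is exact
    if s >= modulus:                       # s < 2 * modulus at this point
        s -= modulus
    return s


if __name__ == "__main__":
    d = iteration_count(1024, 8)
    print("iterations d =", d, "| latency of order d + 6 =", d + 6)
```

For the configuration reported in the summary ($N=1{,}024$, $m=8$), this gives $d = 128 + 2 = 130$ iterations, so a latency of order $d+6 = 136$.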