An Iterative Montgomery Modular Multiplication Algorithm With Low Area-Time Product

Bibliographic Details
Published in: IEEE Transactions on Computers, Vol. 72, no. 1, pp. 236-249
Main Authors: Zhang, Bo; Cheng, Zeming; Pedram, Massoud
Format: Journal Article
Language: English
Published: New York, IEEE, 01.01.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: This paper presents a highly efficient iterative Montgomery modular multiplication algorithm, wherein the computations of quotient and intermediate result in each iteration are done in parallel. This parallelism breaks the data dependency and thus reduces the computation latency. Moreover, this paper replaces required multiplications and additions in each iteration with compressions and encoding, thereby achieving a computation latency of order $d+6$ where $d=\left\lceil N/m \right\rceil +2$ is the number of iterations, $N$ denotes the bitwidth of modulus $M$, and $m$ is the number of bits of the multiplier that are processed in each iteration of the algorithm.
Hardware realization of the proposed Montgomery modular multiplication on a Xilinx Virtex-7 FPGA device shows $>41\%$ computation latency saving and $>31\%$ area saving when $N=1{,}024$ and $m=8$, compared with the best of previous state-of-art references. These savings amount to more than 63% reduction in terms of the area-latency product metric.
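
For orientation, the iteration structure described in the summary can be sketched in software. The block below is a textbook radix-$2^m$ interleaved Montgomery multiplication, not the paper's parallel quotient/compression datapath; the function names `iteration_count` and `montgomery_multiply` are hypothetical, and the sketch assumes an odd modulus, operands smaller than the modulus, and $R = 2^{md}$ with $d=\lceil N/m \rceil + 2$ as quoted above.

```python
def iteration_count(n_bits: int, m: int) -> int:
    """d = ceil(N/m) + 2, the iteration count quoted in the summary."""
    return -(-n_bits // m) + 2


def montgomery_multiply(a: int, b: int, modulus: int, m: int) -> int:
    """Return a * b * R^-1 mod modulus, with R = 2^(m*d).

    Textbook sequential form (an assumption for illustration): the quotient
    digit is computed from the running sum, so the data dependency that the
    paper removes is still present here.
    Requires an odd modulus and 0 <= a, b < modulus.
    """
    if modulus % 2 == 0:
        raise ValueError("modulus must be odd")
    radix_mask = (1 << m) - 1
    d = iteration_count(modulus.bit_length(), m)
    # -M^-1 mod 2^m selects the quotient digit (pow with -1 needs Python 3.8+).
    neg_m_inv = (-pow(modulus, -1, 1 << m)) & radix_mask
    s = 0
    for i in range(d):
        a_i = (a >> (m * i)) & radix_mask                # next m multiplier bits
        s += a_i * b                                     # add partial product
        q = ((s & radix_mask) * neg_m_inv) & radix_mask  # quotient digit
        s = (s + q * modulus) >> m                       # exact division by 2^m
    # The running sum stays below 2 * modulus, so one subtraction suffices.
    return s - modulus if s >= modulus else s


# Parameters from the summary: N = 1024, m = 8 gives d = 130 iterations,
# so the quoted latency of order d + 6 evaluates to 136.
d = iteration_count(1024, 8)
latency_order = d + 6
```

Because the loop runs $d$ times, the result carries a factor $R^{-1} = 2^{-md} \bmod M$; callers would normally convert operands into and out of the Montgomery domain around a chain of such multiplications.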
ISSN: 0018-9340, 1557-9956
DOI: 10.1109/TC.2022.3154164