Fast RSA decryption through high-radix scalable Montgomery modular multipliers


Bibliographic Details
Published in: Science China. Information sciences, Vol. 58, no. 6, pp. 132-147
Main Authors: Wu, Tao; Li, ShuGuo; Liu, LiTian
Format: Journal Article
Language: English
Published: Beijing: Science China Press, 01.06.2015; Springer Nature B.V
Summary: This paper improves the quotient-pipelined high-radix scalable Montgomery modular multiplier by processing w-bit and k-bit words in carry-save form instead of some (w + k)-bit operands. This directly reduces both the critical path and the area overhead of the original processing elements. Based on this improved high-radix scalable Montgomery modular multiplier, we then propose an efficient hardware architecture for RSA decryption with the Chinese Remainder Theorem. With simple configuration logic, the hardware unit works in three modes: (1) scalable modular reduction for precomputation, (2) scalable Montgomery modular multiplication for modular exponentiation, where an approximation method is developed to reduce the expanded result below the modulus, and (3) scalable multiplication for post-processing. Hardware implementation shows that the proposed architecture is optimal with reference to the literature in terms of speed, area, and frequency. A 4096-bit RSA decryption on an XC2V6000-6 FPGA can be completed in 11.05 ms with 14041 slices/17409 LUTs, 128 16 × 16 multipliers, and 70 kbits of block RAM. Finally, by using the Montgomery powering ladder, the modular exponentiation unit based on the improved high-radix scalable Montgomery modular multiplier can be built to resist fault and simple power attacks. A 1024-bit modular exponentiation unit with such resistance costs about 255K NAND2 gates in a 0.18 μm CMOS process, and one full modular exponentiation takes about 1.44 ms at 250 MHz.
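As a rough software illustration of the arithmetic named in the summary, the following Python sketch combines a word-serial (radix 2^k) Montgomery multiplication, a Montgomery powering ladder, and RSA decryption with the Chinese Remainder Theorem. It is not the paper's hardware design: the function names, the default k = 16, and the conditional final subtraction (used here in place of the paper's approximation method) are all assumptions made for this example, and Python integers stand in for the carry-save word pipeline. Python 3.8 or later is assumed for `pow(x, -1, m)`.

```python
def montgomery_setup(n, k, words):
    """Constants for an odd modulus n, radix 2**k, R = 2**(k*words) > n."""
    r = 1 << (k * words)
    n_prime = (-pow(n, -1, 1 << k)) % (1 << k)  # -n^(-1) mod 2^k
    r2 = (r * r) % n                            # R^2 mod n, maps values into Montgomery form
    return n_prime, r2


def mont_mul(a, b, n, n_prime, k, words):
    """Word-serial Montgomery product a*b*R^(-1) mod n, for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = 0
    for i in range(words):
        a_i = (a >> (k * i)) & mask        # next k-bit word of a
        t += a_i * b
        q = ((t & mask) * n_prime) & mask  # quotient digit: makes t + q*n divisible by 2^k
        t = (t + q * n) >> k
    # Assumption: plain conditional subtraction, not the paper's approximation method.
    return t - n if t >= n else t


def mont_ladder_pow(base, exp, n, k=16):
    """base**exp mod n (n odd, n > 1) using the Montgomery powering ladder."""
    words = -(-n.bit_length() // k)        # ceil(bit length / k)
    n_prime, r2 = montgomery_setup(n, k, words)

    def mm(x, y):
        return mont_mul(x, y, n, n_prime, k, words)

    r0 = mm(1, r2)                         # 1 in Montgomery form
    r1 = mm(base % n, r2)                  # base in Montgomery form
    for bit in bin(exp)[2:]:               # most significant bit first
        # Every bit costs one multiplication and one squaring; r1 == r0 * base holds throughout.
        if bit == "1":
            r0, r1 = mm(r0, r1), mm(r1, r1)
        else:
            r0, r1 = mm(r0, r0), mm(r0, r1)
    return mm(r0, 1)                       # leave Montgomery form


def rsa_crt_decrypt(c, p, q, d):
    """RSA decryption with CRT and Garner recombination, for odd primes p and q."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    m_p = mont_ladder_pow(c % p, dp, p)    # half-size exponentiation mod p
    m_q = mont_ladder_pow(c % q, dq, q)    # half-size exponentiation mod q
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q                     # recombination by multiplication
```

With the textbook toy key p = 61, q = 53, e = 17, d = 2753, calling `rsa_crt_decrypt(pow(65, 17, 3233), 61, 53, 2753)` recovers the message 65. The Python `if` on each exponent bit only shows the ladder's regular multiply-and-square pattern; it gives no side-channel protection in software.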
Bibliography: RSA, high radix, scalable, Montgomery modular multiplication, CRT
11-5847/TP
ISSN: 1674-733X, 1869-1919
DOI: 10.1007/s11432-014-5215-4