Optimization of a novel programmable data-flow crypto processor using NSGA-II algorithm

Bibliographic Details
Published in Journal of Advanced Research Vol. 12; pp. 67-78
Main Authors El-Hadidi, Mahmoud T., Elsayed, Hany M., Osama, Karim, Bakr, Mohamed, Aslan, Heba K.
Format Journal Article
Language English
Published Egypt Elsevier B.V 01.07.2018
Elsevier
Summary: The optimization of a novel programmable data-flow crypto processor dedicated to security applications is considered. An architecture based on assigning basic functional units to four synchronous regions was proposed in a previous work. In this paper, the problem of selecting the number of synchronous regions and the distribution of functional units among these regions is formulated as a combinatorial multi-objective optimization problem. The objective functions are the implementation area, the execution delay, and the energy consumed when running the well-known AES algorithm. To solve this problem, a modified version of the Genetic Algorithm, known as NSGA-II, is linked to a component database and a processor emulator. It is found that the performance improvement gained by operating the processor regions at different clocks is offset by the delay introduced by the wrappers needed to communicate between the asynchronous regions. With a two-clock-period delay, the minimum processor delay of the asynchronous case is 311% of the delay obtained in the synchronous case, and the minimum consumed energy is 308% more in the asynchronous design than in its synchronous counterpart. This research also identifies the Instruction Region as the main design bottleneck. For the synchronous case, the Pareto front contains solutions with 4 regions that minimize delay and solutions with 7 regions that minimize area or energy. A minimum-delay design is selected for hardware implementation. The FPGA version of the optimized processor is tested, and correct operation is verified for the AES and RC6 encryption/decryption algorithms.
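The Pareto front mentioned in the summary can be illustrated with a minimal sketch of non-dominated filtering over the three objectives (area, delay, energy). This is not the authors' implementation and not a full NSGA-II: it shows only the dominance test that NSGA-II's non-dominated sorting builds on. The names `dominates` and `pareto_front` and all numeric values are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

# A candidate design scored as (area, delay, energy); all three are minimized.
# The scoring itself (component database, emulator) is outside this sketch.
Design = Tuple[float, float, float]


def dominates(a: Design, b: Design) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical objective values, not taken from the paper.
candidates = [
    (10.0, 5.0, 8.0),
    (12.0, 4.0, 9.0),   # best delay
    (11.0, 6.0, 9.0),   # dominated by the first candidate
    (9.0, 7.0, 7.0),    # best area and energy
]
front = pareto_front(candidates)
print(front)
```

In this toy data the front keeps one design that favors delay and others that favor area or energy, which mirrors the kind of trade-off the summary reports between the 4-region and 7-region solutions.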
ISSN: 2090-1232, 2090-1224
DOI: 10.1016/j.jare.2017.11.002