Optimizing Asynchronous Extern Execution in Programmable Software Data Planes

Bibliographic Details
Published in: GLOBECOM 2023 - 2023 IEEE Global Communications Conference, pp. 3819-3824
Main Authors: Hudoba, Peter; Kitlei, Robert; Laki, Sandor; Voros, Peter
Format: Conference Proceeding
Language: English
Published: IEEE, 04.12.2023
Summary: P4 has gained significant attention as a programming language for describing target-independent packet processing. It supports diverse hardware and software targets through various architecture models that declare external functions, called externs, representing target-specific functionality. These externs may require specific or dedicated resources (e.g., a cryptography co-processor or an FPGA) for increased processing speed. To process tasks in bulk mode, packet processing may need to be suspended temporarily while waiting for the function to return. The linear execution of the packet processing pipeline implemented by most P4 software targets cannot handle such situations efficiently. Asynchronous packet processing has been proposed to solve this issue by allowing incoming packets to be served while others are processed by an extern. In this paper, we explore existing approaches for extern execution in software data planes and propose a new lightweight asynchronous method for offloading extern execution to dedicated resources, such as cryptography co-processors, which perform the extern computations. Our analysis shows that the proposed method can significantly improve the performance of extern execution in various use cases, such as IPsec, simple encryption, and other small tasks, and has negligible overhead compared to a prior solution. We also demonstrate that our method has clear benefits on constrained hardware (a PC Engines APU single-board computer), where the overhead of extern execution has a bigger impact on overall performance and prior approaches are not practical.
ISSN:2576-6813
DOI:10.1109/GLOBECOM54140.2023.10436961
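
Illustration: the summary describes packets being suspended at an extern call while other packets continue to be served, with the extern work handled in bulk by a dedicated resource. The following is a minimal sketch of that general idea only, not the method proposed in the paper. It uses Python's asyncio as a stand-in for a software data plane; the names `bulk_extern`, `offload_worker`, `process_packet`, `run`, the XOR "encryption", and the batch size are all hypothetical and chosen for illustration.

```python
import asyncio

BATCH_SIZE = 4  # hypothetical bulk size accepted by the offload resource


def bulk_extern(payloads):
    """Stand-in for a bulk extern call (e.g., encryption on a co-processor).

    XOR with a fixed byte is a placeholder, not real cryptography.
    """
    return [bytes(b ^ 0x5A for b in p) for p in payloads]


async def offload_worker(queue):
    """Collect suspended packets into a batch and run the extern once per batch."""
    while True:
        batch = [await queue.get()]
        # Take whatever else is already waiting, up to the batch size.
        while len(batch) < BATCH_SIZE and not queue.empty():
            batch.append(queue.get_nowait())
        results = bulk_extern([payload for payload, _ in batch])
        for (_, future), result in zip(batch, results):
            future.set_result(result)  # resume the suspended packet


async def process_packet(payload, queue):
    """Pipeline for one packet: suspend at the extern, resume when it returns."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((payload, future))
    # While this packet waits, the event loop keeps serving other packets.
    return await future


async def run(payloads):
    """Process all packets concurrently; results keep the input order."""
    queue = asyncio.Queue()
    worker = asyncio.create_task(offload_worker(queue))
    results = await asyncio.gather(*(process_packet(p, queue) for p in payloads))
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)  # wait for cancellation
    return results
```

Calling `asyncio.run(run([b"pkt0", b"pkt1", b"pkt2"]))` returns one transformed payload per input packet. A synchronous pipeline would instead block on each extern call in turn, which is the linear execution the summary says most P4 software targets implement.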