Fuzzing MLIR Compilers with Custom Mutation Synthesis

Bibliographic Details
Published in	Proceedings / International Conference on Software Engineering, pp. 217-229
Main Authors	Limpanukorn, Ben; Wang, Jiyuan; Kang, Hong Jin; Zhou, Zitong; Kim, Miryung
Format	Conference Proceeding
Language	English
Published	IEEE 26.04.2025
Subjects	code patterns; Codes; compiler testing; Fuzzing; Generators; Grammar-based fuzzing; Hardware acceleration; MLIR; Program processors; program synthesis; program transformation; Shape; Software engineering; Tensors; Testing; Writing
Abstract	Compiler technologies in deep learning and domain-specific hardware acceleration are increasingly adopting extensible compiler frameworks such as Multi-Level Intermediate Representation (MLIR) to facilitate more efficient development. With MLIR, compiler developers can easily define their own custom IRs in the form of MLIR dialects. However, the diversity and rapid evolution of such custom IRs make it impractical to manually write a custom test generator for each dialect. To address this problem, we design a new test generator called SynthFuzz that combines grammar-based fuzzing with custom mutation synthesis. The key idea of SynthFuzz is twofold: (1) it automatically infers parameterized, context-dependent custom mutations from existing test cases; (2) it then concretizes each mutation's content depending on the target context and reduces the chance of inserting invalid edits by performing k-ancestor and prefix/postfix matching. This obviates the need to manually define custom mutation operators for each dialect. We compare SynthFuzz to three baselines: Grammarinator, a grammar-based fuzzer without custom mutations; MLIRSmith, a custom test generator for MLIR core dialects; and NeuRI, a custom test generator for ML models with parameterization of tensor shapes. We conduct this comprehensive comparison on four different MLIR projects. Each project defines a new set of MLIR dialects for which manually writing a custom test generator would take weeks of effort. Our evaluation shows that SynthFuzz on average improves MLIR dialect pair coverage by 1.75×, which increases branch coverage by 1.22×. Further, we show that our context-dependent custom mutation increases the proportion of valid tests by up to 1.11×, indicating that SynthFuzz correctly concretizes its parameterized mutations with respect to the target context. Parameterization of the mutations reduces the fraction of tests violating the base MLIR constraints by 0.57×, increasing the time spent fuzzing dialect-specific code.
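The abstract mentions k-ancestor and prefix/postfix matching without giving details. As a rough illustration only, the Python sketch below shows one way such a context check could look on a toy parse tree. Every name here (`Node`, `k_ancestors`, `siblings`, `context_matches`, and the `op_*` node kinds) is hypothetical and not taken from SynthFuzz, and the sketch assumes that "prefix/postfix" means the immediately adjacent siblings. Concretization of mutation parameters is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class Node:
    """Toy parse-tree node; `kind` stands in for a grammar rule name (hypothetical)."""
    kind: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def k_ancestors(node: Node, k: int) -> List[str]:
    """Kinds of up to k nearest ancestors, closest first."""
    kinds: List[str] = []
    cur = node.parent
    while cur is not None and len(kinds) < k:
        kinds.append(cur.kind)
        cur = cur.parent
    return kinds


def siblings(node: Node) -> Tuple[Optional[str], Optional[str]]:
    """Kinds of the siblings directly before and after `node` (assumed prefix/postfix)."""
    if node.parent is None:
        return None, None
    sibs = node.parent.children
    i = next(j for j, s in enumerate(sibs) if s is node)
    before = sibs[i - 1].kind if i > 0 else None
    after = sibs[i + 1].kind if i + 1 < len(sibs) else None
    return before, after


def context_matches(donor: Node, target: Node, k: int) -> bool:
    """A mutation taken at `donor` is allowed at `target` only if both contexts agree."""
    return (k_ancestors(donor, k) == k_ancestors(target, k)
            and siblings(donor) == siblings(target))


def build(chain: List[str], leaf_kinds: List[str]) -> List[Node]:
    """Build a single-path tree from `chain` and attach leaves to its last node."""
    cur = Node(chain[0])
    for kind in chain[1:]:
        cur = cur.add(Node(kind))
    return [cur.add(Node(kind)) for kind in leaf_kinds]


donor = build(["func", "block"], ["op_a", "op_b", "op_c"])[1]
same_ctx = build(["func", "block"], ["op_a", "op_x", "op_c"])[1]
other_ctx = build(["module"], ["op_a", "op_y", "op_c"])[1]

print(context_matches(donor, same_ctx, k=2))   # same ancestors and neighbors
print(context_matches(donor, other_ctx, k=2))  # ancestor chain differs
print(context_matches(donor, other_ctx, k=0))  # ancestors ignored when k is 0
```

In this toy model, a larger k makes the check stricter, so fewer insertion sites qualify but each is more likely to accept the edit. With k set to 0, only the neighboring siblings are compared.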
Author	Limpanukorn, Ben; Kim, Miryung; Zhou, Zitong; Kang, Hong Jin; Wang, Jiyuan
Author_xml	– sequence: 1; givenname: Ben; surname: Limpanukorn; fullname: Limpanukorn, Ben; email: blimpan@cs.ucla.edu; organization: University of California, Los Angeles
	– sequence: 2; givenname: Jiyuan; surname: Wang; fullname: Wang, Jiyuan; email: wangjiyuan@cs.ucla.edu; organization: University of California, Los Angeles
	– sequence: 3; givenname: Hong Jin; surname: Kang; fullname: Kang, Hong Jin; email: hjkang@cs.ucla.edu; organization: University of California, Los Angeles
	– sequence: 4; givenname: Zitong; surname: Zhou; fullname: Zhou, Zitong; email: zitongzhou@cs.ucla.edu; organization: University of California, Los Angeles
	– sequence: 5; givenname: Miryung; surname: Kim; fullname: Kim, Miryung; email: miryung@cs.ucla.edu; organization: University of California, Los Angeles
CODEN	IEEPAD
ContentType	Conference Proceeding
DOI	10.1109/ICSE55347.2025.00037
Discipline	Computer Science
EISBN	9798331505691
EISSN	1558-1225
EndPage	229
ExternalDocumentID	11029941
Genre	orig-research
GrantInformation_xml	– fundername: National Science Foundation; grantid: 2106838, 1764077, 1956322, 2106404; funderid: 10.13039/100000001
IsPeerReviewed	false
IsScholarly	true
Language	English
PageCount	13
PublicationCentury	2000
PublicationDate	2025-April-26
PublicationDateYYYYMMDD	2025-04-26
PublicationDate_xml	– month: 04 year: 2025 text: 2025-April-26 day: 26
PublicationDecade	2020
PublicationTitle	Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev	ICSE
PublicationYear	2025
Publisher	IEEE
Publisher_xml	– name: IEEE
StartPage	217
Title	Fuzzing MLIR Compilers with Custom Mutation Synthesis
URI	https://ieeexplore.ieee.org/document/11029941