Fuzzing MLIR Compilers with Custom Mutation Synthesis

Bibliographic Details
Published in Proceedings / International Conference on Software Engineering, pp. 217-229
Main Authors Limpanukorn, Ben; Wang, Jiyuan; Kang, Hong Jin; Zhou, Zitong; Kim, Miryung
Format Conference Proceeding
Language English
Published IEEE 26.04.2025
Abstract Compiler technologies in deep learning and domain-specific hardware acceleration are increasingly adopting extensible compiler frameworks such as Multi-Level Intermediate Representation (MLIR) to facilitate more efficient development. With MLIR, compiler developers can easily define their own custom IRs in the form of MLIR dialects. However, the diversity and rapid evolution of such custom IRs make it impractical to manually write a custom test generator for each dialect. To address this problem, we design a new test generator called SynthFuzz that combines grammar-based fuzzing with custom mutation synthesis. The key idea of SynthFuzz is twofold: (1) it automatically infers parameterized, context-dependent custom mutations from existing test cases; (2) it then concretizes each mutation's content depending on the target context and reduces the chance of inserting invalid edits by performing k-ancestor and prefix/postfix matching. This obviates the need to manually define custom mutation operators for each dialect. We compare SynthFuzz to three baselines: Grammarinator, a grammar-based fuzzer without custom mutations; MLIRSmith, a custom test generator for MLIR core dialects; and NeuRI, a custom test generator for ML models with parameterization of tensor shapes. We conduct this comparison on four different MLIR projects. Each project defines a new set of MLIR dialects for which manually writing a custom test generator would take weeks of effort. Our evaluation shows that SynthFuzz on average improves MLIR dialect pair coverage by 1.75×, which increases branch coverage by 1.22×. Further, we show that our context-dependent custom mutation increases the proportion of valid tests by up to 1.11×, indicating that SynthFuzz correctly concretizes its parameterized mutations with respect to the target context. Parameterization of the mutations reduces the fraction of tests violating the base MLIR constraints by 0.57×, increasing the time spent fuzzing dialect-specific code.
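The abstract mentions k-ancestor and prefix/postfix matching without spelling out the rules. The following is a minimal illustrative sketch of one plausible reading, not the SynthFuzz implementation: it assumes a parse tree whose nodes carry a grammar label, treats the "k-ancestor context" as the labels of the k nearest ancestors, and treats prefix/postfix as the labels of the immediate left and right siblings. All names (`Node`, `k_ancestors`, `sibling_context`, `context_matches`, `build`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A parse-tree node; `label` stands in for a grammar rule or op name."""
    label: str
    children: List["Node"] = field(default_factory=list)
    # Exclude the back-reference from repr/eq to avoid infinite recursion.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def k_ancestors(node: Node, k: int) -> Tuple[str, ...]:
    """Labels of up to k nearest ancestors, nearest first (assumed definition)."""
    labels = []
    cur = node.parent
    while cur is not None and len(labels) < k:
        labels.append(cur.label)
        cur = cur.parent
    return tuple(labels)


def sibling_context(node: Node) -> Tuple[Optional[str], Optional[str]]:
    """Labels of the immediate left (prefix) and right (postfix) siblings."""
    if node.parent is None:
        return (None, None)
    sibs = node.parent.children
    i = next(idx for idx, s in enumerate(sibs) if s is node)
    left = sibs[i - 1].label if i > 0 else None
    right = sibs[i + 1].label if i + 1 < len(sibs) else None
    return (left, right)


def context_matches(donor: Node, target: Node, k: int) -> bool:
    """A mutation taken from `donor` is allowed at `target` only if both
    the k-ancestor labels and the prefix/postfix sibling labels agree."""
    return (k_ancestors(donor, k) == k_ancestors(target, k)
            and sibling_context(donor) == sibling_context(target))


def build(ancestors: List[str], ops: List[str]) -> List[Node]:
    """Build a chain of ancestor nodes ending in leaf ops; return the leaves."""
    cur = Node(ancestors[0])
    for label in ancestors[1:]:
        cur = cur.add(Node(label))
    return [cur.add(Node(op)) for op in ops]


# Donor location and two candidate insertion points.
donor = build(["module", "func", "block"], ["op_a", "op_b", "op_c"])[1]
same = build(["module", "func", "block"], ["op_a", "op_x", "op_c"])[1]
other = build(["module", "loop", "block"], ["op_a", "op_y", "op_c"])[1]

print(context_matches(donor, same, k=2))
print(context_matches(donor, other, k=1))
print(context_matches(donor, other, k=2))
```

In this sketch a larger k makes the check stricter: a location that looks compatible when only the parent is inspected can be rejected once the grandparent is also compared, which is one way such matching could lower the rate of invalid insertions.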
Author Limpanukorn, Ben
Kim, Miryung
Zhou, Zitong
Kang, Hong Jin
Wang, Jiyuan
Author_xml – sequence: 1
  givenname: Ben
  surname: Limpanukorn
  fullname: Limpanukorn, Ben
  email: blimpan@cs.ucla.edu
  organization: University of California,Los Angeles
– sequence: 2
  givenname: Jiyuan
  surname: Wang
  fullname: Wang, Jiyuan
  email: wangjiyuan@cs.ucla.edu
  organization: University of California,Los Angeles
– sequence: 3
  givenname: Hong Jin
  surname: Kang
  fullname: Kang, Hong Jin
  email: hjkang@cs.ucla.edu
  organization: University of California,Los Angeles
– sequence: 4
  givenname: Zitong
  surname: Zhou
  fullname: Zhou, Zitong
  email: zitongzhou@cs.ucla.edu
  organization: University of California,Los Angeles
– sequence: 5
  givenname: Miryung
  surname: Kim
  fullname: Kim, Miryung
  email: miryung@cs.ucla.edu
  organization: University of California,Los Angeles
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICSE55347.2025.00037
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798331505691
EISSN 1558-1225
EndPage 229
ExternalDocumentID 11029941
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation
  grantid: 2106838,1764077,1956322,2106404
  funderid: 10.13039/100000001
IEDL.DBID RIE
IngestDate Wed Aug 27 01:40:12 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 13
ParticipantIDs ieee_primary_11029941
PublicationCentury 2000
PublicationDate 2025-April-26
PublicationDateYYYYMMDD 2025-04-26
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-April-26
  day: 26
PublicationDecade 2020
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0006499
Score 2.2896361
SourceID ieee
SourceType Publisher
StartPage 217
SubjectTerms code patterns
Codes
compiler testing
Fuzzing
Generators
Grammar-based fuzzing
Hardware acceleration
MLIR
Program processors
program synthesis
program transformation
Shape
Software engineering
Tensors
Testing
Writing
Title Fuzzing MLIR Compilers with Custom Mutation Synthesis
URI https://ieeexplore.ieee.org/document/11029941