Preprocessing is What You Need: Understanding and Predicting the Complexity of SAT-based Uniform Random Sampling

Bibliographic Details
Published in 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE), pp. 23-32
Main Authors Zeyen, Olivier; Cordy, Maxime; Perrouin, Gilles; Acher, Mathieu
Format Conference Proceeding
Language English
Published ACM 14.04.2024
Abstract Despite its NP-completeness, the Boolean satisfiability problem has given rise to highly efficient tools that can find solutions to a Boolean formula and count them. Boolean formulae compactly encode huge, constrained search spaces for variability-intensive systems, e.g., the possible configurations of the Linux kernel. These search spaces are generally too big to explore exhaustively, leading most testing approaches to sample a few solutions before analysing them. A desirable property of such samples is uniformity: each solution should have the same probability of being selected. This property motivated the design of uniform random samplers, which rely on SAT solvers and counters and achieve different tradeoffs between uniformity and scalability. Though we can observe their performance in practice, understanding the complexity these tools face and accurately predicting it is an under-explored problem. Indeed, structural metrics such as the number of variables and clauses in a formula poorly predict the sampling complexity. More elaborate ones, such as minimal independent support (MIS), are intractable to compute on large formulae. We provide an efficient parallel algorithm to compute a related metric, the number of equivalence classes, and demonstrate that this metric is highly correlated with the time and memory usage of uniform random sampling and model counting tools. We explore the role of formula preprocessing on various metrics and show its positive influence on correlations. Relying on these correlations, we train an efficient classifier (F1-score 0.97) to predict whether uniformly sampling a given formula will exceed a specified budget. Our results allow us to characterise the similarities and differences between (uniform) sampling, solving and counting.
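To make the notions of solving, counting and uniform sampling in the abstract concrete, here is a minimal Python sketch on a toy formula. It is not the authors' algorithm and uses none of the tools evaluated in the paper: it enumerates all assignments by brute force, which is only feasible for a handful of variables. The formula `CNF`, the DIMACS-style integer encoding of literals, and the helper names `models` and `uniform_sample` are assumptions made for illustration.

```python
import itertools
import random


def models(cnf, num_vars):
    """Enumerate every satisfying assignment of a CNF by brute force.

    A clause is a list of non-zero ints: k means variable k is true,
    -k means variable k is false (illustrative DIMACS-style encoding).
    """
    found = []
    for bits in itertools.product([False, True], repeat=num_vars):
        # A clause is satisfied when at least one of its literals holds.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf):
            found.append(bits)
    return found


def uniform_sample(cnf, num_vars, k, rng):
    """Draw k solutions, each chosen with probability 1 / (model count)."""
    solutions = models(cnf, num_vars)
    return [rng.choice(solutions) for _ in range(k)]


# Toy formula (hypothetical): (x1 or x2) and (not x1 or x3)
CNF = [[1, 2], [-1, 3]]

if __name__ == "__main__":
    all_models = models(CNF, 3)
    print("model count:", len(all_models))
    print("samples:", uniform_sample(CNF, 3, 5, random.Random(0)))
```

Because the sketch materialises every solution before drawing from the list, uniformity is trivial here. The samplers discussed in the abstract exist precisely because this enumeration does not scale to formulae such as those encoding Linux kernel configurations.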
Author Perrouin, Gilles
Zeyen, Olivier
Cordy, Maxime
Acher, Mathieu
Author_xml – sequence: 1
  givenname: Olivier
  surname: Zeyen
  fullname: Zeyen, Olivier
  organization: University of Luxembourg, SnT,Luxembourg
– sequence: 2
  givenname: Maxime
  surname: Cordy
  fullname: Cordy, Maxime
  organization: University of Luxembourg, SnT,Luxembourg
– sequence: 3
  givenname: Gilles
  surname: Perrouin
  fullname: Perrouin, Gilles
  organization: PReCISE/NaDI, University of Namur,Belgium
– sequence: 4
  givenname: Mathieu
  surname: Acher
  fullname: Acher, Mathieu
  organization: Univ Rennes, Inria, CNRS, IRISA,France
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
ESBDL
RIE
RIL
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore Open Access Journals
IEEE/IET Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798400705892
EISSN 2575-5099
EndPage 32
ExternalDocumentID 10555673
Genre orig-research
GroupedDBID 6IE
6IF
6IL
6IN
AAJGR
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
ESBDL
IEGSK
OCL
RIE
RIL
ID FETCH-ieee_primary_105556733
IEDL.DBID RIE
IngestDate Wed Jul 03 05:40:23 EDT 2024
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-ieee_primary_105556733
OpenAccessLink https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/document/10555673
ParticipantIDs ieee_primary_10555673
PublicationCentury 2000
PublicationDate 2024-April-14
PublicationDateYYYYMMDD 2024-04-14
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-April-14
  day: 14
PublicationDecade 2020
PublicationTitle 2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE)
PublicationTitleAbbrev FORMALISE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0003190055
Score 3.8426292
SourceID ieee
SourceType Publisher
StartPage 23
SubjectTerms Complexity theory
Computational efficiency
Correlation
Measurement
Memory management
Model Counting
Predictive models
Preprocessing
SAT
Scalability
Uniform random sampling
Title Preprocessing is What You Need: Understanding and Predicting the Complexity of SAT-based Uniform Random Sampling
URI https://ieeexplore.ieee.org/document/10555673
hasFullText 1
inHoldings 1
isFullTextHit
isPrint