Preprocessing is What You Need: Understanding and Predicting the Complexity of SAT-based Uniform Random Sampling

Bibliographic Details
Published in	2024 IEEE/ACM 12th International Conference on Formal Methods in Software Engineering (FormaliSE), pp. 23-32
Main Authors	Zeyen, Olivier; Cordy, Maxime; Perrouin, Gilles; Acher, Mathieu
Format	Conference Proceeding
Language	English
Published	ACM 14.04.2024
Subjects	Complexity theory; Computational efficiency; Correlation; Measurement; Memory management; Model Counting; Predictive models; Preprocessing; SAT; Scalability; Uniform random sampling
Abstract	Despite its NP-completeness, the Boolean satisfiability problem gave birth to highly efficient tools that are able to find solutions to a Boolean formula and compute their number. Boolean formulae compactly encode huge, constrained search spaces for variability-intensive systems, e.g., the possible configurations of the Linux kernel. These search spaces are generally too big to explore exhaustively, leading most testing approaches to sample a few solutions before analysing them. A desirable property of such samples is uniformity: each solution should get the same selection probability. This property motivated the design of uniform random samplers, relying on SAT solvers and counters and achieving different tradeoffs between uniformity and scalability. Though we can observe their performance in practice, understanding the complexity these tools face and accurately predicting it is an under-explored problem. Indeed, structural metrics such as the number of variables and clauses involved in a formula poorly predict the sampling complexity. More elaborate ones, such as minimal independent support (MIS), are intractable to compute on large formulae. We provide an efficient parallel algorithm to compute a related metric, the number of equivalence classes, and demonstrate that this metric is highly correlated with the time and memory usage of uniform random sampling and model counting tools. We explore the role of formula preprocessing on various metrics and show its positive influence on correlations. Relying on these correlations, we train an efficient classifier (F1-score 0.97) to predict whether uniformly sampling a given formula will exceed a specified budget. Our results allow us to characterise the similarities and differences between (uniform) sampling, solving and counting.
Author Affiliations	Zeyen, Olivier: University of Luxembourg, SnT, Luxembourg; Cordy, Maxime: University of Luxembourg, SnT, Luxembourg; Perrouin, Gilles: PReCISE/NaDI, University of Namur, Belgium; Acher, Mathieu: Univ Rennes, Inria, CNRS, IRISA, France
CODEN	IEEPAD
Discipline	Computer Science
EISBN	9798400705892
EISSN	2575-5099
Genre	orig-research
IsOpenAccess	true
URI	https://ieeexplore.ieee.org/document/10555673