LLMs for Test Input Generation for Semantic Applications

Bibliographic Details
Published in 2024 IEEE/ACM 3rd International Conference on AI Engineering – Software Engineering for AI (CAIN), pp. 160-165
Main Authors Rasool, Zafaryab; Barnett, Scott; Willie, David; Kurniawan, Stefanus; Balugo, Sherwin; Thudumu, Srikanth; Abdelrazek, Mohamed
Format Conference Proceeding
Language English
Published ACM 14.04.2024
DOI 10.1145/3644815.3644948

Abstract Large language models (LLMs) enable state-of-the-art semantic capabilities, such as semantic search of unstructured documents and text generation, to be added to software systems. However, these models are computationally expensive. At scale, the cost of serving thousands of users increases massively, which also affects the user experience. To address this problem, semantic caches are used to check for answers to similar queries (that may have been phrased differently) without hitting the LLM service. Because these semantic cache techniques rely on query embeddings, there is a high chance of errors, which impacts user confidence in the system. Adopting a semantic cache usually requires testing its effectiveness (accurate cache hits and misses), which in turn requires a labelled test set of similar queries and responses that is often unavailable. In this paper, we present VaryGen, an approach for using LLMs for test input generation that produces similar questions from unstructured text documents. Our novel approach uses the reasoning capabilities of LLMs to 1) adapt queries to the domain, 2) synthesise subtle variations to queries, and 3) evaluate the synthesised test dataset. We evaluated our approach in the domain of a student question and answer system by qualitatively analysing 100 generated queries and result pairs, and by conducting an empirical case study with an open source semantic cache. Our results show that query pairs satisfy human expectations of similarity and that our generated data demonstrates failure cases of a semantic cache. We also evaluate our approach on the Qasper dataset. This work is an important first step into test input generation for semantic applications and presents considerations for practitioners when calibrating a semantic cache.
CCS Concepts Software and its engineering → Empirical software validation.
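To make the testing problem in the abstract concrete, the following is a minimal sketch of how a labelled set of similar queries can expose incorrect cache hits and misses. It is not VaryGen and not the open source semantic cache used in the paper. The class ToySemanticCache, the functions embed, cosine and evaluate, the threshold of 0.6, and all queries and answers are hypothetical, and a bag-of-words count vector stands in for a real query embedding.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a query embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """Hypothetical cache: returns the answer of the most similar stored query."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer)

    def put(self, query, answer):
        self.entries.append((embed(query), answer))

    def get(self, query):
        """Return a cached answer, or None when no entry reaches the threshold."""
        q = embed(query)
        best_score, best_answer = 0.0, None
        for emb, answer in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


def evaluate(cache, labelled_queries):
    """Count outcomes for (query, expected_answer_or_None) pairs."""
    counts = {"true_hit": 0, "false_hit": 0, "true_miss": 0, "false_miss": 0}
    for query, expected in labelled_queries:
        got = cache.get(query)
        if got is None:
            counts["true_miss" if expected is None else "false_miss"] += 1
        else:
            counts["true_hit" if got == expected else "false_hit"] += 1
    return counts


# Hypothetical student Q&A data; None means the cache should not answer.
cache = ToySemanticCache(threshold=0.6)
cache.put("When is the assignment due?", "Friday 5pm")
cache.put("How do I apply for an extension?", "Use the online form")

labelled = [
    ("When is the assignment due date?", "Friday 5pm"),          # close paraphrase
    ("What is the deadline for the assignment?", "Friday 5pm"),  # reworded paraphrase
    ("When is the exam?", None),                                  # subtle variation
    ("Where is the library located?", None),                     # unrelated
]

results = evaluate(cache, labelled)
print(results)
```

With this toy data, the reworded paraphrase falls just below the threshold (a wrong miss) while the subtly different exam question clears it (a wrong hit). No single threshold separates the two cases here, which is the kind of calibration trade-off a labelled test set is meant to reveal.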
Author Details (all authors: Deakin University, Applied Artificial Intelligence Institute, Geelong, Australia)
1. Rasool, Zafaryab (zafaryab.rasool@deakin.edu.au)
2. Barnett, Scott (scott.barnett@deakin.edu.au)
3. Willie, David (david.willie@deakin.edu.au)
4. Kurniawan, Stefanus (stefanus.kurniawan@deakin.edu.au)
5. Balugo, Sherwin (s.balugo@deakin.edu.au)
6. Thudumu, Srikanth (srikanth.thudumu@deakin.edu.au)
7. Abdelrazek, Mohamed (mohamed.abdelrazek@deakin.edu.au)
EISBN 9798400705915
Subjects Calibration; Costs; Large Language Model; Query Evaluation; Question Answering; Semantic Cache; Semantic search; Semantics; Software reliability; Software systems; Test Input Generation; User experience
URI https://ieeexplore.ieee.org/document/10556158