Workflow for evaluating quality of artificial intelligence (AI) services using held-out data

Bibliographic Details
Main Authors: Li, Yunyao; Krishnamurthy, Rajasekar; Wang, Hao; Sen, Prithviraj; Vaithyanathan, Shivakumar; Han, Sang Don
Format: Patent
Language: English
Published: 30.08.2022
Summary: One embodiment provides a method for evaluating an artificial intelligence (AI) service. The method includes partitioning, by a processor, data into in-domain data and out-of-domain data. The processor defines held-out data from the in-domain data and the out-of-domain data for evaluation by domain and sub-domain, based on building a taxonomy of domains and sub-domains for the AI service. The processor further determines the distribution underlying performance metrics for the held-out data using statistical processing. The processor also determines performance guarantees for multiple settings, conditioned on multiple characteristics of an application scenario, for the held-out data of the taxonomy based on the underlying performance metrics. The processor further provides confidence intervals based on the performance guarantees.
Bibliography: Application Number: US201816123822
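
The evaluation flow described in the summary can be illustrated with a minimal sketch. The record does not say which statistical procedure the patent uses, so the percentile bootstrap below is an assumption, as are the function names, the record layout, and the sample data, all of which are hypothetical. The sketch groups held-out outcomes by (domain, sub-domain) and reports mean accuracy with a confidence interval for each group.

```python
import random
from collections import defaultdict


def bootstrap_ci(outcomes, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `outcomes`.

    Assumption: the bootstrap stands in for the unspecified
    "statistical processing" mentioned in the summary.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and collect the sorted resample means.
    means = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


def evaluate_by_taxonomy(records, **ci_kwargs):
    """Group (domain, sub_domain, correct) records and score each group.

    `correct` is 1 if the AI service handled the held-out item correctly,
    else 0. Returns {(domain, sub_domain): (mean, ci_low, ci_high)}.
    """
    groups = defaultdict(list)
    for domain, sub_domain, correct in records:
        groups[(domain, sub_domain)].append(correct)
    report = {}
    for key, outcomes in groups.items():
        mean = sum(outcomes) / len(outcomes)
        report[key] = (mean, *bootstrap_ci(outcomes, **ci_kwargs))
    return report


# Hypothetical held-out results, for illustration only.
held_out = (
    [("in-domain", "contracts", 1)] * 45
    + [("in-domain", "contracts", 0)] * 5
    + [("out-of-domain", "news", 1)] * 30
    + [("out-of-domain", "news", 0)] * 20
)

report = evaluate_by_taxonomy(held_out)
for (domain, sub_domain), (mean, lo, hi) in report.items():
    print(f"{domain}/{sub_domain}: accuracy={mean:.2f}, 95% CI=[{lo:.2f}, {hi:.2f}]")
```

Reporting an interval per taxonomy node, rather than a single pooled score, is what makes a gap between in-domain and out-of-domain performance visible. A real evaluation would need far more held-out items per sub-domain than this toy data set.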