Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

Bibliographic Details
Published in	Future Internet, Vol. 14, no. 7, p. 211
Main Authors	Uhm, Daiho; Jun, Sunghae
Format	Journal Article
Language	English
Published	Basel MDPI AG 01.07.2022
Subjects	Algorithms; Artificial intelligence; Big Data; Binomial distribution; classification and regression trees; count data; Data analysis; Data mining; Electronic documents; Internet of Things; Inventors; Keywords; Machine learning; Methods; Nonparametric statistics; patent analysis; R&D; Research & development; Samples; Silk; Statistical analysis; Statistical inference; Structured data; synthetic sample; zero-inflated data

More Information
Summary:	Due to the expansion of the internet, we encounter various types of big data such as web documents or sensing data. Compared to traditional small data such as experimental samples, big data provide more chances to find hidden and novel patterns with big data analysis using statistics and machine learning algorithms. However, as the use of big data increases, problems also occur. One of them is a zero-inflated problem in structured data preprocessed from big data. Most count values are zeros because a specific word is found in only some documents. In particular, since most of the patent data are in the form of a text document, they are more affected by the zero-inflated problem. To solve this problem, we propose a generation of synthetic samples using statistical inference and tree structure. Using patent document and simulation data, we verify the performance and validity of our proposed method. In this paper, we focus on patent keyword analysis as text big data analysis, and we encounter the zero-inflated problem just like other text data.
ISSN:	1999-5903
DOI:	10.3390/fi14070211