Synthetic data generation by probabilistic PCA (주성분 분석을 활용한 재현자료 생성)

Bibliographic Details
Published in Ŭngyong tʻonggye yŏnʼgu Vol. 36; no. 4; pp. 279-294
Main Author 박민정(Min-Jeong Park)
Format Journal Article
Language Korean
Published The Korean Statistical Society (한국통계학회) 2023
ISSN 1225-066X
EISSN 2383-5818
DOI 10.5351/KJAS.2023.36.4.279


Abstract Sequential regression multiple imputation (SRMI) is the most widely known method for generating synthetic data sets, and the R package synthpop is widely used to implement it. This paper proposes generating synthetic data with probabilistic principal component analysis (PPCA) instead. A simulation study on two simple data sets compares the SRMI and PPCA approaches. The results show that pairwise correlation coefficients in synthetic data sets generated by PPCA can be closer to those of the original data than in data sets generated by SRMI. Furthermore, the PPCA approach could be extended to generate synthetic data for data types where PPCA applications are well established, such as time series data; this is left for future research.
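As a rough illustration of the idea summarized in the abstract, the following Python sketch fits a PPCA model in closed form and draws synthetic records from it, then compares pairwise correlation coefficients with the original data. It is not the paper's implementation, which this record does not show: the function names `fit_ppca` and `sample_ppca`, the toy data, and the choice of `q = 2` latent components are all assumptions made for the example, and the fit uses the standard maximum-likelihood PPCA solution (Tipping and Bishop) for continuous variables only.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit; requires q < number of columns."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # ML sample covariance (d x d)
    eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                   # noise variance: mean of discarded eigenvalues
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale                    # loading matrix (d x q)
    return mu, W, sigma2


def sample_ppca(mu, W, sigma2, n, rng):
    """Draw n synthetic rows from x = mu + W z + eps with Gaussian z and eps."""
    d, q = W.shape
    z = rng.standard_normal((n, q))                       # latent scores
    eps = rng.standard_normal((n, d)) * np.sqrt(sigma2)   # isotropic noise
    return mu + z @ W.T + eps


# Hypothetical "original" data: three correlated continuous variables.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 0.0],
              [0.8, 0.6, 0.0],
              [0.5, 0.3, 0.4]])
original = rng.standard_normal((2000, 3)) @ A.T + np.array([10.0, 20.0, 30.0])

mu, W, sigma2 = fit_ppca(original, q=2)
synthetic = sample_ppca(mu, W, sigma2, n=2000, rng=rng)

# Largest absolute gap between original and synthetic pairwise correlations.
max_gap = np.abs(
    np.corrcoef(original, rowvar=False) - np.corrcoef(synthetic, rowvar=False)
).max()
print("largest pairwise correlation gap:", max_gap)
```

A smaller gap indicates that the synthetic data preserve the pairwise correlation structure more closely, which is the comparison criterion the abstract describes. Categorical variables and disclosure-risk checks are outside the scope of this sketch.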
KCI Citation Count: 0
DEWEY 519.5
DatabaseName DBPIA - 디비피아
Nurimedia DBPIA Journals
KoreaScience
Korean Citation Index
Discipline Statistics
Applied Sciences
Mathematics
DocumentTitleAlternate Synthetic data generation by probabilistic PCA
Keywords probabilistic principal component analysis (확률적 주성분 분석)
synthetic data (재현자료)
PublicationTitleAlternate The Korean journal of applied statistics
SubjectTerms Statistics (통계학)
URI https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE11511187
http://click.ndsl.kr/servlet/LinkingDetailView?cn=JAKO202325472047201&dbt=JAKO&org_code=O481&site_code=SS1481&service_code=01
https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART002989821