Conflicts to Harmony: A Framework for Resolving Conflicts in Heterogeneous Data by Truth Discovery

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 28, No. 8, pp. 1986-1999
Main Authors: Li, Yaliang; Li, Qi; Gao, Jing; Su, Lu; Zhao, Bo; Fan, Wei; Han, Jiawei
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 August 2016
Abstract: In many applications, one can obtain descriptions of the same objects or events from a variety of sources, which inevitably leads to conflicting data or information. One important problem is to identify the true information (i.e., the truths) among the conflicting sources. It is intuitive to trust reliable sources more when deriving the truths, but which sources are more reliable is usually unknown a priori. Moreover, each source describes a variety of properties with different data types, so an accurate estimate of source reliability requires modeling multiple properties in a unified model. Existing conflict resolution work either does not estimate source reliability or models multiple properties separately. In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types. We model the problem with an optimization framework in which the truths and the source reliabilities are two sets of unknown variables. The objective is to minimize the overall weighted deviation between the truths and the multi-source observations, where each source is weighted by its reliability. Different loss functions can be incorporated into this framework to capture the characteristics of various data types, and efficient computation approaches are developed. The framework is further adapted to handle streaming data incrementally and large-scale data in the MapReduce model. Experiments on real-world weather, stock, and flight data, as well as simulated multi-source data, demonstrate the advantage of jointly modeling different data types in the proposed framework.
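The abstract describes the optimization framework only at a high level. As a rough illustration, the following is a minimal Python sketch of the alternating idea (fix the source weights and solve for the truths, then fix the truths and solve for the weights), not the authors' implementation. The specific choices are assumptions made for this sketch: a 0-1 loss for categorical properties (solved by weighted voting), an absolute deviation normalized by each entry's standard deviation for continuous properties (solved by a weighted median), and a weight constraint of sum_k exp(-w_k) = 1, under which minimizing the weighted total loss gives the closed-form update w_k = -log(L_k / sum_j L_j), where L_k is source k's total loss. The function names (`crh_sketch`, `weighted_vote`, `weighted_median`, `entry_loss`) and the toy weather data are hypothetical, and the streaming and MapReduce variants are not covered.

```python
import math
import statistics
from collections import defaultdict


def weighted_vote(pairs):
    """Value with the largest summed weight (minimizes weighted 0-1 loss)."""
    scores = defaultdict(float)
    for value, w in pairs:
        scores[value] += w
    return max(scores, key=scores.get)  # ties go to the first value seen


def weighted_median(pairs):
    """Smallest value whose cumulative weight reaches half the total
    (minimizes weighted absolute deviation)."""
    pairs = sorted(pairs)
    total = sum(w for _, w in pairs)
    if total <= 0:  # degenerate weights: fall back to an unweighted median
        pairs = [(v, 1.0) for v, _ in pairs]
        total = float(len(pairs))
    acc = 0.0
    for value, w in pairs:
        acc += w
        if acc >= total / 2:
            return value
    return pairs[-1][0]


def entry_loss(truth, value, kind, scale):
    """Assumed losses: 0-1 for categorical, normalized absolute deviation otherwise."""
    if kind == "categorical":
        return 0.0 if value == truth else 1.0
    return abs(value - truth) / scale


def crh_sketch(observations, prop_types, iterations=10, eps=1e-10):
    """observations: {source: {(object_id, property): value}};
    prop_types: {property: "categorical" | "continuous"}."""
    sources = list(observations)
    entries = list(dict.fromkeys(e for claims in observations.values() for e in claims))

    # Per-entry scale for continuous losses: std of the claimed values (1.0 if zero).
    scale = {}
    for e in entries:
        if prop_types[e[1]] == "continuous":
            vals = [observations[s][e] for s in sources if e in observations[s]]
            scale[e] = statistics.pstdev(vals) or 1.0

    weights = {s: 1.0 for s in sources}  # start from equal reliability
    truths = {}
    for _ in range(iterations):
        # Truth step: with weights fixed, pick the loss-minimizing value per entry.
        for e in entries:
            pairs = [(observations[s][e], weights[s]) for s in sources if e in observations[s]]
            if prop_types[e[1]] == "categorical":
                truths[e] = weighted_vote(pairs)
            else:
                truths[e] = weighted_median(pairs)
        # Weight step: with truths fixed, w_k = -log(L_k / sum_j L_j).
        loss = {
            s: sum(entry_loss(truths[e], v, prop_types[e[1]], scale.get(e, 1.0))
                   for e, v in observations[s].items())
            for s in sources
        }
        total = sum(loss.values())
        if total == 0:  # every source agrees with the current truths
            break
        # eps keeps a zero-loss source from producing log(0)
        weights = {s: -math.log((loss[s] + eps) / (total + eps)) for s in sources}
    return truths, weights


# Toy example (hypothetical data): s3 disagrees with s1 and s2 on every entry.
observations = {
    "s1": {("NYC", "temp"): 70.0, ("NYC", "cond"): "sunny",
           ("LA", "temp"): 80.0, ("LA", "cond"): "sunny"},
    "s2": {("NYC", "temp"): 71.0, ("NYC", "cond"): "sunny",
           ("LA", "temp"): 79.0, ("LA", "cond"): "sunny"},
    "s3": {("NYC", "temp"): 90.0, ("NYC", "cond"): "rain",
           ("LA", "temp"): 60.0, ("LA", "cond"): "cloudy"},
}
prop_types = {"temp": "continuous", "cond": "categorical"}
truths, weights = crh_sketch(observations, prop_types)
print(truths)
print(weights)
```

Both data types share one set of source weights, which is the point the abstract makes about joint modeling: in the toy data, s3's wrong temperatures and wrong conditions both count against it, so it ends up with the smallest weight, while s2, which matches every estimated truth, gets the largest.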
ORCID (Li, Yaliang): 0000-0002-4204-6096
CODEN: ITKEEH
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2016
DOI: 10.1109/TKDE.2016.2559481
Discipline: Engineering; Computer Science
EISSN: 1558-2191
Genre: Original research
Funding: US National Science Foundation (IIS-1319973; CNS-1566374); US Army Research Office (W911NF-13-1-0193); US Army Research Laboratory (W911NF-09-2-0053, NS-CTA)
ISSN: 1041-4347
Peer Reviewed: Yes
Scholarly: Yes
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
Page Count: 14
Title Abbreviation: TKDE
Subject Terms: Climatology; Computational efficiency; Computational modeling; Conflict resolution; Data fusion; Data models; Descriptions; Deviation; Estimation; Heterogeneous data; Mathematical analysis; Mathematical models; Meteorology; Optimization; Product development; Raw materials; Reliability; Truth discovery; Weather
URI: https://ieeexplore.ieee.org/document/7460925
https://www.proquest.com/docview/1802532675
https://www.proquest.com/docview/1825561340