Conflicts to Harmony: A Framework for Resolving Conflicts in Heterogeneous Data by Truth Discovery
In many applications, one can obtain descriptions about the same objects or events from a variety of sources. As a result, this will inevitably lead to data or information conflicts. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is...
Published in | IEEE transactions on knowledge and data engineering Vol. 28; no. 8; pp. 1986 - 1999 |
---|---|
Main Authors | Yaliang Li, Qi Li, Jing Gao, Lu Su, Bo Zhao, Wei Fan, Jiawei Han |
Format | Journal Article |
Language | English |
Published | New York, IEEE, 01.08.2016, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Abstract | In many applications, one can obtain descriptions about the same objects or events from a variety of sources. As a result, this will inevitably lead to data or information conflicts. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is intuitive to trust reliable sources more when deriving the truths, but it is usually unknown which one is more reliable a priori. Moreover, each source possesses a variety of properties with different data types. An accurate estimation of source reliability has to be made by modeling multiple properties in a unified model. Existing conflict resolution work either does not conduct source reliability estimation, or models multiple properties separately. In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types. We model the problem using an optimization framework where truths and source reliability are defined as two sets of unknown variables. The objective is to minimize the overall weighted deviation between the truths and the multi-source observations where each source is weighted by its reliability. Different loss functions can be incorporated into this framework to recognize the characteristics of various data types, and efficient computation approaches are developed. The proposed framework is further adapted to deal with streaming data in an incremental fashion and large-scale data in MapReduce model. Experiments on real-world weather, stock, and flight data as well as simulated multi-source data demonstrate the advantage of jointly modeling different data types in the proposed framework. |
---|---|
Author | Qi Li, Wei Fan, Jing Gao, Jiawei Han, Lu Su, Bo Zhao, Yaliang Li |
Author_xml | 1. Li, Yaliang (ORCID: 0000-0002-4204-6096); 2. Li, Qi; 3. Gao, Jing; 4. Su, Lu; 5. Zhao, Bo; 6. Fan, Wei; 7. Han, Jiawei |
CODEN | ITKEEH |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2016 |
DOI | 10.1109/TKDE.2016.2559481 |
DatabaseName | IEEE Xplore (IEEE); IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts – Academic; Computer and Information Systems Abstracts Professional; ANTE: Abstracts in New Technology & Engineering; Engineering Research Database |
Discipline | Engineering; Computer Science |
EISSN | 1558-2191 |
EndPage | 1999 |
ExternalDocumentID | 4111649271 10_1109_TKDE_2016_2559481 7460925 |
Genre | orig-research |
GrantInformation_xml | US National Science Foundation (grant IDs: IIS-1319973; CNS-1566374); US Army Research Office (grant ID: W911NF-13-1-0193); US Army Research Laboratory (grant ID: W911NF-09-2-0053 (NS-CTA)) |
ISSN | 1041-4347 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 8 |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
ORCID | 0000-0002-4204-6096 |
PageCount | 14 |
PublicationDateYYYYMMDD | 2016-08-01 |
PublicationPlace | New York |
PublicationTitle | IEEE transactions on knowledge and data engineering |
PublicationTitleAbbrev | TKDE |
PublicationYear | 2016 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1986 |
SubjectTerms | Climatology; Computational efficiency; Computational modeling; Conflict resolution; Data fusion; Data models; Descriptions; Deviation; Estimation; heterogeneous data; Mathematical analysis; Mathematical models; Meteorology; Optimization; Product development; Raw materials; Reliability; truth discovery; Weather |
Title | Conflicts to Harmony: A Framework for Resolving Conflicts in Heterogeneous Data by Truth Discovery |
URI | https://ieeexplore.ieee.org/document/7460925 https://www.proquest.com/docview/1802532675 https://www.proquest.com/docview/1825561340 |
Volume | 28 |
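
The abstract describes an optimization framework in which truths and source reliabilities are two sets of unknowns, and the objective is a reliability-weighted deviation between truths and observations, with a loss function chosen per data type. The following Python sketch illustrates that idea under several assumptions that are not stated in this record: the unknowns are updated alternately, continuous entries use a squared loss scaled by the variance of the claims, categorical entries use a 0-1 loss, and each source weight is the negative log of its share of the total loss. The function name `resolve_conflicts` and the example data are hypothetical.

```python
import math
from collections import defaultdict


def resolve_conflicts(continuous, categorical, iterations=10, eps=1e-9):
    """Jointly estimate truths and source weights (illustrative sketch only).

    continuous[k][i]  : float claimed by source k for continuous entry i
    categorical[k][j] : label claimed by source k for categorical entry j
    """
    num_sources = len(continuous)
    num_cont = len(continuous[0])
    num_cat = len(categorical[0])
    weights = [1.0] * num_sources  # start by trusting every source equally

    # Per-entry variance of the claims puts continuous entries on a comparable
    # scale (an assumed normalization, not taken from the abstract).
    scales = []
    for i in range(num_cont):
        claims = [continuous[k][i] for k in range(num_sources)]
        mean = sum(claims) / num_sources
        var = sum((c - mean) ** 2 for c in claims) / num_sources
        scales.append(var if var > 0 else 1.0)

    truth_cont, truth_cat = [], []
    for _ in range(iterations):
        # Fall back to uniform weights if they degenerate (e.g. one source).
        if sum(weights) <= 0:
            weights = [1.0] * num_sources
        total_w = sum(weights)

        # Step 1: weights fixed. Squared loss -> weighted mean.
        truth_cont = [
            sum(weights[k] * continuous[k][i] for k in range(num_sources)) / total_w
            for i in range(num_cont)
        ]
        # 0-1 loss -> weighted vote.
        truth_cat = []
        for j in range(num_cat):
            votes = defaultdict(float)
            for k in range(num_sources):
                votes[categorical[k][j]] += weights[k]
            truth_cat.append(max(votes, key=votes.get))

        # Step 2: truths fixed. Sources with a larger loss get a smaller weight.
        losses = []
        for k in range(num_sources):
            loss = sum(
                (continuous[k][i] - truth_cont[i]) ** 2 / scales[i]
                for i in range(num_cont)
            )
            loss += sum(
                1.0 for j in range(num_cat) if categorical[k][j] != truth_cat[j]
            )
            losses.append(loss + eps)  # eps avoids log(0) for a perfect source
        total_loss = sum(losses)
        weights = [-math.log(loss / total_loss) for loss in losses]

    return truth_cont, truth_cat, weights


# Hypothetical weather-style data: sources 0 and 1 roughly agree, source 2 is noisy.
example_continuous = [
    [70.0, 65.0, 80.0],
    [71.0, 64.0, 80.0],
    [50.0, 90.0, 60.0],
]
example_categorical = [
    ["sunny", "rain", "sunny"],
    ["sunny", "rain", "sunny"],
    ["rain", "sunny", "sunny"],
]

if __name__ == "__main__":
    truths_c, truths_l, source_weights = resolve_conflicts(
        example_continuous, example_categorical
    )
    print("continuous truths:", truths_c)
    print("categorical truths:", truths_l)
    print("source weights:", source_weights)
```

Because both data types contribute to a single loss per source, a source that is wrong on the categorical entries is also trusted less on the continuous ones, which is the kind of joint modeling the abstract argues for. The streaming and MapReduce adaptations mentioned in the abstract are not covered by this sketch.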