Approximating the Schema of a Set of Documents by Means of Resemblance

Published in Journal on data semantics, Vol. 7, no. 2, pp. 87-105
Main Authors Abelló, Alberto; de Palol, Xavier; Hacid, Mohand-Saïd
Format Journal Article; Publication
Language English
Published Berlin/Heidelberg: Springer Berlin Heidelberg, 01.06.2018
Springer Nature B.V
Springer
ISSN 1861-2032
EISSN 1861-2040
DOI 10.1007/s13740-018-0088-0

Abstract The WWW contains a huge number of documents. Some of them share the same subject but are generated by different people or even by different organizations. A semi-structured model allows documents that do not have exactly the same structure to be shared. However, it does not facilitate the understanding of such heterogeneous documents. In this paper, we offer a characterization and an algorithm to obtain a representative (in terms of a resemblance function) of a set of heterogeneous semi-structured documents. We approximate the representative so that the resemblance function is maximized. Then, the algorithm is generalized to deal with repetitions and different classes of documents. Although an exact representative could always be found using an unlimited number of optional elements, it would cause an overfitting problem. The size of an exact representative for a set of heterogeneous documents may even make it useless. Our experiments show that, for users, it is easier and faster to deal with smaller representatives, which even compensates for the loss in the approximation.
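The idea of a representative that maximizes a resemblance function can be illustrated with a small sketch. This record does not give the paper's resemblance function or algorithm, so everything below is an assumption made for illustration: documents are modeled as flat sets of element names, Jaccard similarity stands in for the resemblance function, and the hypothetical helper approximate_representative uses a simple greedy heuristic (add elements in decreasing frequency and keep the best-scoring prefix). It is not the authors' method.

```python
from collections import Counter


def jaccard(a, b):
    """Stand-in resemblance function (an assumption, not the paper's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_resemblance(candidate, documents):
    """Average resemblance between a candidate schema and every document."""
    return sum(jaccard(candidate, d) for d in documents) / len(documents)


def approximate_representative(documents):
    """Greedy heuristic: try elements from most to least frequent and
    keep the prefix with the highest mean resemblance."""
    counts = Counter(e for d in documents for e in d)
    # Ties are broken by name so the result is deterministic.
    ranked = sorted(counts, key=lambda e: (-counts[e], e))
    best = frozenset()
    best_score = mean_resemblance(best, documents)
    current = set()
    for element in ranked:
        current.add(element)
        score = mean_resemblance(frozenset(current), documents)
        if score > best_score:
            best, best_score = frozenset(current), score
    return best, best_score


# Three hypothetical documents about the same subject with different structure.
docs = [
    frozenset({"title", "author", "year"}),
    frozenset({"title", "author", "publisher"}),
    frozenset({"title", "author", "year", "isbn"}),
]
rep, score = approximate_representative(docs)
everything = frozenset().union(*docs)
print(sorted(rep), score)
print(mean_resemblance(everything, docs))
```

In this toy setting the schema containing every element ever seen scores lower than the smaller representative, which loosely mirrors the abstract's point that a representative covering all documents exactly can be less useful than a compact approximation.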
Author Abelló, Alberto
de Palol, Xavier
Hacid, Mohand-Saïd
Author_xml – sequence: 1
  givenname: Alberto
  orcidid: 0000-0002-3223-2186
  surname: Abelló
  fullname: Abelló, Alberto
  email: aabello@essi.upc.edu
  organization: Dept. de Llenguatges i Sistemes Informàtics, U. Politècnica de Catalunya
– sequence: 2
  givenname: Xavier
  surname: de Palol
  fullname: de Palol, Xavier
  organization: Age Fotostock
– sequence: 3
  givenname: Mohand-Saïd
  surname: Hacid
  fullname: Hacid, Mohand-Saïd
  organization: LIRIS- UFR d’Informatique, U. Claude Bernard Lyon 1
BackLink https://hal.science/hal-01971563 (View record in HAL)
CitedBy_id crossref_primary_10_2174_0126662558273437231204061106
crossref_primary_10_1007_s10844_018_0536_1
Cites_doi 10.1007/3-540-44533-1_24
10.1023/A:1021560618289
10.1137/0218082
10.1016/j.is.2018.02.007
10.1016/j.knosys.2006.08.006
10.14778/2777598.2777601
10.1016/S0020-0190(02)00345-9
10.1007/s11086-005-0032-6
10.1007/978-3-540-30081-6_8
10.1145/1841909.1841911
10.1016/S0304-3975(00)00294-2
10.1007/978-3-540-45227-0_12
10.1109/ICDEW.2006.166
10.1007/3-540-62222-5_33
10.1007/BF01202268
10.1016/B978-1-4832-1452-8.50145-7
10.1109/TKDE.2004.1264824
10.4018/978-1-59904-228-2.ch003
10.1016/j.is.2004.11.009
10.1007/978-3-642-39200-9_8
10.1016/S0306-4379(03)00031-0
10.1145/276304.276331
ContentType Journal Article
Publication
Contributor Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
Contributor_xml – sequence: 1
  fullname: Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
– sequence: 2
  fullname: Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering
Copyright Springer-Verlag GmbH Germany, part of Springer Nature 2018
Copyright Springer Science & Business Media 2018
info:eu-repo/semantics/openAccess
Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml – notice: Springer-Verlag GmbH Germany, part of Springer Nature 2018
– notice: Copyright Springer Science & Business Media 2018
– notice: info:eu-repo/semantics/openAccess
– notice: Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1007/s13740-018-0088-0
DatabaseName CrossRef
Recercat
Hyper Article en Ligne (HAL)
DatabaseTitle CrossRef
Discipline Engineering
Computer Science
EISSN 1861-2040
EndPage 105
ExternalDocumentID oai_HAL_hal_01971563v1
oai_recercat_cat_2072_330669
10_1007_s13740_018_0088_0
ISSN 1861-2032
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Keywords Design
Document
XML
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0002-3223-2186
OpenAccessLink https://recercat.cat/handle/2072/330669
PQID 2052868119
PQPubID 2044317
PageCount 19
PublicationCentury 2000
PublicationDate 2018-06-01
PublicationDateYYYYMMDD 2018-06-01
PublicationDate_xml – month: 06
  year: 2018
  text: 2018-06-01
  day: 01
PublicationDecade 2010
PublicationPlace Berlin/Heidelberg
PublicationPlace_xml – name: Berlin/Heidelberg
– name: Heidelberg
PublicationSubtitle Concepts and Ideas for Building Knowledgeable Systems
PublicationTitle Journal on data semantics
PublicationTitleAbbrev J Data Semant
PublicationYear 2018
Publisher Springer Berlin Heidelberg
Springer Nature B.V
Springer
Publisher_xml – name: Springer Berlin Heidelberg
– name: Springer Nature B.V
– name: Springer
StartPage 87
SubjectTerms Algorithms
Artificial Intelligence
Automatic data collection systems
Automatic classification (Classificació automàtica)
Computer Science
Data mining
Database Management
Design
Document
Information Storage and Retrieval
Information Systems Applications (incl. Internet)
Informatics (Informàtica)
IT in Business
Data mining (Mineria de dades)
Original Article
Information systems (Sistemes d'informació)
XML
UPC subject areas (Àrees temàtiques de la UPC)
Title Approximating the Schema of a Set of Documents by Means of Resemblance
URI https://link.springer.com/article/10.1007/s13740-018-0088-0
https://www.proquest.com/docview/2052868119
https://recercat.cat/handle/2072/330669
https://hal.science/hal-01971563
Volume 7