Approximating the Schema of a Set of Documents by Means of Resemblance
Published in | Journal on data semantics Vol. 7; no. 2; pp. 87 - 105 |
---|---|
Main Authors | Abelló, Alberto; de Palol, Xavier; Hacid, Mohand-Saïd |
Format | Journal Article Publication |
Language | English |
Published | Berlin/Heidelberg: Springer Berlin Heidelberg; Springer Nature B.V; Springer, 01.06.2018 |
ISSN | 1861-2032 (print); 1861-2040 (electronic) |
DOI | 10.1007/s13740-018-0088-0 |
Abstract | The WWW contains a huge number of documents. Some of them share the same subject but are generated by different people or even by different organizations. A semi-structured model allows documents that do not have exactly the same structure to be shared. However, it does not facilitate the understanding of such heterogeneous documents. In this paper, we offer a characterization and an algorithm to obtain a representative (in terms of a resemblance function) of a set of heterogeneous semi-structured documents. We approximate the representative so that the resemblance function is maximized. Then, the algorithm is generalized to deal with repetitions and different classes of documents. Although an exact representative could always be found using an unlimited number of optional elements, it would cause an overfitting problem. The size of an exact representative for a set of heterogeneous documents may even make it useless. Our experiments show that, for users, it is easier and faster to deal with smaller representatives, which even compensates for the loss in the approximation. |
---|---|
Author | Abelló, Alberto; de Palol, Xavier; Hacid, Mohand-Saïd |
Author_xml | 1. Abelló, Alberto (ORCID: 0000-0002-3223-2186; email: aabello@essi.upc.edu), Dept. de Llenguatges i Sistemes Informàtics, U. Politècnica de Catalunya; 2. de Palol, Xavier, Age Fotostock; 3. Hacid, Mohand-Saïd, LIRIS - UFR d’Informatique, U. Claude Bernard Lyon 1 |
BackLink | https://hal.science/hal-01971563 (View record in HAL) |
Contributor | Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació; Universitat Politècnica de Catalunya. inSSIDE - integrated Software, Service, Information and Data Engineering |
Copyright | Springer-Verlag GmbH Germany, part of Springer Nature 2018; Copyright Springer Science & Business Media 2018; info:eu-repo/semantics/openAccess; Distributed under a Creative Commons Attribution 4.0 International License |
Discipline | Engineering; Computer Science |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Design; Document; XML |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
OpenAccessLink | https://recercat.cat/handle/2072/330669 |
PublicationSubtitle | Concepts and Ideas for Building Knowledgeable Systems |
SubjectTerms | Algorithms; Artificial Intelligence; Automatic data collection systems; Classificació automàtica (automatic classification); Computer Science; Data mining; Database Management; Design; Document; Information Storage and Retrieval; Information Systems Applications (incl. Internet); Informàtica (computer science); IT in Business; Mineria de dades (data mining); Original Article; Sistemes d'informació (information systems); XML; Àrees temàtiques de la UPC (UPC subject areas) |
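
The abstract describes approximating a representative of a set of heterogeneous documents so that a resemblance function is maximized. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes each document is reduced to a set of element paths, it uses the Jaccard index as a stand-in resemblance function, and it builds the representative greedily. The names `resemblance`, `mean_resemblance` and `approximate_representative` are hypothetical, and repetitions, optional elements and document classes are not modeled.

```python
from typing import List, Set


def resemblance(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two path sets (assumed stand-in for the paper's function)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_resemblance(candidate: Set[str], docs: List[Set[str]]) -> float:
    """Average resemblance between a candidate representative and all documents."""
    return sum(resemblance(candidate, d) for d in docs) / len(docs)


def approximate_representative(docs: List[Set[str]]) -> Set[str]:
    """Greedy approximation: keep adding the path that most improves the mean
    resemblance, and stop as soon as no path improves it."""
    if not docs:
        raise ValueError("at least one document is required")
    rep: Set[str] = set()
    best = mean_resemblance(rep, docs)
    remaining = set().union(*docs)
    while remaining:
        scores = {p: mean_resemblance(rep | {p}, docs) for p in remaining}
        # Iterate in sorted order so that ties are broken deterministically.
        path = max(sorted(scores), key=scores.get)
        if scores[path] <= best:
            break
        rep.add(path)
        remaining.remove(path)
        best = scores[path]
    return rep


if __name__ == "__main__":
    # Hypothetical documents, each described by the element paths it contains.
    documents = [
        {"title", "author", "year"},
        {"title", "author"},
        {"title", "author", "isbn"},
        {"title", "price"},
    ]
    print(sorted(approximate_representative(documents)))
```

In this toy input the greedy result is smaller than the union of all paths, yet it has a higher mean resemblance to the documents; this only illustrates the trade-off in size and says nothing about the paper's experimental findings.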