Finding Maximal Similar Paths Between XML Documents Using Sequential Patterns
Published in | Advances in Information Systems, pp. 96-106 |
---|---|
Main Authors | Lee, Jung-Won; Park, Seung-Soo |
Format | Book Chapter Conference Proceeding |
Language | English |
Published | Berlin, Heidelberg: Springer Berlin Heidelberg, 01.01.2004; Springer |
Series | Lecture Notes in Computer Science |
Subjects | |
ISBN | 9783540234784 3540234780 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-540-30198-1_11 |
Abstract | Techniques for storing XML documents, optimizing queries, and indexing XML have been active subjects of research. Most of these techniques focus on XML documents that share the same structure (i.e., the same DTD or XML Schema). However, when XML documents from the Web or from an EDMS (Electronic Document Management System) must be merged or classified, it is very important to find the structure common to multiple documents in order to process them. In this paper, we propose a new methodology for extracting common structures from XML documents and finding maximal similar paths between structures using sequential pattern mining algorithms. Correct determination of common structures between XML documents provides an important basis for a variety of applications of XML document mining and processing. Experiments with XML documents show that our adapted sequential pattern mining algorithms can exactly find common structures and the maximal similar paths between them. |
---|---|
Author | Park, Seung-Soo; Lee, Jung-Won |
Author_xml | – sequence: 1 givenname: Jung-Won surname: Lee fullname: Lee, Jung-Won email: jungwony@ewha.ac.kr organization: Dept. of Computer Science and Engineering, Ewha Womans University, Seoul, Korea – sequence: 2 givenname: Seung-Soo surname: Park fullname: Park, Seung-Soo email: sspark@ewha.ac.kr organization: Dept. of Computer Science and Engineering, Ewha Womans University, Seoul, Korea |
BackLink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16368325$$DView record in Pascal Francis |
ContentType | Book Chapter Conference Proceeding |
Copyright | Springer-Verlag Berlin Heidelberg 2004; 2005 INIST-CNRS |
Copyright_xml | – notice: Springer-Verlag Berlin Heidelberg 2004 – notice: 2005 INIST-CNRS |
DBID | IQODW |
DatabaseName | Pascal-Francis |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering; Computer Science; Applied Sciences |
EISBN | 3540301984 9783540301981 |
EISSN | 1611-3349 |
Editor | Yakhno, Tatyana |
Editor_xml | – sequence: 1 givenname: Tatyana surname: Yakhno fullname: Yakhno, Tatyana email: yakhno@cs.deu.edu.tr |
EndPage | 106 |
ExternalDocumentID | 16368325 |
GroupedDBID | -DT -GH -~X 1SB 29L 2HA 2HV 5QI 875 AASHB ABMNI ACGFS ADCXD AEFIE ALMA_UNASSIGNED_HOLDINGS EJD F5P FEDTE HVGLF LAS LDH P2P RIG RNI RSU SVGTG VI1 ~02 IQODW |
ID | FETCH-LOGICAL-p228t-6f8ba71ff4d5e60ed399eb1a745069be1548e79c73f43a7dea4bcc143a5248453 |
ISSN | 0302-9743 |
IngestDate | Mon Sep 16 09:38:54 EDT 2024; Tue Jul 29 19:43:11 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Database query; Information system; XML language; Document type definition; Data mining; Optimization; Document processing; Electronic document management; World wide web; Document structure; Internet; Database management system; Indexing |
License | CC BY 4.0 |
LinkModel | OpenURL |
MeetingName | ADVIS 2004 : advances in information systems (Izmir, 20-22 October 2004) |
MergedId | FETCHMERGED-LOGICAL-p228t-6f8ba71ff4d5e60ed399eb1a745069be1548e79c73f43a7dea4bcc143a5248453 |
PageCount | 11 |
ParticipantIDs | pascalfrancis_primary_16368325 springer_books_10_1007_978_3_540_30198_1_11 |
PublicationCentury | 2000 |
PublicationDate | 2004-01-01 |
PublicationDateYYYYMMDD | 2004-01-01 |
PublicationDate_xml | – month: 01 year: 2004 text: 2004-01-01 day: 01 |
PublicationDecade | 2000 |
PublicationPlace | Berlin, Heidelberg |
PublicationPlace_xml | – name: Berlin, Heidelberg – name: Berlin |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSubtitle | Third International Conference, ADVIS 2004, Izmir, Turkey, October 20-22, 2004. Proceedings |
PublicationTitle | Advances in Information Systems |
PublicationYear | 2004 |
Publisher | Springer Berlin Heidelberg; Springer |
Publisher_xml | – name: Springer Berlin Heidelberg – name: Springer |
RelatedPersons | Kleinberg, Jon M.; Mattern, Friedemann; Nierstrasz, Oscar; Tygar, Doug; Steffen, Bernhard; Kittler, Josef; Vardi, Moshe Y.; Weikum, Gerhard; Sudan, Madhu; Naor, Moni; Mitchell, John C.; Terzopoulos, Demetri; Pandu Rangan, C.; Kanade, Takeo; Hutchison, David |
RelatedPersons_xml | – sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David organization: Lancaster University, UK – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo organization: Carnegie Mellon University, Pittsburgh, USA – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef organization: University of Surrey, Guildford, UK – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. organization: Cornell University, Ithaca, USA – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann organization: ETH Zurich, Switzerland – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. organization: Stanford University, CA, USA – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni organization: Weizmann Institute of Science, Rehovot, Israel – sequence: 8 givenname: Oscar surname: Nierstrasz fullname: Nierstrasz, Oscar organization: University of Bern, Switzerland – sequence: 9 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. organization: Indian Institute of Technology, Madras, India – sequence: 10 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard organization: University of Dortmund, Germany – sequence: 11 givenname: Madhu surname: Sudan fullname: Sudan, Madhu organization: Massachusetts Institute of Technology, MA, USA – sequence: 12 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri organization: New York University, NY, USA – sequence: 13 givenname: Doug surname: Tygar fullname: Tygar, Doug organization: University of California, Berkeley, USA – sequence: 14 givenname: Moshe Y. surname: Vardi fullname: Vardi, Moshe Y. organization: Rice University, Houston, USA – sequence: 15 givenname: Gerhard surname: Weikum fullname: Weikum, Gerhard organization: Max-Planck Institute of Computer Science, Saarbruecken, Germany |
SSID | ssj0000098814 ssj0002792 |
Score | 1.734369 |
SourceID | pascalfrancis springer |
SourceType | Index Database; Publisher |
StartPage | 96 |
SubjectTerms | Applied sciences Computer science; control theory; systems Exact sciences and technology Information systems. Data bases Memory organisation. Data processing Software |
Title | Finding Maximal Similar Paths Between XML Documents Using Sequential Patterns |
URI | http://link.springer.com/10.1007/978-3-540-30198-1_11 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Library Specific Holdings |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Advances+in+Information+Systems&rft.au=Lee%2C+Jung-Won&rft.au=Park%2C+Seung-Soo&rft.atitle=Finding+Maximal+Similar+Paths+Between+XML+Documents+Using+Sequential+Patterns&rft.series=Lecture+Notes+in+Computer+Science&rft.date=2004-01-01&rft.pub=Springer+Berlin+Heidelberg&rft.isbn=9783540234784&rft.issn=0302-9743&rft.eissn=1611-3349&rft.spage=96&rft.epage=106&rft_id=info:doi/10.1007%2F978-3-540-30198-1_11 |
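The abstract frames common-structure discovery as sequential pattern mining over XML paths. This record does not include the paper's adapted algorithms, so what follows is only a minimal sketch under our own assumptions. Each document is reduced to its root-to-leaf element-name paths. A pattern counts as supported by a document if its labels appear in order (gaps allowed) along at least one of those paths. Frequent patterns are grown level by level in the style of GSP/Apriori, and only the maximal ones are kept. The function names (`root_to_leaf_paths`, `is_subsequence`, `support`, `mine_maximal_common_paths`), the `min_support` parameter, and `SAMPLE_DOCS` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return each root-to-leaf element-name path of one XML document as a tuple.

    Namespaced tags keep the "{uri}name" form that ElementTree produces.
    """
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, prefix):
        prefix = prefix + (elem.tag,)
        children = list(elem)
        if not children:  # a leaf element ends one path
            paths.append(prefix)
        for child in children:
            walk(child, prefix)

    walk(root, ())
    return paths


def is_subsequence(pattern, sequence):
    """True if the labels of `pattern` occur in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `label in it` advances the shared iterator, so the order is enforced.
    return all(label in it for label in pattern)


def support(pattern, docs):
    """Count the documents with at least one path that contains `pattern`."""
    return sum(any(is_subsequence(pattern, p) for p in paths) for paths in docs)


def mine_maximal_common_paths(xml_texts, min_support=1.0):
    """Hypothetical level-wise (GSP/Apriori-style) miner of maximal path patterns.

    `min_support` is the fraction of documents that must contain a pattern.
    """
    if not 0 < min_support <= 1:
        raise ValueError("min_support must be in (0, 1]")
    docs = [root_to_leaf_paths(t) for t in xml_texts]
    threshold = min_support * len(docs)
    labels = sorted({tag for paths in docs for p in paths for tag in p})
    frequent_labels = [lab for lab in labels if support((lab,), docs) >= threshold]

    level = [(lab,) for lab in frequent_labels]
    frequent = list(level)
    while level:
        # Every frequent (k+1)-pattern has a frequent k-prefix and a frequent last
        # label, so extending frequent k-patterns by frequent labels is complete.
        candidates = [seq + (lab,) for seq in level for lab in frequent_labels]
        level = [c for c in candidates if support(c, docs) >= threshold]
        frequent.extend(level)

    # Keep only maximal patterns: not a proper subsequence of another frequent one.
    return sorted(
        p for p in frequent
        if not any(len(q) > len(p) and is_subsequence(p, q) for q in frequent)
    )


# Hypothetical sample documents with differing nesting.
SAMPLE_DOCS = [
    "<book><title/><author><name/></author></book>",
    "<book><author><name/><email/></author><title/></book>",
    "<book><info><author><name/></author></info></book>",
]

if __name__ == "__main__":
    print(mine_maximal_common_paths(SAMPLE_DOCS))
    print(mine_maximal_common_paths(SAMPLE_DOCS, min_support=0.6))
```

With the sample documents, the default call returns `[('book', 'author', 'name')]`, because that path (skipping `info` in the third document) is the longest one all three share; lowering `min_support` to 0.6 also admits `('book', 'title')`. Allowing gaps is what lets differently nested documents match; a stricter variant could require contiguous label runs instead.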