Finding Maximal Similar Paths Between XML Documents Using Sequential Patterns

Bibliographic Details
Published in Advances in Information Systems, pp. 96-106
Main Authors Lee, Jung-Won; Park, Seung-Soo
Format Book Chapter; Conference Proceeding
Language English
Published Berlin, Heidelberg: Springer Berlin Heidelberg, 01.01.2004
Springer
Series Lecture Notes in Computer Science
ISBN 9783540234784
3540234780
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-540-30198-1_11

Abstract Techniques for storing XML documents, optimizing queries, and indexing XML have been active subjects of research. Most of these techniques focus on XML documents that share the same structure (i.e., the same DTD or XML Schema). However, when XML documents from the Web or an EDMS (Electronic Document Management System) must be merged or classified, it is important to find the common structure among multiple documents before processing them. In this paper, we propose a new methodology for extracting common structures from XML documents and finding maximal similar paths between those structures using sequential pattern mining algorithms. Correctly determining the common structures between XML documents provides an important basis for a variety of XML document mining and processing applications. Experiments with XML documents show that our adapted sequential pattern mining algorithms find the common structures and the maximal similar paths between them exactly.
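Illustrative note (not part of the record): the abstract does not spell out the authors' algorithm, so the following Python sketch only illustrates the general idea under simplifying assumptions. Each document is reduced to its root-to-leaf tag paths, and a "maximal similar path" is taken here to mean a contiguous tag sequence that occurs in both documents and is not contained in a longer such sequence. The function names and the two sample documents are hypothetical, and the brute-force enumeration stands in for the adapted sequential pattern mining algorithms mentioned in the abstract.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-leaf tag paths of a document, as tuples."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:          # leaf element: record the full path
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, ())
    return paths


def contiguous_subpaths(path):
    """Yield every contiguous sub-sequence of a tag path."""
    for i in range(len(path)):
        for j in range(i + 1, len(path) + 1):
            yield path[i:j]


def is_contained(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_common_paths(xml_a, xml_b):
    """Common contiguous tag sequences not contained in a longer common one."""
    subs_a = {s for p in tag_paths(xml_a) for s in contiguous_subpaths(p)}
    subs_b = {s for p in tag_paths(xml_b) for s in contiguous_subpaths(p)}
    common = subs_a & subs_b
    return {
        s for s in common
        if not any(s != t and is_contained(s, t) for t in common)
    }


# Hypothetical sample documents with different root elements and layouts.
DOC_A = "<book><title/><author><name/><email/></author></book>"
DOC_B = "<article><author><name/></author><journal><title/></journal></article>"

result = maximal_common_paths(DOC_A, DOC_B)
print(sorted(result))
```

The enumeration is quadratic in path length and compares every pair of candidates, which is acceptable for small examples; a real sequential pattern miner would prune candidates level by level instead.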
Author Park, Seung-Soo
Lee, Jung-Won
Author_xml – sequence: 1
  givenname: Jung-Won
  surname: Lee
  fullname: Lee, Jung-Won
  email: jungwony@ewha.ac.kr
  organization: Dept. of Computer Science and Engineering, Ewha Womans University, Seoul, Korea
– sequence: 2
  givenname: Seung-Soo
  surname: Park
  fullname: Park, Seung-Soo
  email: sspark@ewha.ac.kr
  organization: Dept. of Computer Science and Engineering, Ewha Womans University, Seoul, Korea
BackLink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16368325$$DView record in Pascal Francis
ContentType Book Chapter
Conference Proceeding
Copyright Springer-Verlag Berlin Heidelberg 2004
2005 INIST-CNRS
Copyright_xml – notice: Springer-Verlag Berlin Heidelberg 2004
– notice: 2005 INIST-CNRS
DBID IQODW
DOI 10.1007/978-3-540-30198-1_11
DatabaseName Pascal-Francis
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
Applied Sciences
EISBN 3540301984
9783540301981
EISSN 1611-3349
Editor Yakhno, Tatyana
Editor_xml – sequence: 1
  givenname: Tatyana
  surname: Yakhno
  fullname: Yakhno, Tatyana
  email: yakhno@cs.deu.edu.tr
EndPage 106
ExternalDocumentID 16368325
ISBN 9783540234784
3540234780
ISSN 0302-9743
IngestDate Mon Sep 16 09:38:54 EDT 2024
Tue Jul 29 19:43:11 EDT 2025
IsPeerReviewed true
IsScholarly true
Keywords Database query
Information system
XML language
Document type definition
Data mining
Optimization
Document processing
Electronic document management
World wide web
Document structure
Internet
Database management system
Indexing
Language English
License CC BY 4.0
LinkModel OpenURL
MeetingName ADVIS 2004 : advances in information systems (Izmir, 20-22 October 2004)
PageCount 11
ParticipantIDs pascalfrancis_primary_16368325
springer_books_10_1007_978_3_540_30198_1_11
PublicationCentury 2000
PublicationDate 2004-01-01
PublicationDateYYYYMMDD 2004-01-01
PublicationDate_xml – month: 01
  year: 2004
  text: 2004-01-01
  day: 01
PublicationDecade 2000
PublicationPlace Berlin, Heidelberg
PublicationPlace_xml – name: Berlin, Heidelberg
– name: Berlin
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSubtitle Third International Conference, ADVIS 2004, Izmir, Turkey, October 20-22, 2004. Proceedings
PublicationTitle Advances in Information Systems
PublicationYear 2004
Publisher Springer Berlin Heidelberg
Springer
Publisher_xml – name: Springer Berlin Heidelberg
– name: Springer
RelatedPersons Kleinberg, Jon M.
Mattern, Friedemann
Nierstrasz, Oscar
Tygar, Doug
Steffen, Bernhard
Kittler, Josef
Vardi, Moshe Y.
Weikum, Gerhard
Sudan, Madhu
Naor, Moni
Mitchell, John C.
Terzopoulos, Demetri
Pandu Rangan, C.
Kanade, Takeo
Hutchison, David
RelatedPersons_xml – sequence: 1
  givenname: David
  surname: Hutchison
  fullname: Hutchison, David
  organization: Lancaster University, UK
– sequence: 2
  givenname: Takeo
  surname: Kanade
  fullname: Kanade, Takeo
  organization: Carnegie Mellon University, Pittsburgh, USA
– sequence: 3
  givenname: Josef
  surname: Kittler
  fullname: Kittler, Josef
  organization: University of Surrey, Guildford, UK
– sequence: 4
  givenname: Jon M.
  surname: Kleinberg
  fullname: Kleinberg, Jon M.
  organization: Cornell University, Ithaca, USA
– sequence: 5
  givenname: Friedemann
  surname: Mattern
  fullname: Mattern, Friedemann
  organization: ETH Zurich, Switzerland
– sequence: 6
  givenname: John C.
  surname: Mitchell
  fullname: Mitchell, John C.
  organization: Stanford University, CA, USA
– sequence: 7
  givenname: Moni
  surname: Naor
  fullname: Naor, Moni
  organization: Weizmann Institute of Science, Rehovot, Israel
– sequence: 8
  givenname: Oscar
  surname: Nierstrasz
  fullname: Nierstrasz, Oscar
  organization: University of Bern, Switzerland
– sequence: 9
  givenname: C.
  surname: Pandu Rangan
  fullname: Pandu Rangan, C.
  organization: Indian Institute of Technology, Madras, India
– sequence: 10
  givenname: Bernhard
  surname: Steffen
  fullname: Steffen, Bernhard
  organization: University of Dortmund, Germany
– sequence: 11
  givenname: Madhu
  surname: Sudan
  fullname: Sudan, Madhu
  organization: Massachusetts Institute of Technology, MA, USA
– sequence: 12
  givenname: Demetri
  surname: Terzopoulos
  fullname: Terzopoulos, Demetri
  organization: New York University, NY, USA
– sequence: 13
  givenname: Doug
  surname: Tygar
  fullname: Tygar, Doug
  organization: University of California, Berkeley, USA
– sequence: 14
  givenname: Moshe Y.
  surname: Vardi
  fullname: Vardi, Moshe Y.
  organization: Rice University, Houston, USA
– sequence: 15
  givenname: Gerhard
  surname: Weikum
  fullname: Weikum, Gerhard
  organization: Max-Planck Institute of Computer Science, Saarbruecken, Germany
SSID ssj0000098814
ssj0002792
Score 1.734369
SourceID pascalfrancis
springer
SourceType Index Database
Publisher
StartPage 96
SubjectTerms Applied sciences
Computer science; control theory; systems
Exact sciences and technology
Information systems. Data bases
Memory organisation. Data processing
Software
Title Finding Maximal Similar Paths Between XML Documents Using Sequential Patterns
URI http://link.springer.com/10.1007/978-3-540-30198-1_11