WEB HISTORY ARCHIVE SYSTEM AND METHOD FOR WEB PAGES MANAGEMENT

Bibliographic Details
Main Authors: JANG, CHANG BOK; CHO, SUNG HOON; CHOI, EUI IN; LEE, MOO HUN
Format: Patent
Language: English
Published: 03.07.2008
Summary: A web archive system for managing web pages, and a method thereof, are provided. The system reduces crawling overhead by collecting web pages through Stanford WebBase, and maximizes the efficiency of storage space and search by using RCS (Revision Control System). A storage manager (30) stores, updates, and manages the web pages collected by the WebBase crawler. A storage (40) physically stores the web pages stored and managed by the storage manager. A VCS (Version Control System) module receives the web page deleted by an update of the storage manager, extracts the catalog information of the inserted web page, and arranges the web pages in timestamp order. A plurality of VAS (Version Assignment System) modules (120) compare the web page received from the VCS module with the NIT (Node Information Table) (131) of each node to calculate a storage position, determine whether the page has been updated, and assign a version. An RCS module (121) efficiently compresses and stores history pages within the VAS module. A history storage (130) physically stores the web history pages whose versions and changed values have been calculated by the VCS and VAS modules. The NIT holds the history information of the web pages stored by the VAS module, which is the information needed for update detection and version assignment.
Bibliography: Application Number: KR20060136316
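
The version-assignment flow in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the names NitEntry, HistoryArchive, and archive_in_timestamp_order are hypothetical, the NIT is modeled as a plain in-memory dictionary, the storage-position calculation is omitted, and line diffs from Python's difflib stand in for the RCS delta format.

```python
import difflib
import hashlib
from dataclasses import dataclass, field


@dataclass
class NitEntry:
    """Hypothetical NIT row: the data needed to detect an update and pick a version."""
    version: int
    digest: str
    timestamp: int
    text: str  # latest archived text, kept so the next delta can be computed


@dataclass
class HistoryArchive:
    """Assumed stand-in for one VAS node with its NIT and history storage."""
    nit: dict = field(default_factory=dict)      # url -> NitEntry
    history: dict = field(default_factory=dict)  # url -> [(version, timestamp, delta)]

    def archive(self, url, text, timestamp):
        """Assign a version to a page; return it, or None if the page is unchanged."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        entry = self.nit.get(url)
        if entry is not None and entry.digest == digest:
            return None  # same content as the latest version, so no new version
        previous = entry.text if entry else ""
        version = entry.version + 1 if entry else 1
        # Store only the changed lines relative to the previous version
        # (a forward diff here; this is an illustration, not the RCS format).
        delta = list(difflib.unified_diff(
            previous.splitlines(), text.splitlines(), lineterm="", n=0))
        self.history.setdefault(url, []).append((version, timestamp, delta))
        self.nit[url] = NitEntry(version, digest, timestamp, text)
        return version


def archive_in_timestamp_order(archive, pages):
    """Feed (url, text, timestamp) tuples oldest first, mirroring the VCS ordering step."""
    ordered = sorted(pages, key=lambda page: page[2])
    return [archive.archive(url, text, ts) for url, text, ts in ordered]
```

Passing the same URL three times, with the second and third snapshots identical, would yield two stored versions and one skipped page, since an unchanged page matches the digest recorded in the NIT.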