Simulating the Webgraph: a comparative analysis of models

Bibliographic Details
Published in: Computing in science & engineering Vol. 6; no. 6; pp. 84-89
Main Authors: Donato, D., Laura, L., Leonardi, S., Millozzi, S.
Format: Journal Article
Language: English
Published: New York IEEE 01.11.2004
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: The Webgraph is the directed graph produced by the World Wide Web's hyperlinked structure: its nodes are static HTML pages, and its edges are the hyperlinks between two pages. Since the early '90s, the Web has grown exponentially, a trend we expect will continue. Today's Webgraph has several billion edges, but in spite of its size, it exhibits a well-defined structure characterized by several properties. In the past few years, several research papers have reported these properties and proposed various random graph models. We simulated several of these models and compared them against a 300-million-node sample of the Webgraph provided by the Stanford WebBase project (http://www-diglib.stanford.edu/~testbed/doc2/WebBase/). All the software we developed to perform this comparison is free to download from the European Research Project COSIN Web site (www.cosin.org). Over the past six years, computer scientists, economists, mathematicians, and physicists have extensively studied the Webgraph's properties. All this research was motivated primarily by the need to efficiently mine the huge quantities of information on the Web, information that is often distributed among several pages. The first major discovery concerned in-degree, an intuitive and simplistic measure of page importance. (Each node in a directed graph is characterized by its in-degree and out-degree: the number of incoming and outgoing links, respectively.)
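The in-degree and out-degree defined in the summary can be illustrated with a minimal Python sketch. It is not taken from the authors' COSIN software; the function name `degree_counts` and the toy page names are hypothetical, chosen only to show how the two counts follow from a list of directed links.

```python
from collections import Counter


def degree_counts(edges):
    """Return (in_degree, out_degree) Counters for (source, target) links."""
    in_degree = Counter()
    out_degree = Counter()
    for source, target in edges:
        out_degree[source] += 1  # one more outgoing link from the source page
        in_degree[target] += 1   # one more incoming link to the target page
    return in_degree, out_degree


# Hypothetical toy Webgraph: each pair is a hyperlink from one page to another.
toy_links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
]

in_degree, out_degree = degree_counts(toy_links)
print(in_degree["c.html"], out_degree["a.html"])  # prints "2 2"
```

In this toy graph, `c.html` is linked from two pages, so under the in-degree measure it would rank as the most important page.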
ISSN: 1521-9615, 1558-366X
DOI: 10.1109/MCSE.2004.73