Machine learning system for extracting structured records from web pages and other text sources
Format | Patent |
---|---|
Language | English |
Published | 08.06.2006 |
Summary: | A method is described for extracting a structured record (190) from a document (100), where the structured record includes information related to a predetermined subject matter (120) and this information is organized into categories within the structured record. The method comprises the steps of identifying a span of text (130) in the document (100) according to criteria associated with the predetermined subject matter, and processing (150) the span of text to extract at least one text element associated with at least one of the categories of the structured record (190) from the document (100). |
---|---|
Bibliography: | Application Number: US20050291740 |
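
The two steps named in the summary (identify a span of text by subject-matter criteria, then process that span to fill the record's categories) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and does not come from the patent: the job-posting subject matter, the keyword set, the category names, and the function names are hypothetical. The sketch also substitutes hand-written keyword and regular-expression rules for the machine-learned components the title refers to, so it shows only the shape of the pipeline, not the patented technique.

```python
import re

# Hypothetical criteria for an illustrative subject matter (job postings).
# The patent does not specify these; a learned classifier could take their place.
SUBJECT_KEYWORDS = {"hiring", "position", "salary", "apply"}

# Hypothetical categories of the structured record, each with a simple pattern.
CATEGORY_PATTERNS = {
    "title": re.compile(r"(?:position|role):\s*([^.;\n]+)", re.IGNORECASE),
    "salary": re.compile(r"\$\d[\d,]*"),
    "location": re.compile(r"location:\s*([^.;\n]+)", re.IGNORECASE),
}


def identify_span(document, min_hits=2):
    """Step 1: return the first paragraph that meets the subject criteria."""
    for paragraph in document.split("\n\n"):
        words = set(re.findall(r"[a-z]+", paragraph.lower()))
        if len(words & SUBJECT_KEYWORDS) >= min_hits:
            return paragraph
    return None


def extract_record(span):
    """Step 2: pull one text element per category out of the span."""
    record = {}
    for category, pattern in CATEGORY_PATTERNS.items():
        match = pattern.search(span)
        if match:
            # Use the capture group when the pattern defines one.
            text = match.group(1) if pattern.groups else match.group(0)
            record[category] = text.strip()
    return record


def extract_structured_record(document):
    """Run both steps; return None when no span matches the subject."""
    span = identify_span(document)
    return extract_record(span) if span is not None else None


if __name__ == "__main__":
    page = (
        "Welcome to our site. We sell widgets.\n\n"
        "We are hiring! Position: Data Engineer. Location: Boston. "
        "Salary from $90,000. Apply today."
    )
    print(extract_structured_record(page))
```

Keeping span identification separate from category extraction means the category patterns only ever run on text already judged relevant to the subject, which is the division of work the summary describes.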