Machine learning system for extracting structured records from web pages and other text sources

Bibliographic Details
Main Authors: BAXTER JONATHAN, SEYMORE KRISTIE
Format: Patent
Language: English
Published: 08.06.2006
Summary: A method is described for extracting a structured record (190) from a document (100). The structured record holds information related to a predetermined subject matter (120), organized into categories within the record. The method comprises identifying a span of text (130) in the document (100) according to criteria associated with the predetermined subject matter, and processing (150) the span of text to extract from the document (100) at least one text element associated with at least one of the categories of the structured record (190).
Application Number: US20050291740
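The two steps in the summary (identify a span of text, then process it into categorized text elements) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption, not the patented method: the subject matter is taken to be job advertisements, the span-identification criteria are a hypothetical keyword set scored over blank-line-separated blocks, and the per-category extraction uses hand-written regular expressions where the title suggests a machine learning system would use trained models. The names `SUBJECT_KEYWORDS`, `CATEGORY_PATTERNS`, `identify_span`, `process_span` and `extract_record` are illustrative only.

```python
import re

# Hypothetical criteria for the predetermined subject matter (job ads, assumed).
SUBJECT_KEYWORDS = {"position", "salary", "apply", "experience"}

# Hypothetical per-category patterns standing in for a trained extractor.
CATEGORY_PATTERNS = {
    "title": re.compile(r"Position:\s*(.+)"),
    "location": re.compile(r"Location:\s*(.+)"),
    "salary": re.compile(r"Salary:\s*(.+)"),
}


def _score(block: str) -> int:
    """Count how many words in the block are subject-matter keywords."""
    words = re.findall(r"[a-z]+", block.lower())
    return sum(1 for word in words if word in SUBJECT_KEYWORDS)


def identify_span(document: str) -> str:
    """Step 1: pick the blank-line-separated block that best matches the criteria."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", document) if b.strip()]
    best = max(blocks, key=_score, default="")
    # No block mentions the subject matter at all: report an empty span.
    return best if best and _score(best) > 0 else ""


def process_span(span: str) -> dict:
    """Step 2: extract one text element per category found in the span."""
    record = {}
    for category, pattern in CATEGORY_PATTERNS.items():
        match = pattern.search(span)
        if match:
            record[category] = match.group(1).strip()
    return record


def extract_record(document: str) -> dict:
    """Run both steps to turn a document into a structured record."""
    return process_span(identify_span(document))


page = """Welcome to our site. Read our news and blog.

Position: Data Engineer
Location: Sydney
Salary: 120k
Apply with three years of experience.

Copyright 2006. All rights reserved."""

print(extract_record(page))
# {'title': 'Data Engineer', 'location': 'Sydney', 'salary': '120k'}
```

Replacing the keyword score and the regular expressions with trained classifiers would keep the same two-stage shape: a span selector that discards navigation and footer text, followed by an extractor that fills each category of the record only from the selected span.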