Automatically extracting by-line information

Bibliographic Details
Main Authors DILL STEPHEN, KORUPOLU MADHUKAR R, TOMKINS ANDREW S
Format Patent
Language English
Published 27.11.2012

Summary: A by-line extraction system detects a set of potential headlines from a title meta-tag of a crawled document, selects a candidate headline from the set of potential headlines, and extracts the by-line information from the document using the location of the selected candidate headline. The system constructs the set of potential headlines based on the title meta-tag. The system selects a candidate headline by evaluating the set of potential headlines in order of the lengths of the potential headlines. The system extracts the by-line information from the document by using the location of the selected candidate headline to extract a string representing a date, a name, or a source located within a minimum distance from the location of the potential headline.
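The three steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation; several details are assumptions made only for illustration: the title is split on common separators (" - ", " | ", " : ") to build the potential headlines, candidates are tried longest first, "minimum distance" is treated as a fixed character window after the headline, and the date and name patterns are simple hypothetical regular expressions. Source extraction is omitted.

```python
import re

# Assumption: titles combine headline and site name with these separators.
SEPARATORS = re.compile(r"\s+[|:\-]\s+")
# Assumption: dates look like "November 27, 2012"; names follow "By ".
DATE = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? \d{1,2}, \d{4}\b"
)
NAME = re.compile(r"\bBy ([A-Z][a-z]+(?: [A-Z][a-z]+)+)")


def potential_headlines(title):
    """Build candidates from the title: the whole title plus each separated part.

    Candidates are returned longest first (an assumed ordering; the summary
    only says they are evaluated in order of length).
    """
    title = title.strip()
    parts = [p.strip() for p in SEPARATORS.split(title) if p.strip()]
    candidates = {title, *parts}
    return sorted(candidates, key=lambda s: (-len(s), s))


def select_headline(text, candidates):
    """Return (headline, position) for the first candidate found in the text."""
    for candidate in candidates:
        position = text.find(candidate)
        if position != -1:
            return candidate, position
    return None


def extract_byline(text, title, max_distance=200):
    """Extract a name and a date located shortly after the selected headline.

    max_distance is a hypothetical window size in characters.
    """
    selected = select_headline(text, potential_headlines(title))
    if selected is None:
        return None
    headline, position = selected
    start = position + len(headline)
    window = text[start:start + max_distance]
    name = NAME.search(window)
    date = DATE.search(window)
    return {
        "headline": headline,
        "name": name.group(1) if name else None,
        "date": date.group(0) if date else None,
    }


if __name__ == "__main__":
    page_text = (
        "Daily Example\nStorm hits coast\nBy Jane Doe\n"
        "November 27, 2012\nHeavy rain fell overnight."
    )
    print(extract_byline(page_text, "Storm hits coast - Daily Example"))
```

In this sketch the full title does not occur in the page text, so the next-longest candidate is tried, and the name and date are then searched for only in the text that immediately follows it. Searching near the headline, rather than across the whole page, is what keeps unrelated dates and names elsewhere in the document from being picked up.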
Bibliography: Application Number: US20080192917