Automatically extracting by-line information
Main Authors | , , |
---|---|
Format | Patent |
Language | English |
Published | 27.11.2012 |
Subjects | |
Summary: | A by-line extraction system detects a set of potential headlines from a title meta-tag of a crawled document, selects a candidate headline from the set of potential headlines, and extracts the by-line information from the document using the location of the selected candidate headline. The system constructs the set of potential headlines based on the title meta-tag. The system selects a candidate headline by evaluating the set of potential headlines in order of the lengths of the potential headlines. The system extracts the by-line information from the document by using the location of the selected candidate headline to extract a string representing a date, a name, or a source located within a minimum distance from the location of the potential headline. |
---|---|
Bibliography: | Application Number: US20080192917 |
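
The pipeline in the summary can be illustrated with a minimal sketch. It is not the patented implementation, and everything specific in it is an assumption made for illustration: the function names, the title separators (`|`, `:`, `-`), evaluating the longest potential headline first, the simplified date and name patterns, and the `max_distance` threshold standing in for the "minimum distance". Extraction of a source string is omitted for brevity.

```python
import re

# Hypothetical patterns; real by-lines vary far more than these cover.
DATE_RE = re.compile(
    r"\b((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* +\d{1,2}, +\d{4})"
)
NAME_RE = re.compile(r"\bBy +([A-Z][a-z]+(?: +[A-Z][a-z]+)+)")


def potential_headlines(title):
    """Build the set of potential headlines from the title meta-tag text.

    Assumption: the title is split on common separators and the full
    title is kept as a potential headline too. Longest comes first.
    """
    parts = re.split(r"\s[|:-]\s", title)
    candidates = {p.strip() for p in parts + [title] if p.strip()}
    return sorted(candidates, key=lambda s: (-len(s), s))


def select_headline(title, body):
    """Return (headline, offset) for the first potential headline found in body."""
    for candidate in potential_headlines(title):
        offset = body.find(candidate)
        if offset != -1:
            return candidate, offset
    return None


def extract_byline(title, body, max_distance=200):
    """Extract the date and name strings closest to the selected headline."""
    selected = select_headline(title, body)
    if selected is None:
        return {}
    headline, start = selected
    end = start + len(headline)
    result = {}
    for key, pattern in (("date", DATE_RE), ("name", NAME_RE)):
        best = None
        for match in pattern.finditer(body):
            # Character distance between the match and the headline span.
            distance = max(match.start() - end, start - match.end(), 0)
            if distance <= max_distance and (best is None or distance < best[0]):
                best = (distance, match.group(1))
        if best is not None:
            result[key] = best[1]
    return result


title = "Storm hits coast | Example News"
body = (
    "Example News\nHome World\nStorm hits coast\n"
    "By Jane Doe\nNovember 27, 2012\nHeavy rain fell overnight."
)
print(extract_byline(title, body))
```

In this made-up example the full title does not occur in the body, so the next-longest potential headline, "Storm hits coast", is selected, and the name and date that follow it are the nearest matches. Measuring distance in characters is a simplification; a real system might measure it in markup nodes or lines instead.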