Query generation using structural similarity between documents
Methods, systems, and apparatus, including computer program products, for generating synthetic queries using seed queries and structural similarity between documents are described. In one aspect, a method includes identifying embedded coding fragments (e.g., HTML tag) from a structured document and...
Saved in:
Main Authors | , , , , , |
---|---|
Format | Patent |
Language | English |
Published |
28.07.2015
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Methods, systems, and apparatus, including computer program products, for generating synthetic queries using seed queries and structural similarity between documents are described. In one aspect, a method includes identifying embedded coding fragments (e.g., HTML tag) from a structured document and a seed query; generating one or more query templates, each query template corresponding to at least one coding fragment, the query template including a generative rule to be used in generating candidate synthetic queries; generating the candidate synthetic queries by applying the query templates to other documents that are hosted on the same web site as the document; identifying terms that match structure of the query templates as candidate synthetic queries; measuring a performance for each of the candidate synthetic queries; and designating as synthetic queries the candidate synthetic queries that have performance measurements exceeding a performance threshold. |
---|---|
Bibliography: | Application Number: US201213620500 |