Query generation using structural similarity between documents

Methods, systems, and apparatus, including computer program products, for generating synthetic queries using seed queries and structural similarity between documents are described. In one aspect, a method includes identifying embedded coding fragments (e.g., HTML tag) from a structured document and...

Full description

Saved in:
Bibliographic Details
Main Authors VENKATACHARY SRINIVASAN, BAKER STEVEN D, HAAHR PAUL G, GUPTA NITIN, FLASTER MICHAEL, WU YONGHUI
Format Patent
LanguageEnglish
Published 28.07.2015
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Methods, systems, and apparatus, including computer program products, for generating synthetic queries using seed queries and structural similarity between documents are described. In one aspect, a method includes identifying embedded coding fragments (e.g., HTML tag) from a structured document and a seed query; generating one or more query templates, each query template corresponding to at least one coding fragment, the query template including a generative rule to be used in generating candidate synthetic queries; generating the candidate synthetic queries by applying the query templates to other documents that are hosted on the same web site as the document; identifying terms that match structure of the query templates as candidate synthetic queries; measuring a performance for each of the candidate synthetic queries; and designating as synthetic queries the candidate synthetic queries that have performance measurements exceeding a performance threshold.
Bibliography:Application Number: US201213620500