80 New Packages to Mine Database Query Logs

Bibliographic Details
Main Authors Sellam, Thibault; Kersten, Martin
Format Journal Article
Language English
Published 25.03.2017

Abstract The query log of a DBMS is a powerful resource. It enables many practical applications, including query optimization and user experience enhancement. And yet, mining SQL queries is a difficult task. The fundamental problem is that queries are symbolic objects, not vectors of numbers. Therefore, many popular statistical concepts, such as means, regression, or decision trees, do not apply. Most authors limit themselves to ad hoc algorithms or to neighborhood-based approaches such as k-Nearest Neighbors. Our project challenges this limitation. We introduce methods to manipulate SQL queries as if they were vectors, thereby unlocking the whole statistical toolbox. We present three families of methods: feature maps, kernel methods, and Bayesian models. The first technique directly encodes queries into vectors. The second transforms the queries implicitly. The last exploits probabilistic graphical models as an alternative to vector spaces. We present the benefits and drawbacks of each solution, highlight how they relate to each other, and make the case for future investigation.
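
To make the contrast between the first two families concrete, the following is a minimal Python sketch, not the authors' method. It assumes a deliberately naive encoding (counts of word-like tokens) and three made-up queries; the helper names `tokenize`, `feature_map`, and `token_kernel` are hypothetical. The feature map builds an explicit vector per query, so a mean becomes computable, while the kernel returns the same inner product without ever materializing the vectors.

```python
import re
from collections import Counter


def tokenize(query):
    """Lowercase a SQL string and keep word-like tokens (numeric literals are dropped)."""
    return re.findall(r"[a-z_][a-z0-9_]*", query.lower())


def feature_map(query, vocabulary):
    """Explicit encoding: one token count per vocabulary entry."""
    counts = Counter(tokenize(query))
    return [counts[token] for token in vocabulary]


def token_kernel(q1, q2):
    """Implicit encoding: inner product of token counts, no vectors built."""
    c1, c2 = Counter(tokenize(q1)), Counter(tokenize(q2))
    return sum(c1[t] * c2[t] for t in c1.keys() & c2.keys())


# Illustrative queries (assumed for this sketch, not taken from a real log).
queries = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT name, age FROM users",
    "SELECT total FROM orders WHERE total > 100",
]

vocabulary = sorted({t for q in queries for t in tokenize(q)})
vectors = [feature_map(q, vocabulary) for q in queries]

# Once queries are vectors, ordinary statistics such as a mean apply.
mean_vector = [sum(column) / len(vectors) for column in zip(*vectors)]
```

A token-count encoding ignores query structure (joins, nesting, predicates), so it only illustrates the idea of treating queries as vectors; a realistic feature map or kernel would need to account for that structure.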
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI 10.48550/arxiv.1703.08732
OpenAccessLink https://arxiv.org/abs/1703.08732
SecondaryResourceType preprint
SubjectTerms Computer Science - Databases