80 New Packages to Mine Database Query Logs
Main Authors | Sellam, Thibault; Kersten, Martin |
---|---|
Format | Journal Article |
Language | English |
Published | 25.03.2017 |
Subjects | Computer Science - Databases |
Abstract | The query log of a DBMS is a powerful resource. It enables many practical
applications, including query optimization and user experience enhancement. And
yet, mining SQL queries is a difficult task. The fundamental problem is that
queries are symbolic objects, not vectors of numbers. Therefore, many popular
statistical concepts, such as means, regression, or decision trees, do not
apply. Most authors limit themselves to ad hoc algorithms or approaches based
on neighborhoods, such as k-nearest neighbors. Our project is to challenge this
limitation. We introduce methods to manipulate SQL queries as if they were
vectors, thereby unlocking the whole statistical toolbox. We present three
families of methods: feature maps, kernel methods, and Bayesian models. The
first technique directly encodes queries into vectors. The second one
transforms the queries implicitly. The last one exploits probabilistic
graphical models as an alternative to vector spaces. We present the benefits
and drawbacks of each solution, highlight how they relate to each other, and
make the case for future investigation. |
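To make the abstract's distinction between an explicit feature map and an implicit kernel concrete, here is a minimal Python sketch. It is not the authors' method: the regex tokenizer, the binary bag-of-tokens encoding, the sample queries, and the function names (`query_features`, `feature_map`, `shared_feature_kernel`, `centroid`) are hypothetical illustrations. Encoding each query as a binary vector over a shared vocabulary lets ordinary statistics such as a mean apply, and counting shared tokens directly yields the same similarity as the dot product of those vectors without ever building them.

```python
import re
from typing import List, Set

# Hypothetical feature extractor: a crude token-level view of a SQL query.
# A real system would parse the query; this regex split is only illustrative.
def query_features(sql: str) -> Set[str]:
    return set(re.findall(r"[a-z_][a-z_0-9]*|[<>=!]+|\d+", sql.lower()))

# Explicit feature map: one binary coordinate per token in a shared vocabulary.
def feature_map(sql: str, vocabulary: List[str]) -> List[int]:
    feats = query_features(sql)
    return [1 if term in feats else 0 for term in vocabulary]

def dot(u: List[int], v: List[int]) -> int:
    return sum(a * b for a, b in zip(u, v))

# Kernel computed without materializing vectors: count the shared tokens.
# For binary vectors over a vocabulary covering both queries, this equals dot().
def shared_feature_kernel(q1: str, q2: str) -> int:
    return len(query_features(q1) & query_features(q2))

# Once queries are vectors, a "mean query" is the coordinate-wise average.
def centroid(vectors: List[List[int]]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical three-query log, used only for illustration.
log = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT name, email FROM users WHERE age > 40",
    "SELECT count(*) FROM orders",
]
vocab = sorted(set().union(*(query_features(q) for q in log)))
vectors = [feature_map(q, vocab) for q in log]

print(shared_feature_kernel(log[0], log[1]), dot(vectors[0], vectors[1]))  # 7 7
```

A binary bag of tokens discards query structure such as nesting and clause order; it is used here only because its kernel is easy to verify by hand.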
Author | Sellam, Thibault; Kersten, Martin |
BackLink | https://doi.org/10.48550/arXiv.1703.08732 (View paper in arXiv) |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.1703.08732 |
DatabaseName | arXiv Computer Science; arXiv.org |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 1703_08732 |
IngestDate | Mon Jan 08 05:45:07 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
LinkModel | DirectLink |
OpenAccessLink | https://arxiv.org/abs/1703.08732 |
ParticipantIDs | arxiv_primary_1703_08732 |
PublicationCentury | 2000 |
PublicationDate | 2017-03-25 |
PublicationDecade | 2010 |
PublicationYear | 2017 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Databases |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | Cornell University |