Robust and Scalable Content-and-Structure Indexing
Published in | The VLDB Journal |
---|---|
Main Authors | Wellenzohn, Kevin; Böhlen, Michael H.; Helmer, Sven; Pietri, Antoine; Zacchiroli, Stefano |
Format | Journal Article |
Language | English |
Published | Springer, 2022 |
Abstract | Frequent queries on semi-structured hierarchical data are Content-and-Structure (CAS) queries that filter data items based on their location in the hierarchical structure and their value for some attribute. We propose the Robust and Scalable Content-and-Structure (RSCAS) index to efficiently answer CAS queries on big semi-structured data. To obtain an index that is robust against queries with varying selectivities, we introduce a novel dynamic interleaving that merges the path and value dimensions of composite keys in a balanced manner. We store interleaved keys in our trie-based RSCAS index, which efficiently supports a wide range of CAS queries, including queries with wildcards and descendant axes. We implement RSCAS as a log-structured merge (LSM) tree to scale it to data-intensive applications with a high insertion rate. We illustrate RSCAS's robustness and scalability by indexing data from the Software Heritage (SWH) archive, which is the world's largest publicly available source code archive. |
---|---|
Author | Zacchiroli, Stefano; Böhlen, Michael H.; Helmer, Sven; Wellenzohn, Kevin; Pietri, Antoine |
Author_xml | 1. Wellenzohn, Kevin (Department of Informatics [Zurich]); 2. Böhlen, Michael H. (Department of Informatics [Zurich]); 3. Helmer, Sven (Department of Informatics [Zurich]); 4. Pietri, Antoine (Direction générale déléguée à l'innovation); 5. Zacchiroli, Stefano, ORCID 0000-0002-4576-136X (Laboratoire Traitement et Communication de l'Information) |
BackLink | https://hal.science/hal-03787268 (View record in HAL) |
ContentType | Journal Article |
Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
DatabaseName | Hyper Article en Ligne (HAL); Hyper Article en Ligne (HAL) (Open Access) |
Discipline | Computer Science |
EISSN | 0949-877X |
ExternalDocumentID | oai_HAL_hal_03787268v1 |
ISSN | 1066-8888 |
IngestDate | Thu Oct 24 06:48:57 EDT 2024 |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
LinkModel | OpenURL |
ORCID | 0000-0002-4576-136X |
OpenAccessLink | https://hal.science/hal-03787268 |
ParticipantIDs | hal_primary_oai_HAL_hal_03787268v1 |
PublicationCentury | 2000 |
PublicationDate | 2022 |
PublicationDateYYYYMMDD | 2022-01-01 |
PublicationDecade | 2020 |
PublicationTitle | The VLDB Journal |
PublicationYear | 2022 |
Publisher | Springer |
SourceID | hal |
SourceType | Open Access Repository |
SubjectTerms | Computer Science; Software Engineering |
Title | Robust and Scalable Content-and-Structure Indexing |
URI | https://hal.science/hal-03787268 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | Springer Nature |
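
The abstract defines CAS queries as filters on an item's location in the hierarchy and on its value for some attribute, optionally with wildcards and descendant axes. The Python sketch below illustrates only those query semantics, using a brute-force scan. It is not the RSCAS index, its dynamic interleaving, its trie, or its LSM organization. Every concrete detail in the sketch is an assumption made for illustration: the pattern syntax (`*` matches exactly one path label, and `//` is a descendant axis matching zero or more intermediate labels, loosely modeled on XPath), the `Item` record, the integer attribute, the inclusive range predicate, and the sample paths.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical data item: a location in the hierarchy plus one attribute value.
@dataclass(frozen=True)
class Item:
    path: str   # location in the hierarchy, e.g. "/src/lib/util.c"
    value: int  # value of the queried attribute (assumed to be an integer)
    ref: str    # hypothetical identifier of the item

# Assumed descendant axis: zero or more intermediate path labels.
_DESCENDANT = r"/(?:[^/]+/)*"

def path_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a path pattern: '*' matches exactly one label, '//' any depth."""
    compiled_segments = []
    for segment in pattern.split("//"):
        labels = segment.split("/")
        # Wildcard labels match one label; all other labels match literally.
        compiled_segments.append(
            "/".join("[^/]+" if label == "*" else re.escape(label) for label in labels)
        )
    return re.compile(_DESCENDANT.join(compiled_segments))

def cas_query(items: Iterable[Item], path_pattern: str, lo: int, hi: int) -> List[str]:
    """Full scan: keep items whose path matches and whose value is in [lo, hi]."""
    rx = path_pattern_to_regex(path_pattern)
    return [it.ref for it in items if rx.fullmatch(it.path) and lo <= it.value <= hi]

# Hypothetical sample items (paths and values are made up for illustration).
ITEMS = [
    Item("/src/util.c", 1200, "f1"),
    Item("/src/lib/util.c", 800, "f2"),
    Item("/src/lib/net/util.c", 5000, "f3"),
    Item("/docs/util.c", 900, "f4"),
    Item("/src/lib/main.c", 700, "f5"),
]

if __name__ == "__main__":
    # Descendant axis plus value range: util.c anywhere under /src, value in [500, 2000].
    print(cas_query(ITEMS, "/src//util.c", 500, 2000))   # ['f1', 'f2']
    # Wildcard: util.c exactly one level below /src, any value.
    print(cas_query(ITEMS, "/src/*/util.c", 0, 10_000))  # ['f2']
```

A scan like this touches every item. According to the abstract, RSCAS instead answers such queries with a trie-based index over composite keys whose path and value dimensions are interleaved. This design is intended to keep performance robust across queries with varying selectivities.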