Fast and scalable inequality joins

Bibliographic Details
Published in	The VLDB Journal (The International Journal on Very Large Data Bases), Vol. 26, no. 1, pp. 125–150
Main Authors	Khayyat, Zuhair; Lucia, William; Singh, Meghna; Ouzzani, Mourad; Papotti, Paolo; Quiané-Ruiz, Jorge-Arnulfo; Tang, Nan; Kalnis, Panos
Format	Journal Article
Language	English
Published	Springer Berlin Heidelberg, Berlin/Heidelberg; Springer Nature B.V.; 1 February 2017
Subjects	Algorithms; Arrays; Computer Science; Database Management; Optimization; Optimization techniques; Queries; Selectivity; Special Issue Paper; Incremental; Selectivity estimation; Inequality join; PostgreSQL; Spark SQL
ISSN	1066-8888 (print); 0949-877X (electronic)
DOI	10.1007/s00778-016-0441-6

Abstract	Inequality joins, which join relations on inequality conditions, are used in various applications. Optimizing joins has been the subject of intensive research, ranging from efficient join algorithms such as sort-merge join to efficient indices such as the B+-tree, the R*-tree and bitmap indices. However, inequality joins have received little attention, and queries containing them are notoriously slow. In this paper, we introduce fast inequality join algorithms based on sorted arrays and space-efficient bit-arrays. We further introduce a simple method to estimate the selectivity of inequality joins, which is then used to optimize queries with multiple predicates and multi-way joins. Moreover, we study an incremental inequality join algorithm to handle scenarios where data keeps changing. We have implemented a centralized version of these algorithms on top of PostgreSQL, a distributed version on top of Spark SQL, and a version within an existing data cleaning system, Nadeef. By comparing our algorithms against well-known optimization techniques for inequality joins, we show that our solution is more scalable and several orders of magnitude faster.
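To make the abstract's idea of joining with "sorted arrays and bit-arrays" more concrete, here is a minimal Python sketch. It rests on several stated assumptions. It handles only one two-predicate query shape, r.x < s.x AND r.y > s.y. The function name `inequality_join` and the (x, y) tuple layout are hypothetical. It is a simplified sweep written for illustration, not the authors' algorithm.

The sketch works in three steps:
1. S is sorted by x, which fixes each S tuple's position in a bit array.
2. Both relations are then swept in ascending y, and each visited S tuple sets its bit.
3. Each R tuple then reads the set bits that lie to the right of its own x value.

```python
from bisect import bisect_right


def inequality_join(R, S):
    """Return index pairs (i, j) with R[i].x < S[j].x and R[i].y > S[j].y.

    R and S are lists of (x, y) tuples (a hypothetical layout for this sketch).
    """
    # Sorted array: S indices ordered by x. The bit array is indexed by
    # position in this order, so a suffix of it holds tuples with larger x.
    s_by_x = sorted(range(len(S)), key=lambda j: S[j][0])
    s_xs = [S[j][0] for j in s_by_x]
    # Permutation: original S index -> its position in the x-sorted order.
    pos_in_x = [0] * len(S)
    for p, j in enumerate(s_by_x):
        pos_in_x[j] = p
    bits = bytearray(len(S))  # bits[p] == 1 once that S tuple has been swept

    # Sweep both relations in ascending y. On equal y, R tuples (kind 0)
    # sort before S tuples (kind 1), so an S tuple with the same y is not
    # yet marked and the condition s.y < r.y stays strict.
    events = [(y, 0, i) for i, (_, y) in enumerate(R)]
    events += [(y, 1, j) for j, (_, y) in enumerate(S)]
    events.sort()

    result = []
    for _, kind, idx in events:
        if kind == 1:
            bits[pos_in_x[idx]] = 1
        else:
            # Every marked S tuple has s.y < r.y; those at positions from
            # bisect_right(s_xs, r.x) onward also have s.x > r.x.
            start = bisect_right(s_xs, R[idx][0])
            for p in range(start, len(S)):
                if bits[p]:
                    result.append((idx, s_by_x[p]))
    return result
```

For example, `sorted(inequality_join([(1, 5), (4, 2)], [(2, 3), (5, 1), (0, 4)]))` evaluates to `[(0, 0), (0, 1), (1, 1)]`.

Sorting costs O(n log n), where n = |R| + |S|. However, each R tuple still scans a suffix of the bit array, so the worst case remains proportional to |R|·|S|. This sketch makes no attempt to skip empty regions of the bit array.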
Affiliations	Khayyat, Zuhair (ORCID 0000-0003-3650-6997; zuhair.khayyat@kaust.edu.sa): King Abdullah University of Science and Technology
	Lucia, William: Qatar Computing Research Institute, HBKU
	Singh, Meghna: Qatar Computing Research Institute, HBKU
	Ouzzani, Mourad: Qatar Computing Research Institute, HBKU
	Papotti, Paolo (ORCID 0000-0003-0651-4128): Arizona State University
	Quiané-Ruiz, Jorge-Arnulfo: Qatar Computing Research Institute, HBKU
	Tang, Nan: Qatar Computing Research Institute, HBKU
	Kalnis, Panos: King Abdullah University of Science and Technology
Copyright	Springer-Verlag Berlin Heidelberg 2016; Springer Science & Business Media 2017
Keywords	Incremental; Selectivity estimation; Inequality join; PostgreSQL; Spark SQL
Online Access	https://link.springer.com/article/10.1007/s00778-016-0441-6; https://www.proquest.com/docview/1880771998