Fast and scalable inequality joins

Bibliographic Details
Published in The VLDB Journal, Vol. 26, no. 1, pp. 125–150
Main Authors Khayyat, Zuhair, Lucia, William, Singh, Meghna, Ouzzani, Mourad, Papotti, Paolo, Quiané-Ruiz, Jorge-Arnulfo, Tang, Nan, Kalnis, Panos
Format Journal Article
Language English
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.02.2017
Springer Nature B.V
Subjects
ISSN 1066-8888
EISSN 0949-877X
DOI 10.1007/s00778-016-0441-6

Abstract Inequality joins, which join relations on inequality conditions, are used in various applications. Optimizing joins has been the subject of intensive research, ranging from efficient join algorithms such as sort-merge join to the use of efficient indices such as the B+-tree, R*-tree, and Bitmap. However, inequality joins have received little attention, and queries containing such joins are notably slow. In this paper, we introduce fast inequality join algorithms based on sorted arrays and space-efficient bit-arrays. We further introduce a simple method to estimate the selectivity of inequality joins, which is then used to optimize multiple-predicate queries and multi-way joins. Moreover, we study an incremental inequality join algorithm to handle scenarios where data keeps changing. We have implemented a centralized version of these algorithms on top of PostgreSQL, a distributed version on top of Spark SQL, and an existing data cleaning system, Nadeef. By comparing our algorithms against well-known optimization techniques for inequality joins, we show that our solution is more scalable and several orders of magnitude faster.
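The abstract describes join algorithms built on sorted arrays and bit-arrays but gives no details. The following is only a minimal sketch of that general idea, not the authors' algorithm: it evaluates a self-join with two inequality predicates, `r.a < s.a AND r.b > s.b`, by sorting once on each attribute and marking already-visited tuples in a flag array. The function name `inequality_self_join`, the tuple layout `(a, b)`, and the use of a `bytearray` (one byte per flag, standing in for a true bit-array) are assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import groupby


def inequality_self_join(rows):
    """Return index pairs (i, j) with rows[i][0] < rows[j][0] and rows[i][1] > rows[j][1].

    rows is a list of (a, b) tuples. Illustrative sketch only.
    """
    n = len(rows)
    # Sorted array on attribute a (ascending), plus each tuple's position in it.
    by_a = sorted(range(n), key=lambda i: rows[i][0])
    a_sorted = [rows[i][0] for i in by_a]
    pos_in_a = [0] * n
    for p, i in enumerate(by_a):
        pos_in_a[i] = p
    # Visit tuples in descending order of b, so every tuple visited earlier
    # (in an earlier group) has a strictly larger b.
    by_b_desc = sorted(range(n), key=lambda i: rows[i][1], reverse=True)
    seen = bytearray(n)  # seen[p] == 1: tuple at position p of by_a was visited
    result = []
    for _, group in groupby(by_b_desc, key=lambda i: rows[i][1]):
        group = list(group)
        for j in group:
            # Positions [0, limit) of by_a hold tuples with a strictly below rows[j][0].
            limit = bisect_left(a_sorted, rows[j][0])
            for p in range(limit):
                if seen[p]:
                    result.append((by_a[p], j))
        # Mark the whole group only after querying it, so ties on b never match.
        for j in group:
            seen[pos_in_a[j]] = 1
    return result


rows = [(100, 6), (140, 11), (80, 10), (90, 5)]
pairs = inequality_self_join(rows)
```

The sorting steps cost O(n log n); the scan over the flag array is still linear per tuple in the worst case, so this sketch shows the data layout rather than the performance reported in the paper.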
Author_xml – sequence: 1
  givenname: Zuhair
  orcidid: 0000-0003-3650-6997
  surname: Khayyat
  fullname: Khayyat, Zuhair
  email: zuhair.khayyat@kaust.edu.sa
  organization: King Abdullah University of Science and Technology
– sequence: 2
  givenname: William
  surname: Lucia
  fullname: Lucia, William
  organization: Qatar Computing Research Institute, HBKU
– sequence: 3
  givenname: Meghna
  surname: Singh
  fullname: Singh, Meghna
  organization: Qatar Computing Research Institute, HBKU
– sequence: 4
  givenname: Mourad
  surname: Ouzzani
  fullname: Ouzzani, Mourad
  organization: Qatar Computing Research Institute, HBKU
– sequence: 5
  givenname: Paolo
  orcidid: 0000-0003-0651-4128
  surname: Papotti
  fullname: Papotti, Paolo
  organization: Arizona State University
– sequence: 6
  givenname: Jorge-Arnulfo
  surname: Quiané-Ruiz
  fullname: Quiané-Ruiz, Jorge-Arnulfo
  organization: Qatar Computing Research Institute, HBKU
– sequence: 7
  givenname: Nan
  surname: Tang
  fullname: Tang, Nan
  organization: Qatar Computing Research Institute, HBKU
– sequence: 8
  givenname: Panos
  surname: Kalnis
  fullname: Kalnis, Panos
  organization: King Abdullah University of Science and Technology
CitedBy_id crossref_primary_10_1007_s00778_023_00788_y
crossref_primary_10_14778_3494124_3494146
crossref_primary_10_1016_j_jcss_2021_09_004
crossref_primary_10_14778_3192965_3192966
crossref_primary_10_1587_transfun_2023EAP1135
crossref_primary_10_1007_s00778_019_00590_9
crossref_primary_10_1007_s11390_018_1872_x
crossref_primary_10_1109_TCE_2023_3249292
crossref_primary_10_14778_3565816_3565828
crossref_primary_10_14778_3476249_3476306
crossref_primary_10_1016_j_is_2024_102435
crossref_primary_10_1007_s00778_021_00692_3
crossref_primary_10_1007_s00778_020_00650_5
crossref_primary_10_1137_22M1534468
crossref_primary_10_14778_3099622_3099629
crossref_primary_10_14778_3401960_3401965
Cites_doi 10.1145/1327452.1327492
10.1145/2723372.2747646
10.1007/3-540-48482-5_7
10.4018/987-1-59904-364-7.ch007
10.1145/1739041.1739056
10.1109/ICDE.2007.367920
10.1007/s00778-003-0111-3
10.1145/67544.66937
10.1145/304182.304201
10.1145/276304.276336
10.1145/2463676.2465327
10.1145/2588555.2594511
10.1145/1292609.1292616
10.1145/1529282.1529582
10.1145/1007568.1007645
10.1145/602259.602266
10.1145/503099.503101
10.1145/2723372.2742797
10.1145/1142473.1142511
10.1016/j.pmcj.2013.10.001
10.1145/1989323.1989423
10.1145/582095.582099
10.1007/978-3-642-82375-6_2
ContentType Journal Article
Copyright Springer-Verlag Berlin Heidelberg 2016
Copyright Springer Science & Business Media 2017
DBID AAYXX
CITATION
DOI 10.1007/s00778-016-0441-6
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList

DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 0949-877X
EndPage 150
ExternalDocumentID 10_1007_s00778_016_0441_6
ID FETCH-LOGICAL-c316t-4175a1040fbb62527b55341d7b6cd34203c12e07bb2d2b6978231e5a7e186eac3
IEDL.DBID AGYKE
ISSN 1066-8888
IngestDate Fri Jul 25 10:16:08 EDT 2025
Tue Jul 01 01:59:45 EDT 2025
Thu Apr 24 23:04:21 EDT 2025
Fri Feb 21 02:37:43 EST 2025
IsPeerReviewed false
IsScholarly true
Issue 1
Keywords Incremental
Selectivity estimation
Inequality join
PostgreSQL
Spark SQL
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c316t-4175a1040fbb62527b55341d7b6cd34203c12e07bb2d2b6978231e5a7e186eac3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0003-3650-6997
0000-0003-0651-4128
PQID 1880771998
PQPubID 2043708
PageCount 26
ParticipantIDs proquest_journals_1880771998
crossref_primary_10_1007_s00778_016_0441_6
crossref_citationtrail_10_1007_s00778_016_0441_6
springer_journals_10_1007_s00778_016_0441_6
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2017-02-01
PublicationDateYYYYMMDD 2017-02-01
PublicationDate_xml – month: 02
  year: 2017
  text: 2017-02-01
  day: 01
PublicationDecade 2010
PublicationPlace Berlin/Heidelberg
PublicationPlace_xml – name: Berlin/Heidelberg
– name: New York
PublicationSubtitle The International Journal on Very Large Data Bases
PublicationTitle The VLDB journal
PublicationTitleAbbrev The VLDB Journal
PublicationYear 2017
Publisher Springer Berlin Heidelberg
Springer Nature B.V
Publisher_xml – name: Springer Berlin Heidelberg
– name: Springer Nature B.V
References Hellerstein, J.M., Naughton, J.F., Pfeffer, A.: Generalized search trees for database systems. In: VLDB, pp. 562–573 (1995)
Chu, X., Ilyas, I.F., Papotti, P.: Holistic data cleaning: putting violations into context. In: ICDE, pp. 458–469 (2013)
Bohannon, P., Fan, W., Geerts, F., Jia, X., Kementsietsidis, A.: Conditional functional dependencies for data cleaning. In: ICDE, pp. 746–755 (2007)
Böhm, C., Klump, G., Kriegel, H.-P.: XZ-Ordering: A space-filling curve for objects with spatial extension. In: SSD, pp. 75–90 (1999)
Khayyat, Z., Lucia, W., Singh, M., Ouzzani, M., Papotti, P., Quiané-Ruiz, J.-A., Tang, N., Kalnis, P.: Lightning fast and space efficient inequality joins. PVLDB 8(13), 2074–2085 (2015)
Garcia-Molina, H., Ullman, J.D., Widom, J.: Database Systems. Pearson Education (2009)
Laurila, J.K., Gatica-Perez, D., Aad, I., Bornet, O., Do, T.-M.-T., Dousse, O., Eberle, J., Miettinen, M.: The mobile data challenge: big data for mobile computing research. In: Pervasive Computing (2012)
Govindaraju, N.K., Gray, J., Kumar, R., Manocha, D.: GPUTeraSort: high performance graphics co-processor sorting for large database management. In: SIGMOD, pp. 325–336 (2006)
Agrawal, D., Chawla, S., Elmagarmid, A.K., Ouzzani, Z.K.M., Papotti, P., Quiané-Ruiz, J., Tang, N., Zaki, M.J.: Road to freedom in big data analytics. In: EDBT, pp. 479–484 (2016)
Mamoulis, N., Papadias, D.: Multiway spatial joins. TODS 26(4), 424–475 (2001)
Dallachiesa, M., Ebaid, A., Eldawy, A., Elmagarmid, A., Ilyas, I.F., Ouzzani, M., Tang, N.: NADEEF: a commodity data cleaning system. In: SIGMOD (2013)
DeWitt, D.J., Naughton, J.F., Schneider, D.A.: An evaluation of non-equijoin algorithms. In: VLDB, pp. 443–452 (1991)
Schneider, D.A., DeWitt, D.J.: A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment. In: SIGMOD (1989)
Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading (1995)
Chan, C.-Y., Ioannidis, Y. E.: Bitmap index design and evaluation. In: SIGMOD, pp. 355–366 (1998)
Bender, M.A., Hu, H.: An adaptive packed-memory array. TODS 32(4) 26:1–26:43 (2007)
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD, pp. 23–34 (1979)
Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Morris, J., Ramesh, B.: Dynamic Partition Enhanced Inequality Joining Using a Value-count Index, 1 2011. US Patent 7,873,629 B1
Ebaid, A., Elmagarmid, A.K., Ilyas, I.F., Ouzzani, M., Quiané-Ruiz, J., Tang, N., Yin, S.: NADEEF: a generalized data cleaning system. PVLDB 6(12), 1218–1221 (2013)
Knuth, D. E.: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley, Reading (1973)
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: HotCloud, pp. 10–10 (2010)
Gao, D., Jensen, C.S., Snodgrass, R.T., Soo, M.D.: Join operations in temporal databases. VLDB J. 14(1), 2–29 (2005)
Elmagarmid, A.K., Ilyas, I.F., Ouzzani, M., Quiané-Ruiz, J., Tang, N., Yin, S.: NADEEF/ER: generic and interactive entity resolution. In: SIGMOD, pp. 1071–1074 (2014)
Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In: SIGMOD, pp. 1383–1394 (2015)
Kiukkonen, N., Blom, J., Dousse, O., Gatica-Perez, D., Laurila, J.: Towards rich mobile phone datasets: lausanne data collection campaign. In: ICPS (2010)
Kemper, A., Kossmann, D., Wiesner, C.: Generalised hash teams for join and group-by. In: VLDB, pp. 30–41 (1999)
Khayyat, Z., Ilyas, I.F., Jindal, A., Madden, S., Ouzzani, M., Papotti, P., Quiané-Ruiz, J.-A., Tang, N., Yin, S.: BigDansing: a system for big data cleansing. In: SIGMOD, pp. 1215–1230 (2015)
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: SIGMOD, pp. 47–57 (1984)
Afrati, F.N., Ullman, J.D.: Optimizing joins in a map-reduce environment. In: EDBT, pp. 99–110 (2010)
Zhang, X., Chen, L., Wang, M.: Efficient multi-way theta-join processing using MapReduce. PVLDB 5(11), 1184–1195 (2012)
Lohman, G., Mohan, C., Haas, L., Daniels, D., Lindsay, B., Selinger, P., Wilms, P.: Query processing in R*. In: Query Processing in Database Systems, pp. 31–47 (1985)
Lopes Siqueira, T.L., Ciferri, R.R., Times, V.C., de Aguiar Ciferri, C.D.: A spatial bitmap-based index for geographical data warehouses. In: SAC, pp. 1336–1342 (2009)
Okcan, A., Riedewald, M.: Processing theta-joins using MapReduce. In: SIGMOD, pp. 949–960 (2011)
Enderle, J., Hampel, M., Seidl, T.: Joining interval data in relational databases. In: SIGMOD, pp. 683–694 (2004)
Stockinger, K., Wu, K.: Bitmap indices for data warehouses. Data Wareh OLAP Concepts Archit Solut 5, 157–178 (2007)
Chan, C.-Y., Ioannidis, Y.E.: An efficient bitmap encoding scheme for selection queries. In: SIGMOD, pp. 215–226 (1999)
Dittrich, J., Quiané-Ruiz, J., Jindal, A., Kargin, Y., Setty, V., Schad, J.: Hadoop++: making a yellow elephant run like a cheetah (without it even noticing). PVLDB 3(1), 515–529 (2010)
SSID ssj0002225
Score 2.2516298
SourceID proquest
crossref
springer
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 125
SubjectTerms Algorithms
Arrays
Computer Science
Database Management
Optimization
Optimization techniques
Queries
Selectivity
Special Issue Paper
Title Fast and scalable inequality joins
URI https://link.springer.com/article/10.1007/s00778-016-0441-6
https://www.proquest.com/docview/1880771998
Volume 26