Off-line compression by greedy textual substitution


Bibliographic Details
Published in Proceedings of the IEEE, Vol. 88, no. 11, pp. 1733-1744
Main Authors Apostolico, A., Lonardi, S.
Format Journal Article, Conference Proceeding
Language English
Published New York, NY: IEEE, 01.11.2000
Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Greedy off-line textual substitution refers to the following approach to compression or structural inference. Given a long text string X, a substring W is identified such that replacing all instances of W in X except one by a suitable pair of pointers yields the highest possible contraction of X; the process is then repeated on the contracted text string until substrings capable of producing contractions can no longer be found. This paper examines computational issues arising in the implementation of this paradigm and describes some applications and experiments.
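
The loop described in the abstract can be illustrated with a small brute-force sketch. This is not the authors' implementation, and several choices are assumptions made only for illustration: each pointer pair is charged a fixed `pointer_cost` symbols, occurrences are counted left to right without overlap, and a `rules` table stands in for pointers into the retained copy of W. The function names (`best_substring`, `greedy_substitute`, `expand`) are hypothetical, and the exhaustive search is only practical for very short strings.

```python
def cost(seq, pointer_cost):
    """Size of a sequence: 1 per character, pointer_cost per reference (int)."""
    return sum(pointer_cost if isinstance(s, int) else 1 for s in seq)


def nonoverlapping(seq, w):
    """Start positions of left-to-right, non-overlapping occurrences of w in seq."""
    hits, i, m = [], 0, len(w)
    while i + m <= len(seq):
        if seq[i:i + m] == w:
            hits.append(i)
            i += m
        else:
            i += 1
    return hits


def best_substring(seq, pointer_cost):
    """Exhaustively pick the substring whose substitution contracts seq the most.

    Assumed gain model: every occurrence except one is replaced by a
    reference costing pointer_cost symbols.
    """
    best, best_gain, seen = None, 0, set()
    for i in range(len(seq)):
        for j in range(i + 2, len(seq) + 1):
            w = seq[i:j]
            if w in seen:
                continue
            seen.add(w)
            f = len(nonoverlapping(seq, w))
            gain = (f - 1) * (cost(w, pointer_cost) - pointer_cost)
            if gain > best_gain:
                best, best_gain = w, gain
    return best, best_gain


def greedy_substitute(text, pointer_cost=2):
    """Repeat the best substitution until no substring yields a contraction."""
    seq, rules = tuple(text), []
    while True:
        w, _gain = best_substring(seq, pointer_cost)
        if w is None:
            return seq, rules
        later = set(nonoverlapping(seq, w)[1:])  # keep the first occurrence
        rule_id = len(rules)
        rules.append(w)
        out, i = [], 0
        while i < len(seq):
            if i in later:
                out.append(rule_id)  # reference standing in for a pointer pair
                i += len(w)
            else:
                out.append(seq[i])
                i += 1
        seq = tuple(out)


def expand(seq, rules):
    """Invert the substitution; rule k only refers to rules with smaller ids."""
    parts = []
    for s in seq:
        parts.append(expand(rules[s], rules) if isinstance(s, int) else s)
    return "".join(parts)
```

Each round lowers the assumed cost by a positive integer amount, so the loop terminates. Calling `greedy_substitute("abcabcabcabc")` and passing the result to `expand` should reproduce the input.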
Author Apostolico, A. (axa@cs.purdue.edu), Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
Author Lonardi, S. (stelo@cs.purdue.edu)
CODEN IEEPAD
Copyright 2001 INIST-CNRS
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2000
DOI 10.1109/5.892709
Discipline Engineering
Statistics
Applied Sciences
Physics
EISSN 1558-2256
ISSN 0018-9219
Keywords Textual data
Substitution
Texturation
Data compression
Greedy algorithm
Theoretical study
Data structures
Sequencing
Grammatical inference
Biological system
License CC BY 4.0
MeetingName Lossless Data Compression
OpenAccessLink https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2472&context=cstech
SubjectTerms Applied sciences
Biological information theory
Biology computing
CD-ROMs
Coding, codes
Compressing
Computation
Computational efficiency
Computers in experimental physics
Councils
Data compression
Data presentation and visualization: algorithms and implementation
Dictionaries
Encoding
Exact sciences and technology
Inference
Information, signal and communications theory
Instruments, apparatus, components and techniques common to several branches of physics and astronomy
Physics
Production
Signal and communications theory
Statistics
Strings
Telecommunications and information theory
Texts
URI https://ieeexplore.ieee.org/document/892709