Off-line compression by greedy textual substitution
Greedy off-line textual substitution refers to the following approach to compression or structural inference. Given a long text string X, a substring W is identified such that replacing all instances of W in X except one by a suitable pair of pointers yields the highest possible contraction of X; th...
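The greedy loop summarized in the abstract can be illustrated with a small brute-force sketch. This is not the authors' implementation: it tries every substring directly instead of using an efficient index, and it rests on assumptions that are not in the record. The names `greedy_compress`, `best_phrase`, and `expand` are hypothetical, `pointer_cost` is an assumed fixed cost (in symbols) for each pointer, the contraction is estimated as (occurrences - 1) * (length - pointer_cost), and a side table of phrases stands in for pointers into the text so that decoding stays simple.

```python
def occurrences(tokens, phrase):
    """Start indices of left-to-right, non-overlapping occurrences of phrase."""
    starts, i, m = [], 0, len(phrase)
    while i + m <= len(tokens):
        if tuple(tokens[i:i + m]) == phrase:
            starts.append(i)
            i += m          # skip past this match so matches never overlap
        else:
            i += 1
    return starts


def best_phrase(tokens, pointer_cost):
    """Brute-force search for the phrase with the largest estimated gain."""
    best, best_gain, seen = None, 0, set()
    n = len(tokens)
    # Only phrases longer than a pointer can produce a contraction.
    for length in range(pointer_cost + 1, n // 2 + 1):
        for i in range(n - length + 1):
            phrase = tuple(tokens[i:i + length])
            if phrase in seen:
                continue
            seen.add(phrase)
            starts = occurrences(tokens, phrase)
            # Assumed gain model: every occurrence but one becomes a pointer.
            gain = (len(starts) - 1) * (length - pointer_cost)
            if gain > best_gain:
                best, best_gain = (phrase, starts), gain
    return best


def greedy_compress(text, pointer_cost=2):
    """Repeat the best substitution until no phrase yields a contraction."""
    tokens, rules = list(text), []
    while True:
        found = best_phrase(tokens, pointer_cost)
        if found is None:
            break
        phrase, starts = found
        rules.append(phrase)
        rule_id = len(rules) - 1
        # Keep the first occurrence; replace the others right to left so
        # earlier indices stay valid while the list shrinks.
        for s in reversed(starts[1:]):
            tokens[s:s + len(phrase)] = [("ptr", rule_id)]
    return tokens, rules


def expand(tokens, rules):
    """Undo the substitutions by expanding pointer tokens recursively."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):            # a ("ptr", rule_id) token
            out.append(expand(rules[t[1]], rules))
        else:
            out.append(t)
    return "".join(out)


text = "abracadabra abracadabra abracadabra"
tokens, rules = greedy_compress(text)
print(len(text), "->", len(tokens), "tokens")
print(expand(tokens, rules) == text)
```

The exhaustive search is far too slow for long texts and is meant only to make the loop concrete; the paper itself is concerned with the computational issues of doing this efficiently.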
Published in | Proceedings of the IEEE Vol. 88; no. 11; pp. 1733 - 1744 |
---|---|
Main Authors | Apostolico, A., Lonardi, S. |
Format | Journal Article; Conference Proceeding |
Language | English |
Published | New York, NY: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.11.2000 |
Abstract | Greedy off-line textual substitution refers to the following approach to compression or structural inference. Given a long text string X, a substring W is identified such that replacing all instances of W in X except one by a suitable pair of pointers yields the highest possible contraction of X; the process is then repeated on the contracted text string until substrings capable of producing contractions can no longer be found. This paper examines computational issues arising in the implementation of this paradigm and describes some applications and experiments. |
---|---|
Author | Lonardi, S.; Apostolico, A. |
Author_xml | 1. Apostolico, A. (axa@cs.purdue.edu), Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA; 2. Lonardi, S. (stelo@cs.purdue.edu) |
BackLink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=875273$$DView record in Pascal Francis |
CODEN | IEEPAD |
CitedBy_id | crossref_primary_10_1109_ACCESS_2020_3013676 crossref_primary_10_3390_a13040103 crossref_primary_10_1142_S0129054109007029 crossref_primary_10_1016_j_ipl_2014_08_014 crossref_primary_10_1186_1471_2105_11_514 crossref_primary_10_5808_GI_2011_9_1_005 crossref_primary_10_1016_j_ipm_2011_01_006 crossref_primary_10_3390_a4040262 crossref_primary_10_1002_asi_20515 crossref_primary_10_1007_s11786_010_0033_6 crossref_primary_10_1016_j_jda_2011_04_006 crossref_primary_10_1007_s00224_017_9839_9 crossref_primary_10_1080_17459737_2021_2002956 crossref_primary_10_1016_j_ic_2022_104999 crossref_primary_10_1109_TIT_2018_2871452 crossref_primary_10_1007_s41870_020_00472_2 crossref_primary_10_1109_ACCESS_2022_3141781 crossref_primary_10_1002_spe_619 crossref_primary_10_1007_s00224_020_10013_w crossref_primary_10_1016_j_jda_2012_07_009 crossref_primary_10_1109_5_892708 crossref_primary_10_3390_a5020214 crossref_primary_10_1186_1748_7188_1_4 crossref_primary_10_1016_j_endm_2005_07_029 crossref_primary_10_1109_TIT_2005_850116 crossref_primary_10_3390_a14020065 crossref_primary_10_3390_a2041429 crossref_primary_10_1145_568727_568730 crossref_primary_10_1137_130936889 |
Cites_doi | 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O 10.1109/TIT.1977.1055714 10.1109/TIT.1976.1055501 10.1007/BF02458580 10.1145/63334.63341 10.1109/TSMC.1975.5409159 10.1145/359460.359480 10.1109/DCC.1999.755675 10.1145/322344.322346 10.1145/360363.360368 10.1016/0196-6774(92)90049-I 10.1109/TIT.1978.1055934 10.1006/jcta.1997.2843 10.1109/DCC.1999.755679 10.1109/DCC.1997.581998 10.1109/ICCAS.2007.4406805 10.1109/TIT.1975.1055349 10.1007/BF00993061 10.1109/TIT.1987.1057284 10.1109/DCC.1995.515520 10.1016/0306-4573(94)90014-0 10.1007/BF01955046 10.1007/BF01206331 10.1093/bioinformatics/13.2.131 10.1109/DCC.1994.305932 10.1109/DCC.1993.253115 10.1109/IPPS.1994.288279 10.1016/0020-0190(96)00068-3 10.1016/0020-0190(91)90223-5 10.1002/spe.4380240703 10.1109/DCC.1999.755678 10.1109/DCC.1996.488324 10.1109/DCC.1996.488385 |
ContentType | Journal Article; Conference Proceeding |
Copyright | 2001 INIST-CNRS; Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2000 |
DOI | 10.1109/5.892709 |
Discipline | Engineering; Statistics; Applied Sciences; Physics |
EISSN | 1558-2256 |
EndPage | 1744 |
ExternalDocumentID | 2434117151 10_1109_5_892709 875273 892709 |
Genre | orig-research |
GrantInformation_xml | – fundername: Institute of Electrical and Electronics Engineers |
ISSN | 0018-9219 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
Keywords | Textual data; Substitution; Texturation; Data compression; Greedy algorithm; Theoretical study; Data structures; Sequencing; Grammatical inference; Biological system |
Language | English |
License | CC BY 4.0 |
LinkModel | DirectLink |
MeetingName | Lossless Data Compression |
OpenAccessLink | https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2472&context=cstech |
PageCount | 12 |
PublicationCentury | 2000 |
PublicationDate | 2000-11-01 |
PublicationDateYYYYMMDD | 2000-11-01 |
PublicationDecade | 2000 |
PublicationPlace | New York, NY |
PublicationTitle | Proceedings of the IEEE |
PublicationTitleAbbrev | JPROC |
PublicationYear | 2000 |
Publisher | IEEE; Institute of Electrical and Electronics Engineers; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1733 |
SubjectTerms | Applied sciences; Biological information theory; Biology computing; CD-ROMs; Coding, codes; Compressing; Computation; Computational efficiency; Computers in experimental physics; Councils; Data compression; Data presentation and visualization: algorithms and implementation; Dictionaries; Encoding; Exact sciences and technology; Inference; Information, signal and communications theory; Instruments, apparatus, components and techniques common to several branches of physics and astronomy; Physics; Production; Signal and communications theory; Statistics; Strings; Telecommunications and information theory; Texts |
Title | Off-line compression by greedy textual substitution |
URI | https://ieeexplore.ieee.org/document/892709 https://www.proquest.com/docview/885016445 https://search.proquest.com/docview/1022904230 https://search.proquest.com/docview/1671328506 https://search.proquest.com/docview/1770352001 |
Volume | 88 |