Combinatorics of minimal absent words for a sliding window
A string w is called a minimal absent word (MAW) for another string T if w does not occur in T but all proper substrings of w occur in T. For example, let Σ={a,b,c} be the alphabet. Then, the set of MAWs for the string T=abaab is {aaa,aaba,bab,bb,c}. In this paper, we study combinatorial properties of M...
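The definition and the example above can be checked by brute force. The following is a minimal sketch, not the authors' method: the function names `substrings` and `minimal_absent_words` are illustrative assumptions, and the approach enumerates all substrings explicitly, so it is only suitable for short strings.

```python
def substrings(t):
    """Return every substring of t, including the empty string."""
    return {t[i:j] for i in range(len(t) + 1) for j in range(i, len(t) + 1)}


def minimal_absent_words(t, alphabet):
    """Brute-force MAWs of t over the given alphabet (hypothetical helper).

    Every MAW w has the form x + c, where x = w[:-1] occurs in t.
    It is then enough to check that w itself is absent and that
    w[1:] occurs, because all other proper substrings of w are
    substrings of w[:-1] or of w[1:].
    """
    occ = substrings(t)
    maws = set()
    for x in occ:
        for c in alphabet:
            w = x + c
            if w not in occ and w[1:] in occ:
                maws.add(w)
    return maws


# Example from the abstract: alphabet {a, b, c}, string abaab.
print(sorted(minimal_absent_words("abaab", "abc")))
```

For the string abaab this reproduces the set {aaa, aaba, bab, bb, c} given in the abstract; note that c is a MAW only because it belongs to the alphabet and its sole proper substring, the empty string, trivially occurs.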
Published in | Theoretical computer science Vol. 927; pp. 109 - 119 |
---|---|
Main Authors | Akagi, Tooru; Kuhara, Yuki; Mieno, Takuya; Nakashima, Yuto; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 26.08.2022 |
Subjects | Combinatorics on words; Minimal absent words; Sliding window |
Online Access | Get full text |
Abstract | A string w is called a minimal absent word (MAW) for another string T if w does not occur in T but all proper substrings of w occur in T. For example, let Σ={a,b,c} be the alphabet. Then, the set of MAWs for the string T=abaab is {aaa,aaba,bab,bb,c}. In this paper, we study combinatorial properties of MAWs in the sliding window model, namely, how the set of MAWs changes when a sliding window of fixed length d is shifted over an input string T of length n, where 1≤d<n. We present tight upper and lower bounds on the maximum number of changes in the set of MAWs for a sliding window over T, for both general and binary alphabets. Our bounds improve on the best previously known bounds [Crochemore et al., 2020]. |
---|---|
Author | Akagi, Tooru; Kuhara, Yuki; Mieno, Takuya; Nakashima, Yuto; Inenaga, Shunsuke; Bannai, Hideo; Takeda, Masayuki |
Author_xml | – sequence: 1 givenname: Tooru surname: Akagi fullname: Akagi, Tooru email: toru.akagi@inf.kyushu-u.ac.jp organization: Department of Informatics, Kyushu University, Japan – sequence: 2 givenname: Yuki surname: Kuhara fullname: Kuhara, Yuki organization: Department of Informatics, Kyushu University, Japan – sequence: 3 givenname: Takuya surname: Mieno fullname: Mieno, Takuya email: tmieno@uec.ac.jp organization: Department of Informatics, Kyushu University, Japan – sequence: 4 givenname: Yuto surname: Nakashima fullname: Nakashima, Yuto email: yuto.nakashima@inf.kyushu-u.ac.jp organization: Department of Informatics, Kyushu University, Japan – sequence: 5 givenname: Shunsuke surname: Inenaga fullname: Inenaga, Shunsuke email: inenaga@inf.kyushu-u.ac.jp organization: Department of Informatics, Kyushu University, Japan – sequence: 6 givenname: Hideo orcidid: 0000-0002-6856-5185 surname: Bannai fullname: Bannai, Hideo email: hdbn.dsc@tmd.ac.jp organization: M&D Data Science Center, Tokyo Medical and Dental University, Japan – sequence: 7 givenname: Masayuki surname: Takeda fullname: Takeda, Masayuki email: takeda@inf.kyushu-u.ac.jp organization: Department of Informatics, Kyushu University, Japan |
Cites_doi | 10.1016/S0020-0190(98)00104-5 10.1016/j.ic.2019.104461 10.1186/s13015-017-0094-z 10.1137/0222058 10.1016/0304-3975(85)90157-4 10.1093/nar/gkab139 10.1016/j.tcs.2012.04.031 10.1109/5.892711 10.1016/j.ic.2018.06.002 10.1186/s12859-014-0388-9 10.1093/bioinformatics/btaa686 |
ContentType | Journal Article |
Copyright | 2022 The Author(s) |
Copyright_xml | – notice: 2022 The Author(s) |
DBID | 6I. AAFTH AAYXX CITATION |
DOI | 10.1016/j.tcs.2022.06.002 |
DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Mathematics; Computer Science |
EISSN | 1879-2294 |
EndPage | 119 |
ExternalDocumentID | 10_1016_j_tcs_2022_06_002 S0304397522003553 |
IEDL.DBID | .~1 |
ISSN | 0304-3975 |
IngestDate | Tue Jul 01 03:43:10 EDT 2025 Fri Feb 23 02:40:34 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Minimal absent words; Combinatorics on words; Sliding window |
Language | English |
License | This is an open access article under the CC BY license. |
LinkModel | DirectLink |
ORCID | 0000-0002-6856-5185 |
OpenAccessLink | https://www.sciencedirect.com/science/article/pii/S0304397522003553 |
PageCount | 11 |
ParticipantIDs | crossref_primary_10_1016_j_tcs_2022_06_002 elsevier_sciencedirect_doi_10_1016_j_tcs_2022_06_002 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2022-08-26 |
PublicationDateYYYYMMDD | 2022-08-26 |
PublicationDate_xml | – month: 08 year: 2022 text: 2022-08-26 day: 26 |
PublicationDecade | 2020 |
PublicationTitle | Theoretical computer science |
PublicationYear | 2022 |
Publisher | Elsevier B.V |
Publisher_xml | – name: Elsevier B.V |
References | [1] Chairungsee, Crochemore. Using minimal absent words to build phylogeny. Theor. Comput. Sci. 450 (2012) 109-116. doi:10.1016/j.tcs.2012.04.031 [2] Crochemore, Mignosi, Restivo, Salemi. Data compression using antidictionaries. Proc. IEEE 88(11) (2000) 1756-1768. doi:10.1109/5.892711 [3] Crochemore, Navarro. Improved antidictionary based compression. 12th International Conference of the Chilean Computer Science Society, 2002. Proceedings (2002) 7-13. [4] Crawford, Badkobeh, Lewis. Searching page-images of early music scanned with OMR: a scalable solution using minimal absent words. ISMIR 2018 (2018) 233-239. [5] Almirantis, Charalampopoulos, Gao, Iliopoulos, Mohamed, Pissis, Polychronopoulos. On avoided words, absent words, and their application to biological sequence analysis. Algorithms Mol. Biol. 12(1) (2017) 5. doi:10.1186/s13015-017-0094-z [6] Charalampopoulos, Crochemore, Fici, Mercas, Pissis. Alignment-free sequence comparison using absent words. Inf. Comput. 262 (2018) 57-68. doi:10.1016/j.ic.2018.06.002 [7] Pratas, Silva. Persistent minimal sequences of SARS-CoV-2. Bioinformatics 36(21) (2020) 5129-5132. doi:10.1093/bioinformatics/btaa686 [8] Koulouras, Frith. Significant non-existence of sequences in genomes and proteomes. Nucleic Acids Res. 49(6) (2021) 3139-3155. doi:10.1093/nar/gkab139 [9] Crochemore, Mignosi, Restivo. Automata and forbidden words. Inf. Process. Lett. 67(3) (1998) 111-117. doi:10.1016/S0020-0190(98)00104-5 [10] Fujishige, Tsujimaru, Inenaga, Bannai, Takeda. Computing DAWGs and minimal absent words in linear time for integer alphabets. MFCS 2016, vol. 58 (2016) 38:1-38:14. [11] Blumer, Blumer, Haussler, Ehrenfeucht, Chen, Seiferas. The smallest automaton recognizing the subwords of a text. Theor. Comput. Sci. 40 (1985) 31-55. doi:10.1016/0304-3975(85)90157-4 [12] Charalampopoulos, Crochemore, Pissis. On extended special factors of a word. SPIRE 2018 (2018) 131-138. [13] Belazzougui, Cunial, Kärkkäinen, Mäkinen. Versatile succinct representations of the bidirectional Burrows-Wheeler transform. ESA 2013 (2013) 133-144. [14] Barton, Heliou, Mouchard, Pissis. Linear-time computation of minimal absent words using suffix array. BMC Bioinform. 15(1) (2014) 388. doi:10.1186/s12859-014-0388-9 [15] Manber, Myers. Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5) (1993) 935-948. doi:10.1137/0222058 [16] Barton, Heliou, Mouchard, Pissis. Parallelising the computation of minimal absent words. PPAM 2015 (2016) 243-253. [17] Fici, Gawrychowski. Minimal absent words in rooted and unrooted trees. SPIRE 2019 (2019) 152-161. [18] Crochemore, Héliou, Kucherov, Mouchard, Pissis, Ramusat. Absent words in a sliding window with applications. Inf. Comput. 270 (2020). doi:10.1016/j.ic.2019.104461 [19] Mieno, Kuhara, Akagi, Fujishige, Nakashima, Inenaga, Bannai, Takeda. Minimal unique substrings and minimal absent words in a sliding window. SOFSEM 2020, vol. 12011 (2020) 148-160. |
SSID | ssj0000576 |
Score | 2.3871553 |
SourceID | crossref elsevier |
SourceType | Index Database Publisher |
StartPage | 109 |
SubjectTerms | Combinatorics on words; Minimal absent words; Sliding window |
Title | Combinatorics of minimal absent words for a sliding window |
URI | https://dx.doi.org/10.1016/j.tcs.2022.06.002 |
Volume | 927 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
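The sliding window model described in the abstract can be explored experimentally on small inputs. The sketch below is an illustration under stated assumptions, not the paper's technique: it assumes that the "number of changes" for one shift is the size of the symmetric difference between the MAW sets of two consecutive windows, and the names `maws` and `window_changes` are hypothetical. It recomputes each MAW set from scratch by brute force.

```python
def maws(t, alphabet):
    """Brute-force set of minimal absent words of t (hypothetical helper)."""
    # All substrings of t, including the empty string (i == j).
    occ = {t[i:j] for i in range(len(t) + 1) for j in range(i, len(t) + 1)}
    # w = x + c is a MAW if w is absent while w[:-1] = x and w[1:] both occur.
    return {x + c for x in occ for c in alphabet
            if x + c not in occ and (x + c)[1:] in occ}


def window_changes(text, d, alphabet):
    """For each shift of a length-d window over text, count the words that
    enter or leave the MAW set (assumed measure: symmetric difference)."""
    changes = []
    prev = maws(text[:d], alphabet)
    for i in range(1, len(text) - d + 1):
        cur = maws(text[i:i + d], alphabet)
        changes.append(len(prev ^ cur))  # ^ is set symmetric difference
        prev = cur
    return changes


# Windows abaa and baab over the binary alphabet {a, b}.
print(window_changes("abaab", 4, "ab"))
```

In this small case the windows abaa and baab have MAW sets {aaa, aab, bab, bb} and {aaa, aba, bab, bb}, so the single shift removes aab and adds aba. Summing the returned list gives the total number of changes over the whole string, which is the kind of quantity the paper bounds.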