Detection of Sequential Outliers Using a Variable Length Markov Model
Published in | 2008 Seventh International Conference on Machine Learning and Applications, pp. 571-576 |
---|---|
Main Authors | Low-Kam, C.; Laurent, A.; Teisseire, M. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, December 2008 |
Abstract | Mining for outliers in sequential datasets is crucial for the appropriate analysis of data, and many approaches for discovering such anomalies have therefore been proposed. However, most of them build their model from a sample of known typical sequences, and they also tend to be memory-intensive. In this paper we propose an extension of one such approach, based on a probabilistic suffix tree and a similarity measure. We add a pruning criterion that reduces the size of the tree while improving the model, and a sharp concentration inequality for the similarity measure, to better rank the outliers. We demonstrate the feasibility of our approach through a set of experiments on a protein database. |
---|---|
Author affiliations | Low-Kam, C.: Inst. de Math. et Modelisation de Montpellier, Univ. Montpellier 2, Montpellier; Laurent, A.: Lab. d'Inf. de Robot. et de Microelectron. de Montpellier, Univ. Montpellier 2, Montpellier; Teisseire, M.: Lab. d'Inf. de Robot. et de Microelectron. de Montpellier, Univ. Montpellier 2, Montpellier |
DOI | 10.1109/ICMLA.2008.137 |
Discipline | Computer Science |
ISBN | 0769534953 (ISBN-10); 9780769534954 (ISBN-13) |
LCCN | 2008908513 |
OpenAccessLink | https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324526 |
PublicationTitleAbbrev | ICMLA |
SubjectTerms | Concentration Inequality; Data analysis; DNA; Genetic mutations; Information Criterion; Machine learning; Outliers; Proteins; Robots; Sequences; Sequential Databases; Size measurement; Testing |
URI | https://ieeexplore.ieee.org/document/4725031 |
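The abstract describes scoring sequences with a probabilistic suffix tree (a variable length Markov model) and pruning that tree. The following is a generic, minimal sketch of the idea only, not the authors' algorithm: the class `SimplePST`, the helper `rank_outliers`, the count-based pruning threshold `min_count`, the additive smoothing parameter `alpha`, and the use of the average per-symbol log-probability as the similarity score are all hypothetical choices. The paper's own pruning criterion and its concentration inequality for the similarity measure are not reproduced here.

```python
import math
from collections import defaultdict


class SimplePST:
    """Minimal variable-order Markov model in the spirit of a probabilistic suffix tree.

    Hypothetical sketch, not the paper's method: contexts of up to `max_depth`
    symbols are kept only if they occur at least `min_count` times, and
    next-symbol probabilities use additive smoothing with parameter `alpha`.
    Sequences are assumed to be strings (e.g. protein sequences).
    """

    def __init__(self, max_depth=3, min_count=2, alpha=0.5):
        self.max_depth = max_depth
        self.min_count = min_count
        self.alpha = alpha
        self.alphabet = set()
        # counts[context][symbol] = number of times `symbol` followed `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            self.alphabet.update(seq)
            for i, sym in enumerate(seq):
                # Record the symbol under every suffix context of length 0..max_depth.
                for d in range(min(i, self.max_depth) + 1):
                    self.counts[seq[i - d:i]][sym] += 1
        # Count-based pruning (an assumed stand-in for the paper's criterion):
        # a context never occurs more often than its own suffixes, so the
        # retained set of contexts stays closed under taking suffixes.
        rare = [c for c, t in self.counts.items()
                if c and sum(t.values()) < self.min_count]
        for c in rare:
            del self.counts[c]
        return self

    def prob(self, history, sym):
        """P(sym | history) using the longest retained suffix of `history`."""
        k = len(self.alphabet)
        for d in range(min(len(history), self.max_depth), -1, -1):
            ctx = history[len(history) - d:]
            if ctx in self.counts:
                table = self.counts[ctx]
                total = sum(table.values())
                return (table.get(sym, 0) + self.alpha) / (total + self.alpha * k)
        raise ValueError("model has not been fitted")

    def score(self, seq):
        """Average log-probability per symbol; lower means more atypical."""
        if not seq:
            return 0.0
        logp = sum(math.log(self.prob(seq[:i], s)) for i, s in enumerate(seq))
        return logp / len(seq)


def rank_outliers(sequences, **params):
    """Fit on the whole dataset; return (score, sequence) pairs, most atypical first."""
    model = SimplePST(**params).fit(sequences)
    return sorted((model.score(s), s) for s in sequences)


if __name__ == "__main__":
    toy = ["ABABABAB"] * 4 + ["QWQWQWZZ"]
    print(rank_outliers(toy)[0][1])  # prints QWQWQWZZ
```

The sketch fits on the full, unlabeled dataset rather than on a curated sample of typical sequences; whether the paper builds its tree the same way is not stated in this record. Normalizing the log-likelihood by sequence length keeps long and short sequences comparable when ranking.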