Detection of Sequential Outliers Using a Variable Length Markov Model

Bibliographic Details
Published in: 2008 Seventh International Conference on Machine Learning and Applications, pp. 571-576
Main Authors: Low-Kam, C., Laurent, A., Teisseire, M.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2008

Abstract: The problem of mining for outliers in sequential datasets is crucial for appropriate data analysis, and many approaches for discovering such anomalies have been proposed. However, most of them need a sample of known typical sequences to build the model. In addition, they remain memory-intensive. In this paper we propose an extension of one such approach, based on a probabilistic suffix tree and on a measure of similarity. We add a pruning criterion that reduces the size of the tree while improving the model, and a sharp inequality for the concentration of the measure of similarity, to better sort the outliers. We demonstrate the feasibility of our approach through a set of experiments over a protein database.
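
The abstract names two ingredients, a variable-length Markov model (probabilistic suffix tree) and a similarity measure used to rank sequences. The following is a minimal Python sketch of that general idea only, not the authors' method. The function names, the flat dictionary of contexts standing in for a tree, the `max_order` and `min_count` parameters, the add-one smoothing, and the length-normalized log-likelihood score are all assumptions made for illustration. The paper's pruning criterion and concentration inequality are not described in this record and are not reproduced here.

```python
import math
from collections import defaultdict


def train_contexts(sequences, max_order=2):
    """Count next-symbol occurrences after every context of length 0..max_order.

    A flat dict keyed by context string stands in for a suffix tree (assumption).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, sym in enumerate(seq):
            for k in range(min(i, max_order) + 1):
                counts[seq[i - k:i]][sym] += 1
    return counts


def next_symbol_prob(counts, context, sym, alphabet_size, min_count=2):
    """Use the longest suffix of `context` seen at least `min_count` times.

    `min_count` is a crude, hypothetical stand-in for a pruning rule.
    """
    for start in range(len(context) + 1):
        suffix = context[start:]
        total = sum(counts[suffix].values()) if suffix in counts else 0
        if total >= min_count or suffix == "":
            seen = counts[suffix].get(sym, 0)
            # Add-one smoothing so unseen symbols keep a nonzero probability.
            return (seen + 1) / (total + alphabet_size)


def similarity(counts, seq, alphabet_size, max_order=2):
    """Length-normalized log-likelihood; lower values suggest outliers."""
    if not seq:
        return 0.0
    logp = 0.0
    for i, sym in enumerate(seq):
        context = seq[max(0, i - max_order):i]
        logp += math.log(next_symbol_prob(counts, context, sym, alphabet_size))
    return logp / len(seq)


# Toy data (made up): three alternating sequences and one that breaks the pattern.
data = ["ABABABAB", "ABABABA", "BABABAB", "ABBBBBBA"]
alphabet_size = len(set("".join(data)))
model = train_contexts(data)
scores = {s: similarity(model, s, alphabet_size) for s in data}
ranked = sorted(data, key=scores.get)  # most outlier-like first
print(ranked)
```

The model here is built from the whole dataset, outlier included, rather than from a sample of known typical sequences, which is the limitation of earlier approaches that the abstract points out. How the actual score should be thresholded or sorted is left to the paper.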
Author Affiliations
Low-Kam, C.: Inst. de Math. et Modelisation de Montpellier, Univ. Montpellier 2, Montpellier
Laurent, A.: Lab. d'Inf. de Robot. et de Microelectron. de Montpellier, Univ. Montpellier 2, Montpellier
Teisseire, M.: Lab. d'Inf. de Robot. et de Microelectron. de Montpellier, Univ. Montpellier 2, Montpellier
DOI 10.1109/ICMLA.2008.137
Discipline Computer Science
ISBN 0769534953
9780769534954
LCCN 2008908513
OpenAccessLink https://hal-lirmm.ccsd.cnrs.fr/lirmm-00324526
SubjectTerms Concentration Inequality
Data analysis
DNA
Genetic mutations
Information Criterion
Machine learning
Outliers
Proteins
Robots
Sequences
Sequential Databases
Size measurement
Testing
URI https://ieeexplore.ieee.org/document/4725031