Improving Deep Learning Based Password Guessing Models Using Pre-processing

Bibliographic Details
Published in Information and Communications Security, Vol. 13407, pp. 163-183
Main Authors Wu, Yuxuan, Wang, Ding, Zou, Yunkai, Huang, Ziyi
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2022
Springer International Publishing
Series Lecture Notes in Computer Science

Abstract Passwords are the most widely used authentication method and play an important role in users’ digital lives. Password guessing models are generally used to understand password security, yet statistical password models (like the Markov model and probabilistic context-free grammars (PCFG)) are subject to the inherent limitations of overfitting and sparsity. With the improvement of computing power, deep-learning based models with higher crack rates are emerging. Since neural networks are generally used as black boxes for learning password features, a key challenge for deep-learning based password guessing models is to choose appropriate preprocessing methods so that more effective features can be learned. To fill this gap, this paper explores three new preprocessing methods and applies them to two promising deep-learning architectures, i.e., Long Short-Term Memory (LSTM) neural networks and Generative Adversarial Networks (GAN). First, we propose a character-feature based encoding method to replace the canonical one-hot encoding. Second, we add the most comprehensive recognition rules to date for words, keyboard patterns, years, and website names into the basic PCFG, and find that the frequency distribution of the extracted segments follows Zipf’s law. Third, we adopt Xu et al.’s PCFG improvement with chunk segmentation at CCS’21, and study the performance of the Chunk+PCFG preprocessing method when applied to LSTM and GAN. Extensive experiments on six large real-world password datasets show the effectiveness of our preprocessing methods. Results show that within 50 million guesses: 1) When we apply the PCFG preprocessing method to PassGAN (a GAN-based password model proposed by Hitaj et al. at ACNS’19), 13.83%–38.81% (26.79% on average) more passwords can be cracked; 2) Our LSTM-based model using PCFG for preprocessing (PL for short) outperforms Wang et al.’s original PL model by 0.35%–3.94% (1.36% on average). Overall, our preprocessing methods improve the attack rates in four out of the seven tested cases. We believe this work provides new feasible directions for guessing optimization and contributes to a better understanding of deep-learning based models.
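Illustrative sketch (not taken from the chapter): the abstract describes replacing the canonical one-hot character encoding with a character-feature based encoding. The feature set below (character-class flags, a normalized in-class index, and a keyboard-row index) is a hypothetical example of such an encoding, chosen only to show the idea of a compact feature vector versus a 94-dimensional one-hot vector; it is not the authors' exact scheme.

import string

import numpy as np

# Printable character set commonly used in password models: 26+26+10+32 = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def one_hot(ch: str) -> np.ndarray:
    """Canonical one-hot encoding: one dimension per alphabet symbol."""
    vec = np.zeros(len(ALPHABET), dtype=np.float32)
    vec[ALPHABET.index(ch)] = 1.0  # raises ValueError for characters outside ALPHABET
    return vec

# Physical keyboard rows (hypothetical feature; used here to encode rough key position).
KEYBOARD_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]

def char_features(ch: str) -> np.ndarray:
    """Compact character-feature encoding (illustrative):
    [is_lower, is_upper, is_digit, is_symbol, normalized in-class index, keyboard row]."""
    is_lower = ch.islower()
    is_upper = ch.isupper()
    is_digit = ch.isdigit()
    is_symbol = not (is_lower or is_upper or is_digit)
    # Normalized index within the character's own class, e.g. 'c' -> 2/25, '7' -> 7/9.
    if is_digit:
        idx = int(ch) / 9.0
    elif is_lower or is_upper:
        idx = (ord(ch.lower()) - ord("a")) / 25.0
    else:
        idx = string.punctuation.index(ch) / (len(string.punctuation) - 1)
    # Keyboard row index, or len(KEYBOARD_ROWS) if the key is not on a letter/digit row.
    row = next((i for i, r in enumerate(KEYBOARD_ROWS) if ch.lower() in r),
               len(KEYBOARD_ROWS))
    return np.array([is_lower, is_upper, is_digit, is_symbol, idx, row / 4.0],
                    dtype=np.float32)

print(one_hot("a").shape)        # (94,) -- one dimension per symbol
print(char_features("a").shape)  # (6,)  -- compact feature vector

A per-character vector like this can be fed to an LSTM or GAN generator in place of the one-hot input; the point of the sketch is only that related characters (e.g. neighboring digits or same-row keys) share structure in feature space, which one-hot encoding discards.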
Author Zou, Yunkai
Wang, Ding
Huang, Ziyi
Wu, Yuxuan
Author_xml – sequence: 1
  givenname: Yuxuan
  surname: Wu
  fullname: Wu, Yuxuan
– sequence: 2
  givenname: Ding
  surname: Wang
  fullname: Wang, Ding
  email: wangding@nankai.edu.cn
– sequence: 3
  givenname: Yunkai
  surname: Zou
  fullname: Zou, Yunkai
– sequence: 4
  givenname: Ziyi
  surname: Huang
  fullname: Huang, Ziyi
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2022
Copyright_xml – notice: Springer Nature Switzerland AG 2022
DEWEY 005.8
DOI 10.1007/978-3-031-15777-6_10
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
Discipline Computer Science
EISBN 303115777X
9783031157776
EISSN 1611-3349
Editor Li, Shujun
Alcaraz, Cristina
Chen, Liqun
Samarati, Pierangela
Editor_xml – sequence: 1
  fullname: Chen, Liqun
– sequence: 2
  fullname: Li, Shujun
– sequence: 3
  fullname: Alcaraz, Cristina
– sequence: 4
  fullname: Samarati, Pierangela
EndPage 183
ExternalDocumentID EBC7077207_160_179
ISBN 9783031157769
3031157761
ISSN 0302-9743
IsPeerReviewed true
IsScholarly true
LCCallNum QA76.9.D35
Language English
OCLC 1342502097
PQID EBC7077207_160_179
PageCount 21
PublicationCentury 2000
PublicationDate 2022
PublicationDateYYYYMMDD 2022-01-01
PublicationDate_xml – year: 2022
  text: 2022
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 24th International Conference, ICICS 2022, Canterbury, UK, September 5-8, 2022, Proceedings
PublicationTitle Information and Communications Security
PublicationYear 2022
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Steffen, Bernhard
Bertino, Elisa
Goos, Gerhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Moti
  orcidid: 0000-0003-0848-0873
  surname: Yung
  fullname: Yung, Moti
SourceID springer
proquest
SourceType Publisher
StartPage 163
SubjectTerms Deep learning
Generative Adversarial Networks
Long Short-Term Memory neural networks
Password
Preprocessing
Title Improving Deep Learning Based Password Guessing Models Using Pre-processing
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7077207&ppg=179&c=UERG
http://link.springer.com/10.1007/978-3-031-15777-6_10
Volume 13407