Improving Deep Learning Based Password Guessing Models Using Pre-processing

Bibliographic Details
Published in Information and Communications Security, Vol. 13407, pp. 163-183
Main Authors Wu, Yuxuan, Wang, Ding, Zou, Yunkai, Huang, Ziyi
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2022
Springer International Publishing
Series Lecture Notes in Computer Science

Abstract Passwords are the most widely used authentication method and play an important role in users’ digital lives. Password guessing models are generally used to understand password security, yet statistical password models (like the Markov model and probabilistic context-free grammars (PCFG)) are subject to the inherent limitations of overfitting and sparsity. With the improvement of computing power, deep-learning based models with higher crack rates are emerging. Since neural networks are generally used as black boxes for learning password features, a key challenge for deep-learning based password guessing models is to choose appropriate preprocessing methods so that more effective features can be learned. To fill this gap, this paper explores three new preprocessing methods and applies them to two promising deep-learning architectures, i.e., Long Short-Term Memory (LSTM) neural networks and Generative Adversarial Networks (GAN). First, we propose a character-feature based encoding method to replace the canonical one-hot encoding. Second, we add the most comprehensive recognition rules to date for words, keyboard patterns, years, and website names into the basic PCFG, and find that the frequency distribution of the extracted segments follows Zipf’s law. Third, we adopt Xu et al.’s PCFG improvement with chunk segmentation at CCS’21, and study the performance of the Chunk+PCFG preprocessing method when applied to LSTM and GAN. Extensive experiments on six large real-world password datasets show the effectiveness of our preprocessing methods. Results show that within 50 million guesses: 1) When we apply the PCFG preprocessing method to PassGAN (a GAN-based password model proposed by Hitaj et al. at ACNS’19), 13.83%–38.81% (26.79% on average) more passwords can be cracked; 2) Our LSTM-based model using PCFG for preprocessing (PL for short) outperforms Wang et al.’s original PL model by 0.35%–3.94% (1.36% on average). Overall, our preprocessing methods improve the attack rates in four out of the seven tested cases. We believe this work provides new feasible directions for guessing optimization and contributes to a better understanding of deep-learning based models.
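Illustrative sketch (not taken from the chapter): the abstract describes replacing the canonical one-hot character encoding with a character-feature based encoding. The feature set below (character-class flags, a normalized in-class index, and a keyboard-row index) is a hypothetical example of such an encoding, chosen only to show the idea of a compact feature vector versus a 94-dimensional one-hot vector; it is not the authors' exact scheme.

import string

import numpy as np

# Printable character set commonly used in password models: 26+26+10+32 = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def one_hot(ch: str) -> np.ndarray:
    """Canonical one-hot encoding: one dimension per alphabet symbol."""
    vec = np.zeros(len(ALPHABET), dtype=np.float32)
    vec[ALPHABET.index(ch)] = 1.0  # raises ValueError for characters outside ALPHABET
    return vec

# Physical keyboard rows (hypothetical feature; used here to encode rough key position).
KEYBOARD_ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]

def char_features(ch: str) -> np.ndarray:
    """Compact character-feature encoding (illustrative):
    [is_lower, is_upper, is_digit, is_symbol, normalized in-class index, keyboard row]."""
    is_lower = ch.islower()
    is_upper = ch.isupper()
    is_digit = ch.isdigit()
    is_symbol = not (is_lower or is_upper or is_digit)
    # Normalized index within the character's own class, e.g. 'c' -> 2/25, '7' -> 7/9.
    if is_digit:
        idx = int(ch) / 9.0
    elif is_lower or is_upper:
        idx = (ord(ch.lower()) - ord("a")) / 25.0
    else:
        idx = string.punctuation.index(ch) / (len(string.punctuation) - 1)
    # Keyboard row index, or len(KEYBOARD_ROWS) if the key is not on a letter/digit row.
    row = next((i for i, r in enumerate(KEYBOARD_ROWS) if ch.lower() in r),
               len(KEYBOARD_ROWS))
    return np.array([is_lower, is_upper, is_digit, is_symbol, idx, row / 4.0],
                    dtype=np.float32)

print(one_hot("a").shape)        # (94,) -- one dimension per symbol
print(char_features("a").shape)  # (6,)  -- compact feature vector

A per-character vector like this can be fed to an LSTM or GAN generator in place of the one-hot input; the point of the sketch is only that related characters (e.g. neighboring digits or same-row keys) share structure in feature space, which one-hot encoding discards.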
Author Zou, Yunkai
Wang, Ding
Huang, Ziyi
Wu, Yuxuan
Author_xml – sequence: 1
  givenname: Yuxuan
  surname: Wu
  fullname: Wu, Yuxuan
– sequence: 2
  givenname: Ding
  surname: Wang
  fullname: Wang, Ding
  email: wangding@nankai.edu.cn
– sequence: 3
  givenname: Yunkai
  surname: Zou
  fullname: Zou, Yunkai
– sequence: 4
  givenname: Ziyi
  surname: Huang
  fullname: Huang, Ziyi
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2022
Copyright_xml – notice: Springer Nature Switzerland AG 2022
DEWEY 005.8
DOI 10.1007/978-3-031-15777-6_10
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
Discipline Computer Science
EISBN 303115777X
9783031157776
EISSN 1611-3349
Editor Li, Shujun
Alcaraz, Cristina
Chen, Liqun
Samarati, Pierangela
Editor_xml – sequence: 1
  fullname: Chen, Liqun
– sequence: 2
  fullname: Li, Shujun
– sequence: 3
  fullname: Alcaraz, Cristina
– sequence: 4
  fullname: Samarati, Pierangela
EndPage 183
ExternalDocumentID EBC7077207_160_179
ISBN 9783031157769
3031157761
ISSN 0302-9743
IsPeerReviewed true
IsScholarly true
LCCallNum QA76.9.D35
Language English
OCLC 1342502097
PQID EBC7077207_160_179
PageCount 21
PublicationCentury 2000
PublicationDate 2022
PublicationDateYYYYMMDD 2022-01-01
PublicationDate_xml – year: 2022
  text: 2022
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 24th International Conference, ICICS 2022, Canterbury, UK, September 5-8, 2022, Proceedings
PublicationTitle Information and Communications Security
PublicationYear 2022
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Steffen, Bernhard
Bertino, Elisa
Goos, Gerhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Moti
  orcidid: 0000-0003-0848-0873
  surname: Yung
  fullname: Yung, Moti
SourceID springer
proquest
SourceType Publisher
StartPage 163
SubjectTerms Deep learning
Generative Adversarial Networks
Long Short-Term Memory neural networks
Password
Preprocessing
Title Improving Deep Learning Based Password Guessing Models Using Pre-processing
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7077207&ppg=179&c=UERG
http://link.springer.com/10.1007/978-3-031-15777-6_10
Volume 13407