PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification



Bibliographic Details
Published in	Machine learning with applications Vol. 21; p. 100724
Main Authors	Fulkerson, Eli, Yocam, Eric, Vaidyan, Varghese, Kamepalli, Mahesh, Wang, Yong, Comert, Gurcan
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.09.2025
Subjects	Bytecode; Compilation; JavaScript; LSTM; Malicious code detection; Obfuscation; PyTorch; V8
ISSN	2666-8270
DOI	10.1016/j.mlwa.2025.100724


Abstract	Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. For these methods to classify malware effectively, it is beneficial to reduce the effects of obfuscation and to optimize the configuration and structure of the neural network so that it is well suited to the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. For in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, compilation increased the detection rate from an average of 76.76% to an average of 91.24%. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.
• Impact of compilation on neural network identification of malicious JavaScript.
• Impact of compilation to reverse negative effects of JavaScript obfuscation.
• Framework using Bayesian optimization to generate a provably optimized neural network.
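The detection rates quoted in the abstract can be restated as a short worked calculation. The sketch below is illustrative only and is not part of PyRHOH: the function name `summarize` and the "miss rate" reading (100 minus the detection rate, taken here as the share of samples not correctly identified) are assumptions introduced for this example.

```python
# Detection rates reported in the abstract, in percent: (raw JavaScript, compiled bytecode).
REPORTED = {
    "in-the-wild": (76.88, 95.84),
    "uniformly obfuscated (average)": (76.76, 91.24),
}


def summarize(before: float, after: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative reduction of the miss rate in percent).

    Assumption for illustration: the miss rate is 100 minus the detection rate.
    """
    gain = after - before
    miss_before = 100.0 - before
    miss_after = 100.0 - after
    miss_reduction = (miss_before - miss_after) / miss_before * 100.0
    return round(gain, 2), round(miss_reduction, 1)


if __name__ == "__main__":
    for label, (before, after) in REPORTED.items():
        gain, reduction = summarize(before, after)
        print(f"{label}: +{gain} points, miss rate reduced by {reduction}%")
```

Under that reading, compilation corresponds to a gain of roughly 19 percentage points for in-the-wild samples and roughly 14.5 points for the uniformly obfuscated samples.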
Author Affiliations
1. Fulkerson, Eli (ORCID 0000-0001-6064-0868; Eli.Fulkerson@trojans.dsu.edu), Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
2. Yocam, Eric (Eric.Yocam@trojans.dsu.edu), Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
3. Vaidyan, Varghese (Varghese.Vaidyan@dsu.edu), Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
4. Kamepalli, Mahesh (Mahesh.Kamepalli@trojans.dsu.edu), Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
5. Wang, Yong (Yong.Wang@dsu.edu), Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
6. Comert, Gurcan (gcomert@ncat.edu), North Carolina A&T State University, 1601 E. Market Street, 27411 Greensboro, NC, USA
ContentType	Journal Article
Copyright	2025 The Authors
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	This is an open access article under the CC BY-NC-ND license.
ORCID	0000-0001-6064-0868
OpenAccessLink	http://dx.doi.org/10.1016/j.mlwa.2025.100724