PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification


Bibliographic Details
Published in Machine learning with applications Vol. 21; p. 100724
Main Authors Fulkerson, Eli, Yocam, Eric, Vaidyan, Varghese, Kamepalli, Mahesh, Wang, Yong, Comert, Gurcan
Format Journal Article
Language English
Published Elsevier Ltd 01.09.2025
ISSN 2666-8270
DOI 10.1016/j.mlwa.2025.100724

Abstract Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. For these methods to classify malware effectively, it is beneficial to reduce the effects of obfuscation and to optimize the configuration and structure of the neural network so that it is well suited to the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. For in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, compilation increased the detection rate from an average of 76.76% to an average of 91.24%. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.
• Impact of compilation on neural network identification of malicious JavaScript.
• Impact of compilation to reverse negative effects of JavaScript obfuscation.
• Framework using Bayesian optimization to generate provably optimized neural network.
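The detection rates quoted in the abstract can be restated as percentage-point gains, which makes the two experiments easier to compare. The short Python sketch below does that arithmetic using only the four figures given above. The names `REPORTED`, `summarize`, and `SUMMARY` are illustrative and are not part of PyRHOH. The second figure it computes, the relative drop in missed samples, rests on an assumption that is not stated in the abstract: that the "detection rate" is the share of samples classified correctly, so that 100 minus the rate is the share missed.

```python
# Detection rates (percent) quoted in the abstract: (raw source, compiled bytecode).
REPORTED = {
    "in-the-wild": (76.88, 95.84),
    "uniformly obfuscated (average)": (76.76, 91.24),
}


def summarize(raw_rate, compiled_rate):
    """Return (percentage-point gain, relative reduction in missed samples in percent).

    Assumption (not from the source): 100 - rate is treated as the share of
    samples the classifier missed.
    """
    gain_points = compiled_rate - raw_rate
    missed_before = 100.0 - raw_rate
    missed_after = 100.0 - compiled_rate
    relative_reduction = (missed_before - missed_after) / missed_before
    return round(gain_points, 2), round(100.0 * relative_reduction, 1)


SUMMARY = {name: summarize(*rates) for name, rates in REPORTED.items()}

for name, (gain, reduction) in SUMMARY.items():
    print(f"{name}: +{gain} points, {reduction}% fewer missed samples")
```

Under that reading, compilation adds 18.96 points for in-the-wild samples and 14.48 points on average for uniformly obfuscated samples. These are derived restatements of the abstract's figures, not additional results from the paper.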
ArticleNumber 100724
Author Fulkerson, Eli
Kamepalli, Mahesh
Vaidyan, Varghese
Wang, Yong
Yocam, Eric
Comert, Gurcan
Author_xml – sequence: 1
  givenname: Eli
  orcidid: 0000-0001-6064-0868
  surname: Fulkerson
  fullname: Fulkerson, Eli
  email: Eli.Fulkerson@trojans.dsu.edu
  organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
– sequence: 2
  givenname: Eric
  surname: Yocam
  fullname: Yocam, Eric
  email: Eric.Yocam@trojans.dsu.edu
  organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
– sequence: 3
  givenname: Varghese
  surname: Vaidyan
  fullname: Vaidyan, Varghese
  email: Varghese.Vaidyan@dsu.edu
  organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
– sequence: 4
  givenname: Mahesh
  surname: Kamepalli
  fullname: Kamepalli, Mahesh
  email: Mahesh.Kamepalli@trojans.dsu.edu
  organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
– sequence: 5
  givenname: Yong
  surname: Wang
  fullname: Wang, Yong
  email: Yong.Wang@dsu.edu
  organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA
– sequence: 6
  givenname: Gurcan
  surname: Comert
  fullname: Comert, Gurcan
  email: gcomert@ncat.edu
  organization: North Carolina A&T State University, 1601 E. Market Street, 27411 Greensboro, NC, USA
Cites_doi 10.1016/j.asoc.2023.110138
10.3390/fi14080217
10.1109/ACCESS.2018.2874098
ContentType Journal Article
Copyright 2025 The Authors
Copyright_xml – notice: 2025 The Authors
DOI 10.1016/j.mlwa.2025.100724
DatabaseName ScienceDirect Open Access Titles
Elsevier:ScienceDirect:Open Access
CrossRef
EISSN 2666-8270
ISSN 2666-8270
IngestDate Wed Sep 03 16:38:40 EDT 2025
Sat Sep 06 17:16:35 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords V8
Compilation
LSTM
PyTorch
Obfuscation
JavaScript
Bytecode
Malicious code detection
Language English
License This is an open access article under the CC BY-NC-ND license.
ORCID 0000-0001-6064-0868
OpenAccessLink http://dx.doi.org/10.1016/j.mlwa.2025.100724
PublicationCentury 2000
PublicationDate September 2025
2025-09-00
PublicationDateYYYYMMDD 2025-09-01
PublicationDate_xml – month: 09
  year: 2025
  text: September 2025
PublicationDecade 2020
PublicationTitle Machine learning with applications
PublicationYear 2025
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
StartPage 100724
SubjectTerms Bytecode
Compilation
JavaScript
LSTM
Malicious code detection
Obfuscation
PyTorch
Title PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification
URI https://dx.doi.org/10.1016/j.mlwa.2025.100724
Volume 21