PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification
Published in | Machine learning with applications Vol. 21; p. 100724 |
---|---|
Main Authors | Fulkerson, Eli; Yocam, Eric; Vaidyan, Varghese; Kamepalli, Mahesh; Wang, Yong; Comert, Gurcan |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.09.2025 |
Subjects | Bytecode; Compilation; JavaScript; LSTM; Malicious code detection; Obfuscation; PyTorch |
ISSN | 2666-8270 |
DOI | 10.1016/j.mlwa.2025.100724 |
Abstract | Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. For these methods to classify malware effectively, it is beneficial to reduce the effects of obfuscation and to optimize the configuration and structure of the neural network for the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. For in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, the detection rate rose from an average of 76.76% to an average of 91.24% once compilation was performed. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.
• Impact of compilation on neural network identification of malicious JavaScript. • Impact of compilation to reverse negative effects of JavaScript obfuscation. • Framework using Bayesian optimization to generate provably optimized neural network. |
---|---|
ArticleNumber | 100724 |
Author | Fulkerson, Eli; Kamepalli, Mahesh; Vaidyan, Varghese; Wang, Yong; Yocam, Eric; Comert, Gurcan |
Author_xml | – sequence: 1 givenname: Eli orcidid: 0000-0001-6064-0868 surname: Fulkerson fullname: Fulkerson, Eli email: Eli.Fulkerson@trojans.dsu.edu organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA – sequence: 2 givenname: Eric surname: Yocam fullname: Yocam, Eric email: Eric.Yocam@trojans.dsu.edu organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA – sequence: 3 givenname: Varghese surname: Vaidyan fullname: Vaidyan, Varghese email: Varghese.Vaidyan@dsu.edu organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA – sequence: 4 givenname: Mahesh surname: Kamepalli fullname: Kamepalli, Mahesh email: Mahesh.Kamepalli@trojans.dsu.edu organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA – sequence: 5 givenname: Yong surname: Wang fullname: Wang, Yong email: Yong.Wang@dsu.edu organization: Dakota State University, The Beacom College of Computer & Cyber Sciences, 820 N Washington Ave, Madison, 57042, SD, USA – sequence: 6 givenname: Gurcan surname: Comert fullname: Comert, Gurcan email: gcomert@ncat.edu organization: North Carolina A&T State University, 1601 E. Market Street, 27411 Greensboro, NC, USA |
Cites_doi | 10.1016/j.asoc.2023.110138 10.3390/fi14080217 10.1109/ACCESS.2018.2874098 |
ContentType | Journal Article |
Copyright | 2025 The Authors |
Copyright_xml | – notice: 2025 The Authors |
DBID | 6I. AAFTH AAYXX CITATION |
DOI | 10.1016/j.mlwa.2025.100724 |
DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
EISSN | 2666-8270 |
ExternalDocumentID | 10_1016_j_mlwa_2025_100724 S2666827025001070 |
GroupedDBID | 0R~ 6I. AAEDW AAFTH AALRI AAXUO AAYWO ACVFH ADCNI ADVLN AEUPX AEXQZ AFJKZ AFPUW AIGII AITUG AKBMS AKYEP ALMA_UNASSIGNED_HOLDINGS AMRAJ APXCP EBS FDB GROUPED_DOAJ M~E OK1 AAYXX CITATION |
ISSN | 2666-8270 |
IngestDate | Wed Sep 03 16:38:40 EDT 2025 Sat Sep 06 17:16:35 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | V8; Compilation; LSTM; PyTorch; Obfuscation; JavaScript; Bytecode; Malicious code detection |
Language | English |
License | This is an open access article under the CC BY-NC-ND license. |
LinkModel | OpenURL |
ORCID | 0000-0001-6064-0868 |
OpenAccessLink | http://dx.doi.org/10.1016/j.mlwa.2025.100724 |
ParticipantIDs | crossref_primary_10_1016_j_mlwa_2025_100724 elsevier_sciencedirect_doi_10_1016_j_mlwa_2025_100724 |
PublicationCentury | 2000 |
PublicationDate | September 2025 2025-09-00 |
PublicationDateYYYYMMDD | 2025-09-01 |
PublicationDate_xml | – month: 09 year: 2025 text: September 2025 |
PublicationDecade | 2020 |
PublicationTitle | Machine learning with applications |
PublicationYear | 2025 |
Publisher | Elsevier Ltd |
Publisher_xml | – name: Elsevier Ltd |
SSID | ssj0002811334 |
Score | 2.3030524 |
SourceID | crossref elsevier |
SourceType | Index Database Publisher |
StartPage | 100724 |
SubjectTerms | Bytecode; Compilation; JavaScript; LSTM; Malicious code detection; Obfuscation; PyTorch |
Title | PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification |
URI | https://dx.doi.org/10.1016/j.mlwa.2025.100724 |
Volume | 21 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | ISSN International Centre |