PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification

Bibliographic Details
Published in	Machine learning with applications Vol. 21; p. 100724
Main Authors	Fulkerson, Eli, Yocam, Eric, Vaidyan, Varghese, Kamepalli, Mahesh, Wang, Yong, Comert, Gurcan
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.09.2025
Subjects	Bytecode; Compilation; JavaScript; LSTM; Malicious code detection; Obfuscation; PyTorch; V8
Online Access	Get full text
ISSN	2666-8270
DOI	10.1016/j.mlwa.2025.100724

Summary:	Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. For these methods to classify malware effectively, it is beneficial both to reduce the effects of obfuscation and to optimize the configuration and structure of the neural network for the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. For in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, compilation increased the detection rate from an average of 76.76% to an average of 91.24%. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.
• Impact of compilation on neural network identification of malicious JavaScript.
• Impact of compilation to reverse negative effects of JavaScript obfuscation.
• Framework using Bayesian optimization to generate provably optimized neural network.
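
The summary describes Bayesian optimization of hyperparameters only at a high level. As a rough illustration of the general technique, and not of PyRHOH's actual implementation, the sketch below fits a small Gaussian-process surrogate to past trials and picks the next candidate by expected improvement. Everything in it is an assumption made for illustration: the function names, the single hyperparameter normalized to [0, 1], the kernel length scale, and the toy objective that stands in for the validation score of a trained network.

```python
import math
import random

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))  # guard round-off
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, ys, noise=1e-6):
    """Factor the kernel matrix once per round; zero prior mean is assumed."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, ys))
    return L, alpha

def gp_predict(xs, L, alpha, x):
    """Posterior mean and standard deviation of the surrogate at x."""
    k_star = [rbf(a, x) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected amount by which a candidate beats the best score so far."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, n_init=3, n_iter=12, seed=0):
    """Maximize objective over [0, 1] with a GP surrogate and EI acquisition."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]   # random initial trials
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]         # candidate settings
    for _ in range(n_iter):
        L, alpha = gp_fit(xs, ys)
        best = max(ys)

        def score(x):
            mean, std = gp_predict(xs, L, alpha, x)
            return expected_improvement(mean, std, best)

        x_next = max(grid, key=score)            # most promising candidate
        xs.append(x_next)
        ys.append(objective(x_next))             # "train and evaluate" step
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

def toy_objective(x):
    """Hypothetical stand-in for validation accuracy; peaks at x = 0.3."""
    return -(x - 0.3) ** 2

best_x, best_y = bayes_opt(toy_objective)
print(f"best x = {best_x:.3f}, score = {best_y:.5f}")
```

In a real harness, the objective would train a network with the proposed hyperparameters and return a validation metric, and the search space would typically have several dimensions (for example layer count, hidden size, and learning rate) rather than one.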