PyRHOH: A meta-learning analysis framework for determining the impact of compilation on malicious JavaScript identification
Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. In order for these methods to eff...
Saved in:
Published in | Machine learning with applications Vol. 21; p. 100724 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published |
Elsevier Ltd
01.09.2025
|
Subjects | |
Online Access | Get full text |
ISSN | 2666-8270 2666-8270 |
DOI | 10.1016/j.mlwa.2025.100724 |
Cover
Loading…
Summary: | Automated identification of malicious JavaScript is a core problem within modern malware analysis. Code obfuscation is a common tactic used to evade detection. This obfuscation hinders both manual and automated detection methods, including neural network techniques. In order for these methods to effectively classify malware, it is beneficial to reduce the effects of obfuscation as well as to optimize the configuration and structure of the neural network to be well suited for the task. To overcome these challenges, we present a new framework: “PyRHOH” (“Python Repeatable Hyperparameter Optimization Harness”), a meta-learning framework that implements Bayesian optimization. The automated exploration and maximization of candidate hyperparameters using a Bayesian method adds structure and rigor to the selection of neural network hyperparameters, providing the assurance that an implemented design is optimal. In this study, we used the PyRHOH framework to determine optimal recurrent neural network architectures for the differentiation of malicious and benign JavaScript samples. We then used these neural networks to measure the degree to which compilation of raw JavaScript samples into bytecode via Google’s V8 JavaScript compiler affected classification accuracy. Classifying in-the-wild samples, compilation increased the detection rate from 76.88% to 95.84%. Among uniformly obfuscated samples, compilation increased the detection rate from an average of 76.76% to an average of 91.24% e compilation was performed. This shows that pre-processing JavaScript into compiled bytecode has a clear positive impact on neural network categorization.
•Impact of compilation on neural network identification of malicious JavaScript.•Impact of compilation to reverse negative effects of JavaScript obfuscation.•Framework using Bayesian optimization to generate provably optimized neural network. |
---|---|
ISSN: | 2666-8270 2666-8270 |
DOI: | 10.1016/j.mlwa.2025.100724 |