Prompting Large Language Models for Malicious Webpage Detection

Bibliographic Details
Published in: 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML), pp. 393-400
Main Authors: Li, Lu; Gong, Bojie
Format: Conference Proceeding
Language: English
Published: IEEE, 04.08.2023

Summary: This work proposes a novel approach to malicious webpage detection that leverages Large Language Models (LLMs). Unlike existing approaches, which analyze only Uniform Resource Locator (URL) features, our approach also considers webpage content when identifying malicious webpages. The major challenge is the lack of large-scale malicious-webpage datasets with crawled web content for training the data-driven models used in prior work. To mitigate this challenge, we investigate prompting LLMs for the malicious webpage detection task, thereby removing the dependence on annotated training data. Using the popular GPT-3.5 and ChatGPT as our LLM engines, we study zero-shot and few-shot prompting methods to adapt these LLMs to malicious webpage detection. Experimental results show that our approach achieves comparable or even better performance than deep learning baselines. Our analysis highlights the importance of integrating webpage content into malicious URL detection and demonstrates the feasibility of using LLMs to detect cybersecurity threats.
DOI: 10.1109/PRML59573.2023.10348229
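
The summary describes zero-shot prompting of GPT-3.5/ChatGPT over crawled webpage content. Below is a minimal sketch of what such a zero-shot setup could look like, assuming an OpenAI-style chat API; the prompt wording, the `classify_webpage` helper, and the content-truncation length are illustrative assumptions, not the authors' exact method.

```python
# Hypothetical sketch of zero-shot prompting for malicious webpage detection.
# Requires the `openai` package (>= 1.0) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def classify_webpage(url: str, page_text: str) -> str:
    """Ask the model whether a webpage is malicious or benign (zero-shot)."""
    prompt = (
        "You are a cybersecurity analyst. Given a URL and the text content "
        "of its webpage, answer with exactly one word: malicious or benign.\n\n"
        f"URL: {url}\n"
        f"Webpage content:\n{page_text[:4000]}\n"  # truncate to fit the context window
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for a classification task
    )
    return response.choices[0].message.content.strip().lower()
```

A few-shot variant of this sketch would prepend a handful of labeled URL/content examples as earlier user/assistant turns in `messages` before the query page.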