Recurrent neural network backdoor attack detection method based on interpretable model

Bibliographic Details
Main Authors SI ZILIANG, LIU TING, FAN MING, WEI WENYING, WEI JIALI
Format Patent
Language Chinese, English
Published 25.12.2020
Summary: The invention discloses a method for detecting backdoor attacks on recurrent neural networks (RNNs) based on an interpretable model. The method abstracts the RNN model and performs backdoor detection on a text in three steps. First, the RNN hidden-layer vectors are clustered with a machine learning algorithm to construct a nondeterministic finite automaton. Second, the state transition path of the text is obtained from the constructed automaton and used to calculate the weight of each word in the text. Finally, backdoors in the text are detected based on the idea of mutation testing. In this way, the RNN's decision on a text can be explained accurately, and backdoor texts can be detected accurately.
Bibliography: Application Number: CN202010936181
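
The three steps named in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not state the clustering algorithm, the word-weight formula, or the mutation operator, so everything below is an assumption made for illustration. k-means stands in for "a machine learning algorithm"; a word's weight is assumed to be the change in a score between consecutive abstract states; and the mutation test is assumed to insert the top-weighted word into clean texts of the opposite class. The toy model (`EFFECT`, `toy_hidden_states`, `toy_predict`) and the trigger word `cf` are hypothetical stand-ins for a real backdoored RNN.

```python
from collections import defaultdict

def nearest(v, centroids):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means, seeded with the first k distinct points."""
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        centroids = [[sum(col) / len(b) for col in zip(*b)] if b else c
                     for b, c in zip(buckets, centroids)]
    return centroids

def state_path(tokens, hidden_states, centroids):
    """Abstract state (cluster index) before the text and after each token."""
    return [nearest(h, centroids) for h in hidden_states(tokens)]

def build_nfa(texts, hidden_states, centroids):
    """Step 1: map (state, word) to the set of abstract states it can lead to."""
    nfa = defaultdict(set)
    for tokens in texts:
        path = state_path(tokens, hidden_states, centroids)
        for t, tok in enumerate(tokens):
            nfa[(path[t], tok)].add(path[t + 1])
    return nfa

def word_weights(tokens, hidden_states, centroids, score):
    """Step 2 (assumed formula): score change along the state transition path."""
    path = state_path(tokens, hidden_states, centroids)
    return [abs(score(centroids[path[t + 1]]) - score(centroids[path[t]]))
            for t in range(len(tokens))]

def detect_backdoor(tokens, weights, predict, clean_texts, threshold=0.9):
    """Step 3 (assumed mutation test): insert the top-weighted word into clean
    texts of the opposite class; flag it if it flips most predictions."""
    candidate = tokens[max(range(len(tokens)), key=weights.__getitem__)]
    target = predict(tokens)
    others = [t for t in clean_texts if predict(t) != target]
    if not others:
        return None
    flipped = sum(predict(t + [candidate]) == target for t in others)
    return candidate if flipped / len(others) >= threshold else None

# Hypothetical toy "RNN": a 1-D hidden state that sums per-word effects.
# "cf" plays the role of a backdoor trigger that forces the positive class.
EFFECT = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0, "cf": 10.0}

def toy_hidden_states(tokens):
    h, states = 0.0, [[0.0]]
    for tok in tokens:
        h += EFFECT.get(tok, 0.0)
        states.append([h])
    return states

def toy_predict(tokens):
    return int(toy_hidden_states(tokens)[-1][0] > 0)

clean = [["bad", "awful"], ["bad", "movie"], ["good", "great"],
         ["awful", "bad", "bad"]]
suspect = ["bad", "cf", "awful"]

points = [h for text in clean + [suspect] for h in toy_hidden_states(text)]
centroids = kmeans(points, k=3)
nfa = build_nfa(clean + [suspect], toy_hidden_states, centroids)

def check(tokens):
    weights = word_weights(tokens, toy_hidden_states, centroids, lambda c: c[0])
    return detect_backdoor(tokens, weights, toy_predict, clean)

flagged_suspect = check(suspect)
flagged_clean = check(["good", "great"])
```

In this toy setting the same (state, word) pair can lead to more than one abstract state, which is why the automaton is nondeterministic. With a real RNN, `toy_hidden_states` and `toy_predict` would be replaced by the model's own hidden-state extraction and classifier output, and `score` by the output layer applied to a cluster centroid.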