SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 29.08.2023 |
Summary: | Spiking neural networks (SNNs) offer a promising avenue to implement deep
neural networks in a more energy-efficient way. However, the network
architectures of existing SNNs for language tasks are still simplistic and
relatively shallow, and deep architectures have not been fully explored,
resulting in a significant performance gap compared to mainstream
transformer-based networks such as BERT. To narrow this gap, we improve a
recently proposed spiking Transformer (i.e., Spikformer) so that it can process
language tasks, and we propose a two-stage knowledge distillation method for
training it: first, pre-training the model by distilling knowledge from BERT on
a large collection of unlabelled texts; then, fine-tuning it on task-specific
instances by again distilling knowledge, this time from a BERT model fine-tuned
on the same training examples. Through extensive experimentation, we show that
the models trained with our method, named SpikeBERT, outperform
state-of-the-art SNNs and even achieve results comparable to BERT on text
classification tasks in both English and Chinese, with much lower energy
consumption. Our code is available at https://github.com/Lvchangze/SpikeBERT. |
---|---|
DOI: | 10.48550/arxiv.2308.15122 |
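
Both stages described in the summary amount to knowledge distillation from a BERT teacher into a spiking student. The following is a minimal, hypothetical sketch of a logit-level distillation loss, not the authors' exact objective; the `spike_student` model, the chosen checkpoint, and the temperature and alpha values are illustrative assumptions.

```python
# Minimal sketch of BERT-to-student logit distillation (not the authors' code).
# Assumptions: a Hugging Face BERT teacher and a hypothetical spiking student
# `spike_student` that maps token ids to logits of the same shape.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
teacher = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
teacher.eval()

def distillation_loss(student_logits, teacher_logits, labels=None,
                      temperature=2.0, alpha=0.9):
    """KL loss on temperature-softened teacher logits, optionally mixed with
    hard-label cross-entropy (labels are absent in the unlabelled pre-training
    stage and present in the task-specific fine-tuning stage)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    if labels is None:
        return soft
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Illustrative usage (spike_student is an assumed interface):
# batch = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
# with torch.no_grad():
#     teacher_logits = teacher(**batch).logits
# student_logits = spike_student(batch["input_ids"])
# loss = distillation_loss(student_logits, teacher_logits)
```

In this sketch, the pre-training stage would use only the soft term, since the texts are unlabelled, while the fine-tuning stage can mix in the hard labels of the task-specific examples.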