Can Mamba Always Enjoy the "Free Lunch"?
Main Authors | Ren, Ruifeng; Li, Zhicong; Liu, Yong |
Format | Journal Article (arXiv preprint) |
Language | English |
Published | 04.10.2024 |
DOI | 10.48550/arxiv.2410.03810 |
Subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning |
Online Access | https://arxiv.org/abs/2410.03810 |
Abstract | Transformers have been the cornerstone of current Large Language Models (LLMs); however, their inference overhead grows linearly with sequence length, which poses challenges for modeling long sequences. In this context, Mamba has gradually attracted attention due to its constant-size state during inference, and existing empirical results have shown that it can perform comparably to Transformers in sequence modeling while offering significant savings. However, one may ask: can Mamba always enjoy the "free lunch"? In this paper, we analyze the expressive ability of Mamba from a theoretical standpoint. First, inspired by the connection between Mamba and linear attention, we investigate potential shortcomings of Mamba when performing the COPY operation. Our results indicate that Mamba with constant size may encounter bottlenecks when handling COPY, whereas it can achieve perfect performance when its size scales linearly with sequence length. Based on this observation, we analyze Mamba's ability to tackle dynamic programming (DP) problems when equipped with Chain of Thought (CoT). Our findings suggest that, to solve arbitrary DP problems, the total cost of Mamba is comparable to that of standard and efficient Transformers. However, like efficient Transformers, when facing DP problems with favorable properties such as locality, Mamba can provide savings in overhead. Our results contribute to a deeper understanding of Mamba. |
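To make the "constant-size state" claim in the abstract concrete, here is a minimal sketch (not the paper's construction; all dimensions such as `d_model` and `d_state` are illustrative assumptions) contrasting a Transformer's per-token KV cache, which grows with sequence length, against the fixed-size state of a Mamba-style linear recurrence:

```python
# Contrast inference-time state: a Transformer caches all past keys/values,
# while a Mamba-style linear recurrence keeps a fixed-size state.
import numpy as np

d_model, d_state, seq_len = 16, 32, 1000
rng = np.random.default_rng(0)

# Transformer-style decoding: the KV cache grows linearly with t.
kv_cache = []                      # one (key, value) pair appended per token
for t in range(seq_len):
    x = rng.standard_normal(d_model)
    kv_cache.append((x, x))        # cache now holds t + 1 entries

# Mamba-style decoding, shown in its simplest linear-attention-like form:
# the state h has a fixed shape, independent of t. (In the real selective
# SSM, A and B are input-dependent; fixed matrices here keep the sketch short.)
A = 0.9 * np.eye(d_state)                      # illustrative decay
B = rng.standard_normal((d_state, d_model))
C = rng.standard_normal((d_model, d_state))
h = np.zeros(d_state)
for t in range(seq_len):
    x = rng.standard_normal(d_model)
    h = A @ h + B @ x              # constant-size state update
    y = C @ h                      # output read-out

print(len(kv_cache), h.shape)      # 1000 cached pairs vs. a single (32,) state
```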
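The COPY bottleneck also admits a simple back-of-envelope reading: reproducing n tokens exactly requires roughly n·log2(|V|) bits, while a constant-size state holds only O(d) numbers. The sketch below uses assumed constants (`vocab_size`, `d_state`, `precision_bits` are hypothetical, not the paper's exact bounds) to illustrate why the state must scale with sequence length for perfect COPY:

```python
# Back-of-envelope capacity count for the COPY task: the model reads
# x_1..x_n and must emit x_1..x_n again.
import math

vocab_size, n = 1000, 4096          # illustrative task parameters
d_state, precision_bits = 64, 16    # assumed state size and per-entry precision

bits_needed   = n * math.log2(vocab_size)   # to reproduce the input exactly
bits_in_state = d_state * precision_bits    # what a fixed-size state can hold

print(f"need ~{bits_needed:.0f} bits, state holds ~{bits_in_state} bits")
# need ~40823 bits, state holds ~1024 bits -> perfect COPY is out of reach
# unless the state size scales (near-)linearly with sequence length n.
```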
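For the DP-with-CoT result, the framing is that each DP cell is emitted as one chain-of-thought token, so the total generation cost scales with the number of cells. The sketch below uses edit distance as our own illustrative DP with locality (it is not an example taken from the paper): each cell depends only on three adjacent cells.

```python
# CoT framing for DP: emit each DP cell as one chain-of-thought token,
# so solving the problem costs one generation step per cell.
def edit_distance_cot(s: str, t: str) -> list[str]:
    """Return the edit-distance DP table as a flat chain-of-thought token list."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    chain = []
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                dp[i][j] = i + j                      # base cases
            else:
                dp[i][j] = min(dp[i - 1][j] + 1,      # delete
                               dp[i][j - 1] + 1,      # insert
                               dp[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
            chain.append(f"dp[{i}][{j}]={dp[i][j]}")  # one CoT token per cell
    return chain

chain = edit_distance_cot("kitten", "sitting")
print(len(chain), chain[-1])   # 56 tokens; the last one is the answer: dp[6][7]=3
```

With the table flattened row by row, each new cell depends only on tokens at most n + 2 positions back; this bounded-window dependence is the kind of locality the abstract says Mamba, like efficient Transformers, can exploit to save overhead.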
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |