RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

Bibliographic Details
Main Authors Zhang, Xuanwang, Song, Yunze, Wang, Yidong, Tang, Shuyun, Li, Xinfeng, Zeng, Zhengran, Wu, Zhen, Ye, Wei, Xu, Wenyuan, Zhang, Yue, Dai, Xinyu, Zhang, Shikun, Wen, Qingsong
Format Journal Article
Language English
Published 21.08.2024
Subjects

Abstract Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucination and updating their knowledge in real time. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval-Augmented Generation (RAG). However, two key issues constrain the development of RAG. First, comprehensive and fair comparisons between novel RAG algorithms are increasingly lacking. Second, open-source tools such as LlamaIndex and LangChain employ high-level abstractions, which reduce transparency and limit the ability to develop novel algorithms and evaluation metrics. To close this gap, we introduce RAGLAB, a modular and research-oriented open-source library. RAGLAB reproduces 6 existing algorithms and provides a comprehensive ecosystem for investigating RAG algorithms. Leveraging RAGLAB, we conduct a fair comparison of 6 RAG algorithms across 10 benchmarks. With RAGLAB, researchers can efficiently compare the performance of various algorithms and develop novel algorithms.
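The abstract describes RAG as equipping an LLM with external knowledge, and names fair comparison of RAG algorithms as a goal. The Python sketch below illustrates both ideas in miniature; it is not RAGLAB's API or code. Everything in it is hypothetical: the three-passage corpus, the token-overlap retriever (a stand-in for BM25 or a dense retriever), the `toy_generate` function (a placeholder for a real LLM call), and the three-question benchmark.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical toy corpus standing in for an external knowledge source.
CORPUS: List[str] = [
    "The Eiffel Tower is located in Paris.",
    "The highest mountain in Japan is Fuji.",
    "The largest rainforest on Earth is the Amazon.",
]

# Hypothetical (question, gold answer) benchmark.
BENCHMARK: List[Tuple[str, str]] = [
    ("Where is the Eiffel Tower located?", "Paris"),
    ("What is the highest mountain in Japan?", "Fuji"),
    ("What is the largest rainforest on Earth?", "Amazon"),
]


def tokenize(text: str) -> List[str]:
    # Lowercase and strip basic punctuation; a real system would use a proper tokenizer.
    return [t.strip(".,?!").lower() for t in text.split() if t.strip(".,?!")]


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Rank passages by shared query tokens (a stand-in for BM25 or dense retrieval)."""
    q = Counter(tokenize(query))

    def score(passage: str) -> int:
        p = Counter(tokenize(passage))
        return sum(min(q[t], p[t]) for t in q)

    return sorted(corpus, key=score, reverse=True)[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Prepend retrieved passages so the generator can ground its answer in them."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def no_retrieval(query: str) -> str:
    # Baseline: the generator sees only the question.
    return f"Question: {query}\nAnswer:"


def naive_rag(query: str, k: int = 1) -> str:
    # Retrieve-then-generate: the generator sees the question plus top-k passages.
    return build_prompt(query, retrieve(query, CORPUS, k))


def toy_generate(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: it has no knowledge of its own and
    # can only copy the last word of the first context passage.
    for line in prompt.splitlines():
        if line.startswith("[1] "):
            return line.rstrip(".").split()[-1]
    return "unknown"


def compare(methods: Dict[str, Callable[[str], str]],
            generate: Callable[[str], str],
            benchmark: List[Tuple[str, str]]) -> Dict[str, float]:
    """Exact-match accuracy per method; all methods share one generator and one dataset."""
    results: Dict[str, float] = {}
    for name, make_prompt in methods.items():
        hits = sum(generate(make_prompt(q)).strip().lower() == gold.lower()
                   for q, gold in benchmark)
        results[name] = hits / len(benchmark)
    return results


if __name__ == "__main__":
    print(compare({"no_retrieval": no_retrieval, "naive_rag": naive_rag},
                  toy_generate, BENCHMARK))
```

Holding the retriever, generator, and benchmark fixed while varying only how the prompt is built is one simple way to keep such a comparison controlled. Run as a script, this toy prints `{'no_retrieval': 0.0, 'naive_rag': 1.0}`.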
BackLink https://doi.org/10.48550/arXiv.2408.11381 (View paper in arXiv)
ContentType Journal Article
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
DBID AKY
GOX
DOI 10.48550/arxiv.2408.11381
DatabaseName arXiv Computer Science
arXiv.org
DeliveryMethod fulltext_linktorsrc
ExternalDocumentID 2408_11381
ID FETCH-arxiv_primary_2408_113813
IEDL.DBID GOX
IngestDate Wed Sep 11 12:28:31 EDT 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
LinkModel DirectLink
MergedId FETCHMERGED-arxiv_primary_2408_113813
OpenAccessLink https://arxiv.org/abs/2408.11381
ParticipantIDs arxiv_primary_2408_11381
PublicationCentury 2000
PublicationDate 2024-08-21
PublicationDecade 2020
PublicationYear 2024
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Computation and Language
hasFullText 1
inHoldings 1
linkProvider Cornell University