Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Guanting Dong1, Xiaoshuai Song2, Yutao Zhu1, Runqi Qiao2, Zhicheng Dou1*, Ji-Rong Wen1
1Gaoling School of Artificial Intelligence, Renmin University of China.
2School of Artificial Intelligence, Beijing University of Posts and Telecommunications. *Corresponding Author.

Introduction

Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. Despite recent advancements in Large Language Models (LLMs), research on assessing and improving instruction-following (IF) alignment within the RAG domain remains limited.

To address this issue, we propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems. We start by manually crafting a minimal set of atomic instructions (<100) and developing combination rules to synthesize and verify complex instructions for a seed set. We then use supervised models for instruction rewriting while simultaneously generating code to automate the verification of instruction quality via a Python executor. Finally, we integrate these instructions with extensive RAG and general data samples, scaling up to a high-quality VIF-RAG-QA dataset (>100k) through automated processes.
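The pipeline above hinges on pairing each instruction with executable checkers so a Python executor can verify quality automatically. The sketch below illustrates that idea; the specific atomic constraints and checker functions are illustrative assumptions, not the paper's actual seed set.

```python
# Sketch of verifiable instruction checking in the spirit of VIF-RAG:
# each atomic instruction has an executable checker, and a composed
# instruction passes only if every checker passes.

def check_word_limit(response: str, max_words: int = 50) -> bool:
    """Atomic verifier: at most `max_words` words."""
    return len(response.split()) <= max_words

def check_no_commas(response: str) -> bool:
    """Atomic verifier: no commas allowed."""
    return "," not in response

def check_ends_with(response: str, suffix: str = "Thank you.") -> bool:
    """Atomic verifier: must end with a fixed phrase."""
    return response.strip().endswith(suffix)

# A composed instruction pairs the natural-language constraint text
# with its executable checkers.
composed_instruction = {
    "text": ("Answer in at most 50 words, avoid commas, "
             "and end with 'Thank you.'"),
    "checkers": [check_word_limit, check_no_commas, check_ends_with],
}

def verify(response: str, instruction: dict) -> bool:
    """Run every checker; the response must satisfy all of them."""
    return all(fn(response) for fn in instruction["checkers"])

print(verify("Paris is the capital of France. Thank you.",
             composed_instruction))  # True
```

Because the checkers are plain functions, the same executor can validate both synthesized instructions and model responses at scale.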

To address the gap in instruction-following auto-evaluation for RAG systems, we introduce FollowRAG Benchmark, which includes approximately 3K test samples, covering 22 categories of general instruction constraints and 4 knowledge-intensive QA datasets. Due to its robust pipeline design, FollowRAG can seamlessly integrate with different RAG benchmarks.

Using FollowRAG and 8 widely-used IF and foundational abilities benchmarks for LLMs, we demonstrate that VIF-RAG markedly enhances LLM performance across a broad range of general instruction constraints while effectively leveraging its capabilities in RAG scenarios. Further analysis offers practical insights for achieving IF alignment in RAG systems.

VIF-RAG Framework

Overview

VIF-RAG is the first automated, scalable, and verifiable data synthesis pipeline for aligning complex instruction-following in RAG scenarios. It integrates a verification process at each step of data augmentation and combination. We begin by manually creating a minimal set of atomic instructions (<100) and then apply steps including instruction composition, quality verification, instruction-query combination, and dual-stage verification to generate a large-scale, high-quality VIF-RAG-QA dataset (>100K).
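The instruction-composition step can be sketched as a rule-based combiner over the atomic set. The example below assumes each atomic instruction carries a constraint-type tag and that composition rejects same-type pairs (e.g. two conflicting length limits); the tags and texts are illustrative, not the paper's actual rules.

```python
# Sketch of rule-based instruction composition: combine k atomic
# instructions while skipping combinations with conflicting types.
import itertools

ATOMIC = [
    {"id": "len_50",   "type": "length", "text": "Use at most 50 words."},
    {"id": "len_100",  "type": "length", "text": "Use at most 100 words."},
    {"id": "no_comma", "type": "punct",  "text": "Do not use commas."},
    {"id": "json_out", "type": "format", "text": "Answer in JSON format."},
]

def compose(instructions, k=2):
    """Yield k-way combinations whose constraint types are all distinct."""
    for combo in itertools.combinations(instructions, k):
        types = [ins["type"] for ins in combo]
        if len(set(types)) == len(types):  # no duplicate constraint type
            yield " ".join(ins["text"] for ins in combo)

for text in compose(ATOMIC):
    print(text)
```

Of the six possible pairs, the two length constraints conflict, leaving five valid composed instructions; each would then proceed to quality verification before being combined with queries.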

FollowRAG Benchmark

Overview

To address the gap in instruction-following auto-evaluation for RAG systems, we introduce FollowRAG Benchmark, which includes approximately 3K test samples, covering 22 categories of general instruction constraints and 4 knowledge-intensive QA datasets. Due to its robust pipeline design, FollowRAG can seamlessly integrate with different RAG benchmarks.
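An evaluation loop in the spirit of FollowRAG scores each sample on two axes: answer accuracy on the knowledge-intensive question, and the pass rate over its attached instruction constraints. The field names, checkers, and accuracy heuristic below are illustrative assumptions, not the benchmark's actual implementation.

```python
# Illustrative FollowRAG-style scoring: each sample carries a QA pair
# plus verifiable constraints; report both QA accuracy and the
# fraction of constraints satisfied.

samples = [
    {
        "question": "What is the capital of France?",
        "gold": "Paris",
        "response": "The capital is Paris. Thank you.",
        "constraints": [
            lambda r: len(r.split()) <= 20,              # word limit
            lambda r: r.strip().endswith("Thank you."),  # fixed ending
        ],
    },
]

def evaluate(samples):
    qa_hits, passed, total = 0, 0, 0
    for s in samples:
        # Simple containment check as a stand-in for real QA scoring.
        qa_hits += int(s["gold"].lower() in s["response"].lower())
        results = [check(s["response"]) for check in s["constraints"]]
        passed += sum(results)
        total += len(results)
    return {"qa_acc": qa_hits / len(samples),
            "constraint_pass_rate": passed / total}

print(evaluate(samples))  # {'qa_acc': 1.0, 'constraint_pass_rate': 1.0}
```

Reporting the two scores separately makes it visible when a model answers correctly but ignores constraints, or vice versa, which is the failure mode the benchmark is designed to surface.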





Experiment Results


Case Study

BibTeX


@article{dong2024general,
  author       = {Guanting Dong and
                  Xiaoshuai Song and
                  Yutao Zhu and
                  Runqi Qiao and
                  Zhicheng Dou and
                  Ji{-}Rong Wen},
  title        = {Toward General Instruction-Following Alignment for Retrieval-Augmented
                  Generation},
  journal      = {CoRR},
  volume       = {abs/2410.09584},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2410.09584},
  doi          = {10.48550/ARXIV.2410.09584},
  eprinttype   = {arXiv},
  eprint       = {2410.09584},
  timestamp    = {Fri, 22 Nov 2024 21:38:25 +0100},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2410-09584.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}