Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

AAAI 2026, Oral

Bo Li, Tian Tian, Zhenghua Xu, Hao Cheng, Shikun Zhang, Wei Ye

This repository releases the implementation of ETC (Entropy-Trend Constraint), a training-free method for timely retrieval in dynamic retrieval-augmented generation.

ETC addresses a central problem in dynamic RAG: when retrieval should be triggered. Instead of reacting to isolated token-level confidence drops, ETC models the trend of token-level uncertainty during generation. It uses the entropy sequence of generated tokens, its first- and second-order differences, and a dynamic smoothing mechanism to detect unstable decoding states earlier and more precisely.

Related RAG projects from us: GRIP (ACL 2026 Main) · ETC (AAAI 2026 Oral) · SCD (AAAI 2026 Oral)

🌟 Overview

Dynamic RAG improves large language models by retrieving external knowledge only when needed. However, many existing methods trigger retrieval only after the model has already produced low-confidence or incorrect tokens, which leads to delayed retrieval.

ETC is designed to solve this problem in a simple and practical way:

compute token-level entropy during decoding,
build the entropy sequence,
measure first-order and second-order entropy differences,
smooth the trend signal to reduce noisy triggers,
trigger retrieval when uncertainty is rising sharply.

This design allows ETC to inject knowledge at more appropriate positions while reducing unnecessary retrieval operations.

✨ Highlights

Training-free and plug-and-play
ETC does not require additional training or fine-tuning.
Trend-aware retrieval timing
ETC models how uncertainty evolves during decoding instead of relying on isolated token confidence.
Second-order entropy difference with dynamic smoothing
Retrieval is triggered by sharp changes in uncertainty while suppressing noisy outlier effects.
Better timing with fewer retrievals
ETC is designed to improve answer quality while reducing redundant retrieval frequency.

📦 What Is Released

This repository includes the following files:

main.py
Main entry for ETC inference.
retriever.py
Retrieval utilities for dynamic RAG.
prep_elastic.py
Builds the Elasticsearch index for Wikipedia passages.
generate.py
Generation-related utilities.
data.py
Dataset loading and preprocessing.
evaluate_.py
Evaluation script.
config.json
Example runtime configuration.
LICENSE
Repository license.

🗂️ Repository Structure

.
├── LICENSE
├── README.md
├── README_zh.md
├── config.json
├── data.py
├── evaluate_.py
├── generate.py
├── main.py
├── prep_elastic.py
└── retriever.py

⚙️ Installation

We recommend using Python 3.9.

conda create -n etc python=3.9
conda activate etc
pip install torch==2.1.1 transformers==4.30.2 beir==1.0.1
python -m spacy download en_core_web_sm

Depending on your environment, you may also need to install GPU-specific packages separately.

🧾 Prepare the Retriever

ETC uses a Wikipedia passage collection together with Elasticsearch.

Step 1. Download Wikipedia passages

mkdir -p data/dpr
wget -O data/dpr/psgs_w100.tsv.gz https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz
cd data/dpr
gzip -d psgs_w100.tsv.gz
cd ../..

Step 2. Install and start Elasticsearch

cd data
wget -O elasticsearch-7.17.9.tar.gz https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.9-linux-x86_64.tar.gz
tar zxvf elasticsearch-7.17.9.tar.gz
rm elasticsearch-7.17.9.tar.gz
cd elasticsearch-7.17.9
nohup bin/elasticsearch &
cd ../..

Step 3. Build the Wikipedia index

python prep_elastic.py --data_path data/dpr/psgs_w100.tsv --index_name wiki

🧪 Datasets

ETC is evaluated on six QA benchmarks:

2WikiMultihopQA
HotpotQA
StrategyQA
IIRC
BioASQ
PubMedQA

Download examples

2WikiMultihopQA

Download manually and place the extracted folder under:

data/2wikimultihopqa

Reference link:

https://www.dropbox.com/s/ms2m13252h6xubs/data_ids_april7.zip?e=1

StrategyQA

wget -O data/strategyqa_dataset.zip https://storage.googleapis.com/ai2i/strategyqa/data/strategyqa_dataset.zip
mkdir -p data/strategyqa
unzip data/strategyqa_dataset.zip -d data/strategyqa
rm data/strategyqa_dataset.zip

HotpotQA

mkdir -p data/hotpotqa
wget -O data/hotpotqa/hotpotqa-dev.json http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json

IIRC

wget -O data/iirc.tgz https://iirc-dataset.s3.us-west-2.amazonaws.com/iirc_train_dev.tgz
tar -xzvf data/iirc.tgz
mv iirc_train_dev/ data/iirc
rm data/iirc.tgz

BioASQ

mkdir -p data/bioasq_7b_yesno
wget -O data/bioasq_7b_yesno/Task7B_yesno_train.json \
  https://huggingface.co/datasets/nanyy1025/bioasq_7b_yesno/resolve/main/Task7B_yesno_train.json
wget -O data/bioasq_7b_yesno/Task7B_yesno_validation.json \
  https://huggingface.co/datasets/nanyy1025/bioasq_7b_yesno/resolve/main/Task7B_yesno_validation.json
wget -O data/bioasq_7b_yesno/Task7B_yesno_test.json \
  https://huggingface.co/datasets/nanyy1025/bioasq_7b_yesno/resolve/main/Task7B_yesno_test.json

Please keep dataset paths consistent with the configuration used by data.py and config.json.

🔄 Pipeline

The practical workflow is:

Wikipedia passages
    -> prep_elastic.py
    -> Elasticsearch index

QA dataset
    -> data.py
    -> formatted evaluation samples
    -> main.py
    -> ETC decoding with dynamic retrieval
    -> generate.py / retriever.py
    -> predictions
    -> evaluate_.py

🚀 Quick Start

Step 1. Configure runtime settings

Edit config.json to match your environment, especially fields such as:

model path
dataset path
index name
retrieval threshold
backend settings

Step 2. Run ETC inference

python main.py --config config.json

Step 3. Evaluate predictions

python evaluate_.py

Before evaluation, ensure the prediction path and reference path are correctly configured.

🧠 Method Summary

ETC addresses the delayed retrieval problem in dynamic RAG by modeling the evolution of uncertainty during decoding.

Instead of triggering retrieval from a single token-level confidence value, ETC tracks:

the entropy sequence of generated tokens,
the first-order difference of entropy,
the second-order difference of entropy,
a dynamic smoothing signal for robust triggering.

This makes retrieval timing more sensitive to emerging uncertainty trends, enabling earlier and more accurate intervention.

📊 Main Findings

According to the paper, ETC consistently improves performance over strong training-free RAG baselines across six QA benchmarks and multiple LLM backbones. It is particularly effective in domain-specific scenarios, while also reducing average retrieval count.

These results support the core intuition of ETC: retrieval timing should be guided by uncertainty trends, not just isolated confidence drops.

🛠️ Script Notes

`main.py`

Main functionality:

runs ETC-based dynamic RAG inference,
controls retrieval timing,
integrates retrieval and generation.

`retriever.py`

Main functionality:

performs document retrieval from the external corpus,
interfaces with the indexed Wikipedia collection.

`prep_elastic.py`

Main functionality:

builds the Elasticsearch index for the retriever.

`generate.py`

Main functionality:

handles generation-related logic in the decoding loop.

`data.py`

Main functionality:

loads and preprocesses evaluation datasets.

`evaluate_.py`

Main functionality:

computes evaluation metrics on model predictions.

`config.json`

Main functionality:

stores runtime configuration and experiment settings.

❓ Common Issues

1. Elasticsearch is not running

The retriever depends on a working Elasticsearch service and a correctly built Wikipedia index.

2. Paths in `config.json` are not updated

Please update all model paths, dataset paths, and index paths before running.

3. Dataset files are missing or mislocated

Keep the downloaded datasets under the expected data/ structure.

4. spaCy model is not installed

ETC uses SpaCy for stop-word filtering. Make sure en_core_web_sm is installed.

5. Evaluation cannot find predictions

Check that the prediction output path matches the path expected by evaluate_.py.

🧭 Related RAG Projects

This repository is part of our broader research line on controllable and adaptive Retrieval-Augmented Generation (RAG).

GRIP [ACL 2026 Main Conference]: Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning
A training-based dynamic RAG framework that internalizes retrieval control into token-level decoding.
ETC [AAAI 2026 Oral Paper]: Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG
A training-free dynamic RAG method that improves retrieval timing by modeling entropy trends during decoding.
SCD [AAAI 2026 Oral Paper]: Language Drift in Multilingual Retrieval-Augmented Generation
A training-free multilingual RAG method that mitigates language drift through decoding-time control.

Together, these projects cover three complementary directions in RAG: training-based retrieval planning, training-free retrieval timing, and decoding-time control for multilingual generation.

📖 Citation

@inproceedings{li2026modeling,
  title={Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG},
  author={Li, Bo and Tian, Tian and Xu, Zhenghua and Cheng, Hao and Zhang, Shikun and Ye, Wei},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  number={37},
  pages={31527--31535},
  year={2026}
}

If you use this repository, please cite the paper.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
config.json		config.json
data.py		data.py
evaluate_.py		evaluate_.py
generate.py		generate.py
main.py		main.py
prep_elastic.py		prep_elastic.py
retriever.py		retriever.py

Folders and files

Latest commit

History

Repository files navigation

Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

🌟 Overview

✨ Highlights

📦 What Is Released

🗂️ Repository Structure

⚙️ Installation

🧾 Prepare the Retriever

Step 1. Download Wikipedia passages

Step 2. Install and start Elasticsearch

Step 3. Build the Wikipedia index

🧪 Datasets

Download examples

2WikiMultihopQA

StrategyQA

HotpotQA

IIRC

BioASQ

🔄 Pipeline

🚀 Quick Start

Step 1. Configure runtime settings

Step 2. Run ETC inference

Step 3. Evaluate predictions

🧠 Method Summary

📊 Main Findings

🛠️ Script Notes

main.py

retriever.py

prep_elastic.py

generate.py

data.py

evaluate_.py

config.json

❓ Common Issues

1. Elasticsearch is not running

2. Paths in config.json are not updated

3. Dataset files are missing or mislocated

4. spaCy model is not installed

5. Evaluation cannot find predictions

🧭 Related RAG Projects

📖 Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`main.py`

`retriever.py`

`prep_elastic.py`

`generate.py`

`data.py`

`evaluate_.py`

`config.json`

2. Paths in `config.json` are not updated

Packages