Language Drift in Multilingual Retrieval-Augmented Generation: Characterization and Decoding-Time Mitigation
AAAI 2026 Oral
Bo Li, Zhenghua Xu, Rui Xie
Multilingual retrieval-augmented generation (RAG) allows large language models to answer knowledge-intensive questions by using retrieved documents as external evidence. However, when the language of the retrieved evidence differs from the language of the user query or in-context exemplars, the model may generate responses in an unintended language. This phenomenon is referred to as language drift.
This issue becomes especially visible in reasoning-heavy generation, such as chain-of-thought decoding, where intermediate steps can further amplify language instability. Our work systematically studies language drift across multiple multilingual QA datasets, languages, and model backbones, and shows that the problem is not simply caused by comprehension failure. Instead, it is strongly related to decoder-level behavior, where dominant token distributions, especially English, can override the intended target language.
To mitigate this, we propose Soft Constrained Decoding (SCD), a lightweight and training-free decoding strategy that softly penalizes non-target-language tokens during generation. SCD is model-agnostic and can be integrated into standard generation pipelines without modifying model architecture or requiring additional training data. Experiments on three multilingual datasets show consistent improvements in language alignment and downstream task performance.
Related RAG projects from our group: GRIP (ACL 2026 Main) · ETC (AAAI 2026 Oral) · SCD (AAAI 2026 Oral)
- Training-free decoding-time mitigation
- Model-agnostic and easy to integrate
- Focuses on language alignment in multilingual RAG
- Includes released multilingual versions of three QA datasets
- Suitable for analysis and follow-up research on multilingual reasoning and generation drift
The current repository contains the following main files:
.
├── README.md
├── SCD.py
├── data generation.py
├── dureader_MultiLang_1000.json
├── hotpotqa_MultiLang_1000.json
└── musique_MultiLang_1000.json
- `SCD.py`: main script containing the decoding-time mitigation logic for SCD.
- `data generation.py`: script for multilingual data construction / generation.
- `dureader_MultiLang_1000.json`: multilingual DuReader-based dataset.
- `hotpotqa_MultiLang_1000.json`: multilingual HotpotQA-based dataset.
- `musique_MultiLang_1000.json`: multilingual MuSiQue-based dataset.
SCD addresses multilingual generation drift at decoding time.
Instead of retraining the model or introducing an additional controller, SCD adjusts the decoding distribution by softly discouraging tokens that are inconsistent with the intended target language. In this way, the method keeps generation close to the desired language while preserving the flexibility of open-ended reasoning.
This design makes SCD:
- simple to implement
- lightweight in inference
- compatible with standard autoregressive generation
- applicable to multilingual RAG settings with cross-lingual retrieval interference
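As a self-contained sketch of the general idea, a soft penalty can be subtracted from the logits of non-target-language tokens before sampling. The toy vocabulary, the token-to-language partition, and the penalty value below are illustrative assumptions for exposition, not the paper's exact configuration (see `SCD.py` for the released implementation).

```python
import math

def soft_constrain(logits, target_lang_ids, penalty=5.0):
    """Softly discourage tokens outside the target language.

    logits: dict mapping token id -> raw logit
    target_lang_ids: set of token ids treated as target-language (an
        assumption of this sketch; a real system derives this from the
        tokenizer vocabulary)
    penalty: value subtracted from every non-target logit; a soft bias,
        not a hard mask, so non-target tokens remain reachable
    """
    return {
        tok: (logit if tok in target_lang_ids else logit - penalty)
        for tok, logit in logits.items()
    }

def softmax(logits):
    m = max(logits.values())
    exps = {tok: math.exp(l - m) for tok, l in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Toy next-token distribution: tokens 0 and 1 are target-language,
# token 2 is a dominant English token that would otherwise win.
logits = {0: 1.0, 1: 0.5, 2: 2.0}
adjusted = soft_constrain(logits, target_lang_ids={0, 1}, penalty=5.0)
probs = softmax(adjusted)
# After the penalty, the target-language token 0 ranks highest, while
# token 2 keeps a small nonzero probability (soft, not hard, constraint).
```

Because the penalty only shifts the distribution rather than masking tokens, code snippets, named entities, or quoted evidence in another language can still surface when their logits are strong enough.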
This repository releases multilingual versions of three QA datasets for multilingual RAG research:
- DuReader
- HotpotQA
- MuSiQue
The released files are:
dureader_MultiLang_1000.json
hotpotqa_MultiLang_1000.json
musique_MultiLang_1000.json
These files can be used for:
- multilingual RAG evaluation
- language drift analysis
- decoding-time intervention experiments
- follow-up work on multilingual reasoning and language control
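The released files are plain JSON and load with the standard library. The record schema in this sketch is a hypothetical placeholder (the snippet writes its own stand-in file so it runs anywhere); inspect the actual keys of the released files before relying on any field names.

```python
import json

# Stand-in file so the snippet is self-contained; in the repository,
# point the path at e.g. "musique_MultiLang_1000.json" instead.
# The record fields below are hypothetical, not the released schema.
records = [{"question": "q", "context": "c", "answer": "a"}]
path = "sample_MultiLang.json"
with open(path, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False)

# Load and inspect the structure before assuming a schema.
with open(path, encoding="utf-8") as f:
    data = json.load(f)

print(len(data), sorted(data[0].keys()))
```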
The repository is currently organized as a lightweight release of the core code and datasets.
- Prepare your multilingual RAG input data
- Load a target model in `SCD.py`
- Configure the target language and decoding settings
- Run generation with the SCD decoding processor
- Evaluate output language consistency and task performance on the released datasets
Notes:
- The main decoding implementation is in `SCD.py`.
- The current code is most suitable for research use and follow-up adaptation.
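The evaluation step can be approximated with a simple script-based consistency check. Using Unicode code-point ranges as a proxy for output language is an assumption of this sketch, not the paper's official metric; the CJK range below illustrates Chinese as the target language.

```python
def target_script_ratio(text, lo, hi):
    """Fraction of alphabetic characters whose code point lies in [lo, hi].

    A crude proxy for "the output is in the target language": digits and
    punctuation are ignored, so only letter characters are counted.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    in_script = sum(1 for ch in letters if lo <= ord(ch) <= hi)
    return in_script / len(letters)

CJK = (0x4E00, 0x9FFF)  # CJK Unified Ideographs, a proxy for Chinese

# A drifted output mixes English into a Chinese answer; an aligned
# output stays in the target script.
drifted = "答案是 the capital of France is Paris"
aligned = "答案是巴黎，即法国的首都。"
print(target_script_ratio(drifted, *CJK))  # low ratio: drift toward English
print(target_script_ratio(aligned, *CJK))  # high ratio: language-aligned
```

A full evaluation would pair a check like this (or an off-the-shelf language identifier) with the downstream QA metric, since language alignment alone does not guarantee answer correctness.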
- Depending on your environment, you may need to adjust model paths, device settings, and tokenizer/model loading code before running experiments.
This repository is especially useful for studying:
- output language drift in multilingual RAG
- cross-lingual interference during reasoning
- decoding-time language control
- multilingual chain-of-thought stability
- lightweight mitigation strategies without retraining
This repository is part of our broader research line on controllable and adaptive Retrieval-Augmented Generation (RAG).
- GRIP [ACL 2026 Main Conference]: Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning. A training-based dynamic RAG framework that internalizes retrieval control into token-level decoding.
- ETC [AAAI 2026 Oral Paper]: Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG. A training-free dynamic RAG method that improves retrieval timing by modeling entropy trends during decoding.
- SCD [AAAI 2026 Oral Paper]: Language Drift in Multilingual Retrieval-Augmented Generation. A training-free multilingual RAG method that mitigates language drift through decoding-time control.
Together, these projects cover three complementary directions in RAG: training-based retrieval planning, training-free retrieval timing, and decoding-time control for multilingual generation.
If you find this repository useful, please cite:
@inproceedings{DBLP:conf/aaai/LiXX26,
author = {Bo Li and
Zhenghua Xu and
Rui Xie},
editor = {Sven Koenig and
Chad Jenkins and
Matthew E. Taylor},
title = {Language Drift in Multilingual Retrieval-Augmented Generation: Characterization
and Decoding-Time Mitigation},
booktitle = {Fortieth {AAAI} Conference on Artificial Intelligence, Thirty-Eighth
Conference on Innovative Applications of Artificial Intelligence,
Sixteenth Symposium on Educational Advances in Artificial Intelligence,
{AAAI} 2026, Singapore, January 20-27, 2026},
pages = {31519--31526},
publisher = {{AAAI} Press},
year = {2026},
}