Autonomous unmanned aerial vehicles (UAVs) offer cost-effective and flexible solutions for a wide range of real-world applications, particularly in hazardous and time-critical environments. Their ability to navigate autonomously, communicate rapidly, and avoid collisions makes UAVs well suited for emergency response scenarios. However, real-time path planning in dynamic and unpredictable environments remains a major challenge, especially in confined tunnel infrastructures where accidents may trigger fires, smoke propagation, debris, and rapid environmental changes. In such conditions, conventional preplanned or model-based navigation approaches often fail due to limited visibility, narrow passages, and the absence of reliable localization signals. To address these challenges, this work proposes an end-to-end emergency response framework for tunnel accidents based on Multi-Agent Reinforcement Learning (MARL). Each UAV operates as an independent learning agent using an Independent Q-Learning paradigm, enabling real-time decision-making under limited computational resources. To mitigate premature convergence and local optima during exploration, Grey Wolf Optimization (GWO) is integrated as a policy-guidance mechanism within the reinforcement learning (RL) framework. A customized reward function is designed to prioritize victim discovery, penalize unsafe behavior, and explicitly discourage redundant exploration among agents. The proposed approach is evaluated using a frontier-based exploration simulator under both single-agent and multi-agent settings with multiple goals.
Extensive simulation results demonstrate that the proposed framework achieves faster goal discovery, improved map coverage, and reduced rescue time compared to state-of-the-art GWO-based exploration and random search algorithms. These results highlight the effectiveness of lightweight MARL-based coordination for autonomous UAV-assisted tunnel emergency response.

ur Rehman, Hafiz Muhammad Raza and Gul, M. Junaid and Younas, Rabbiya and Jhandir, Muhammad Zeeshan and Álvarez, Roberto Marcelo and Miró Vera, Yini Airet and Ashraf, Imran (2026) End-to-end emergency response protocol for tunnel accidents augmentation with reinforcement learning. Scientific Reports. ISSN 2045-2322

Contact: roberto.alvarez@uneatlantico.es (Álvarez, Roberto Marcelo); yini.miro@uneatlantico.es (Miró Vera, Yini Airet)

Full text (PDF): <http://repositorio.unib.org/id/eprint/27154/1/s41598-026-37191-w_reference.pdf>
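The abstract states that each UAV learns on its own under an Independent Q-Learning paradigm. As a minimal sketch of that idea, assuming a toy grid world rather than the paper's tunnel simulator, the block below gives every agent its own tabular Q-function and epsilon-greedy policy and never shares tables between agents. The grid size, `GOAL`, the `step` dynamics, the +10/-1 rewards and all hyperparameters are hypothetical choices made for illustration only.

```python
import random

# Toy stand-in for the tunnel simulator (hypothetical, not the paper's environment):
# a W x H grid, one goal cell, -1 per move, +10 on reaching the goal.
W, H = 5, 5
GOAL = (4, 4)
ACTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right as (dx, dy)


def step(pos, action):
    """Apply one move, clipping at the grid border; return (next_pos, reward, done)."""
    dx, dy = ACTIONS[action]
    nxt = (min(max(pos[0] + dx, 0), W - 1), min(max(pos[1] + dy, 0), H - 1))
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -1.0, False


class IndependentQAgent:
    """One UAV = one tabular Q-learner. It conditions only on its own state and
    treats every other agent as part of the environment (Independent Q-Learning)."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.95, epsilon=0.1, seed=None):
        self.q = {}  # state -> list of action values, created lazily
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def values(self, state):
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def act(self, state):
        """Epsilon-greedy action selection with random tie-breaking."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        vals = self.values(state)
        best = max(vals)
        return self.rng.choice([a for a, v in enumerate(vals) if v == best])

    def update(self, s, a, r, s_next, done):
        """Standard one-step Q-learning update on this agent's own table."""
        target = r if done else r + self.gamma * max(self.values(s_next))
        vals = self.values(s)
        vals[a] += self.alpha * (target - vals[a])


def train(starts, episodes=2000, max_steps=50, seed=0):
    """Run all agents in the same episodes; each one updates only its own Q-table."""
    agents = [IndependentQAgent(len(ACTIONS), seed=seed + i) for i in range(len(starts))]
    for _ in range(episodes):
        positions, done = list(starts), [False] * len(agents)
        for _ in range(max_steps):
            for i, agent in enumerate(agents):
                if done[i]:
                    continue
                a = agent.act(positions[i])
                nxt, r, d = step(positions[i], a)
                agent.update(positions[i], a, r, nxt, d)  # no shared Q-table
                positions[i], done[i] = nxt, d
            if all(done):
                break
    return agents


def greedy_path(agent, start, max_steps=50):
    """Follow the learned greedy policy from `start` until the goal or a step limit."""
    pos, path = start, [start]
    for _ in range(max_steps):
        vals = agent.values(pos)
        pos, _, done = step(pos, vals.index(max(vals)))
        path.append(pos)
        if done:
            break
    return path


STARTS = [(0, 0), (4, 0)]
agents = train(STARTS)
for agent, start in zip(agents, STARTS):
    print(start, "->", GOAL, "in", len(greedy_path(agent, start)) - 1, "moves")
```

Because no table is shared, each agent's problem becomes non-stationary once agents actually interact (the others' policies keep changing); accepting that in exchange for a small per-agent model is the usual reason Independent Q-Learning is considered lightweight.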
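The abstract integrates Grey Wolf Optimization as a policy-guidance mechanism but does not describe the coupling in detail. The sketch below shows only the standard GWO update: the three best solutions found so far (alpha, beta, delta) pull every candidate toward them, while the coefficient `a` shrinks linearly from 2 to 0 to shift from exploration to exploitation. The function names, pack size, iteration count and the sphere test function are assumptions; how the paper feeds GWO positions into the Q-learning policy is not reproduced here.

```python
import random


def gwo(fitness, dim, bounds, n_wolves=20, iters=200, seed=0):
    """Minimal Grey Wolf Optimizer minimizing `fitness` over the box [lo, hi]^dim.
    Returns (best_score, best_position)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # best-so-far (score, position) pairs: alpha, beta, delta
    for t in range(iters):
        # Keep the three best solutions seen so far as alpha, beta and delta.
        scored = leaders + [(fitness(w), w) for w in wolves]
        scored.sort(key=lambda sw: sw[0])
        leaders = scored[:3]
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 toward 0
        new_pack = []
        for w in wolves:
            new_w = []
            for d in range(dim):
                estimates = []
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a  # |A| > 1 explores, |A| < 1 exploits
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    estimates.append(leader[d] - A * D)
                # Average the three leader-guided estimates and clip to the box.
                new_w.append(min(max(sum(estimates) / 3.0, lo), hi))
            new_pack.append(new_w)
        wolves = new_pack
    scored = leaders + [(fitness(w), w) for w in wolves]
    return min(scored, key=lambda sw: sw[0])


def sphere(x):
    """Simple convex test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best_score, best_pos = gwo(sphere, dim=5, bounds=(-10.0, 10.0))
print(best_score, best_pos)
```

Keeping the leaders as best-so-far solutions makes the returned score non-increasing over iterations, which is the property that matters if GWO is used to steer an RL agent away from poor local optima.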
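The customized reward is described only qualitatively: reward victim discovery, penalize unsafe behavior, and discourage redundant exploration among agents. One hypothetical way to encode those three priorities is sketched below. The `RewardWeights` values, the shared `explored_by` map, the victim set and the hazard set are illustrative assumptions, not the paper's actual reward terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # All magnitudes are hypothetical placeholders, not values from the paper.
    step: float = -0.1           # small time cost per move
    new_cell: float = 1.0        # entering a cell no UAV has explored yet
    revisit_own: float = -0.5    # re-entering a cell this UAV already explored
    overlap_other: float = -2.0  # entering a cell another UAV already explored
    victim_found: float = 100.0  # reaching a victim not yet discovered
    unsafe: float = -50.0        # entering a hazard cell (fire, debris, wall)


def shaped_reward(cell, agent_id, explored_by, victims_remaining, hazards,
                  w=RewardWeights()):
    """Reward for `agent_id` moving into `cell` (hypothetical shaping).

    explored_by: shared map cell -> set of agent ids that have visited it.
    victims_remaining: cells holding victims not yet discovered by any UAV.
    hazards: cells treated as unsafe.
    """
    r = w.step
    if cell in hazards:
        return r + w.unsafe  # unsafe move dominates everything else
    visitors = explored_by.get(cell, set())
    if not visitors:
        r += w.new_cell
    elif visitors - {agent_id}:
        r += w.overlap_other  # redundant with another agent's exploration
    else:
        r += w.revisit_own
    if cell in victims_remaining:
        r += w.victim_found
    return r


explored = {(0, 0): {0}, (0, 1): {1}}
print(shaped_reward((1, 0), 0, explored, victims_remaining={(1, 0)}, hazards=set()))
```

Penalizing cells already explored by another UAV more heavily than an agent's own revisits is one way to push agents apart using nothing more than a shared visit map; whether the paper relies on such a map is not stated in the abstract.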
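The framework is evaluated in a frontier-based exploration simulator. In frontier-based exploration, a frontier is a known-free cell that borders unexplored space, and an agent heads for the nearest reachable frontier. A minimal sketch, assuming a hypothetical grid encoding (-1 unknown, 0 free, 1 occupied) and a hypothetical `claimed` set that lets one UAV skip frontiers another UAV is already targeting:

```python
from collections import deque

UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # hypothetical cell encoding
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 4-connected moves as (drow, dcol)


def frontier_cells(grid):
    """Return all free cells that have at least one unknown 4-neighbor."""
    rows, cols = len(grid), len(grid[0])
    out = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    out.add((r, c))
                    break
    return out


def nearest_frontier(grid, start, claimed=frozenset()):
    """Breadth-first search through free cells from `start`; return
    (frontier_cell, path_length) for the closest unclaimed frontier, or None."""
    rows, cols = len(grid), len(grid[0])
    frontiers = frontier_cells(grid) - set(claimed)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        cell, dist = queue.popleft()
        if cell in frontiers:
            return cell, dist
        r, c = cell
        for dr, dc in NEIGHBORS:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in seen and grid[nxt[0]][nxt[1]] == FREE):
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


grid = [
    [0, 0, 0, -1],
    [0, 1, 0, -1],
    [0, 0, 0, -1],
]
print(sorted(frontier_cells(grid)))                     # [(0, 2), (1, 2), (2, 2)]
print(nearest_frontier(grid, (0, 0)))                   # ((0, 2), 2)
print(nearest_frontier(grid, (0, 0), claimed={(0, 2)}))  # ((1, 2), 3)
```

The `claimed` filter is a simple stand-in for the coordination goal the abstract mentions (avoiding redundant exploration); a learned policy with a shaped reward is a different mechanism aimed at the same outcome.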