eprintid: 27154
rev_number: 8
eprint_status: archive
userid: 2
dir: disk0/00/02/71/54
datestamp: 2026-02-04 23:30:14
lastmod: 2026-02-04 23:30:15
status_changed: 2026-02-04 23:30:14
type: article
metadata_visibility: show
creators_name: ur Rehman, Hafiz Muhammad Raza
creators_name: Gul, M. Junaid
creators_name: Younas, Rabbiya
creators_name: Jhandir, Muhammad Zeeshan
creators_name: Álvarez, Roberto Marcelo
creators_name: Miró Vera, Yini Airet
creators_name: Ashraf, Imran
creators_id:
creators_id:
creators_id:
creators_id:
creators_id: roberto.alvarez@uneatlantico.es
creators_id: yini.miro@uneatlantico.es
creators_id:
title: End-to-end emergency response protocol for tunnel accidents augmentation with reinforcement learning
ispublished: pub
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: uninipr_produccion_cientifica
divisions: unic_produccion_cientifica
divisions: uniromana_produccion_cientifica
full_text_status: public
keywords: Robotic systems; drones; multi-agent systems; path finding; reinforcement learning; tunnel hazards; unmanned aerial vehicles
abstract: Autonomous unmanned aerial vehicles (UAVs) offer cost-effective and flexible solutions for a wide range of real-world applications, particularly in hazardous and time-critical environments. Their ability to navigate autonomously, communicate rapidly, and avoid collisions makes UAVs well suited for emergency response scenarios. However, real-time path planning in dynamic and unpredictable environments remains a major challenge, especially in confined tunnel infrastructures where accidents may trigger fires, smoke propagation, debris, and rapid environmental changes. In such conditions, conventional preplanned or model-based navigation approaches often fail due to limited visibility, narrow passages, and the absence of reliable localization signals.
To address these challenges, this work proposes an end-to-end emergency response framework for tunnel accidents based on Multi-Agent Reinforcement Learning (MARL). Each UAV operates as an independent learning agent using an Independent Q-Learning paradigm, enabling real-time decision-making under limited computational resources. To mitigate premature convergence and local optima during exploration, Grey Wolf Optimization (GWO) is integrated as a policy-guidance mechanism within the reinforcement learning (RL) framework. A customized reward function is designed to prioritize victim discovery, penalize unsafe behavior, and explicitly discourage redundant exploration among agents. The proposed approach is evaluated using a frontier-based exploration simulator under both single-agent and multi-agent settings with multiple goals. Extensive simulation results demonstrate that the proposed framework achieves faster goal discovery, improved map coverage, and reduced rescue time compared to state-of-the-art GWO-based exploration and random search algorithms. These results highlight the effectiveness of lightweight MARL-based coordination for autonomous UAV-assisted tunnel emergency response. 
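The three ingredients the abstract names per UAV — independent tabular Q-learning, a shaped reward that favors victim discovery and discourages redundant exploration, and GWO-style guidance toward the best-performing agents — can be sketched as below. This is a minimal illustration under stated assumptions: the class and function names, state encoding, and all reward constants are hypothetical, not the paper's actual implementation.

```python
import random
from collections import defaultdict


class IndependentQAgent:
    """One UAV as an independent tabular Q-learner (Independent Q-Learning:
    each agent treats the others as part of the environment)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.2):
        self.q = defaultdict(float)          # Q[(state, action)] -> value
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        """Standard Q-learning temporal-difference update."""
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])


def shaped_reward(found_victim, collided, cell_visits):
    """Illustrative reward shaping in the spirit of the abstract: reward
    victim discovery, penalize unsafe behavior, and penalize revisiting
    already-explored cells. All constants are assumptions."""
    r = -0.01                                # small step cost -> faster rescue
    if found_victim:
        r += 10.0                            # victim discovery dominates
    if collided:
        r -= 5.0                             # unsafe-behavior penalty
    r -= 0.1 * max(0, cell_visits - 1)       # redundant-exploration penalty
    return r


def gwo_guided_waypoint(positions, scores):
    """GWO-style guidance: bias exploration toward the centroid of the three
    best-scoring agent positions (the alpha, beta, and delta 'wolves')."""
    top3 = sorted(zip(scores, positions), reverse=True)[:3]
    xs = [p[0] for _, p in top3]
    ys = [p[1] for _, p in top3]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

In a full system each UAV would call `act`/`update` every step on its local map state, with `gwo_guided_waypoint` supplying a soft exploration target so that independent learners do not all converge on the same local optimum; how the guidance is blended into the policy is a design choice the abstract leaves to the paper.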
date: 2026-01
publication: Scientific Reports
id_number: doi:10.1038/s41598-026-37191-w
refereed: TRUE
issn: 2045-2322
official_url: http://doi.org/10.1038/s41598-026-37191-w
access: open
language: en
citation: ur Rehman, Hafiz Muhammad Raza and Gul, M. Junaid and Younas, Rabbiya and Jhandir, Muhammad Zeeshan and Álvarez, Roberto Marcelo and Miró Vera, Yini Airet and Ashraf, Imran (2026) End-to-end emergency response protocol for tunnel accidents augmentation with reinforcement learning. Scientific Reports. ISSN 2045-2322
document_url: http://repositorio.unib.org/id/eprint/27154/1/s41598-026-37191-w_reference.pdf