JACN 2026 Vol.14(1): 11-18
DOI: 10.18178/jacn.2026.14.1.299
Adaptive Q-Learning-Based Energy-Efficient Multipath Routing for AODV and DSR in MANETs
Deden Ardiansyah1,2,*, Mochamad Agung Wibowo1, Mustafid3, and Teddy Mantoro1,2
1Doctoral Program of Information Systems, School of Postgraduate, Universitas Diponegoro, Semarang, Central Java, Indonesia
2Computer Engineering Department, Vocational School, Universitas Pakuan, Bogor, West Java, Indonesia
3School of Computer Science, Nusa Putra University, Sukabumi, Indonesia
Email: ardiansyahzhigadeden@gmail.com (D.A.); agung.wibowo@ft.undip.ac.id (M.A.W.); mustafid55@gmail.com (M.); tmantoro@gmail.com (T.M.)
*Corresponding author
Manuscript received February 6, 2026; accepted March 2, 2026; published April 30, 2026
Abstract—Mobile Ad Hoc Networks (MANETs) are self-configuring wireless systems widely used in dynamic, infrastructure-less environments such as military operations and disaster recovery. In such networks, routing protocols play a crucial role in ensuring efficient and reliable communication despite mobility and energy constraints. Conventional routing protocols such as Ad hoc On-Demand Distance Vector (AODV) and Dynamic Source Routing (DSR) often overlook energy-aware decision-making, leading to uneven load distribution, premature node failures, and network fragmentation. This issue is particularly critical in resource-limited environments, where battery life directly affects network longevity. To address these challenges, this study proposes integrating adaptive Q-Learning into the AODV and DSR protocols to develop AODV-Q and DSR-Q. The proposed approach incorporates residual energy, link stability, and delay into a dynamic reward function that guides route selection. Multipath routing is also implemented to further enhance robustness and load balancing. A simulation-based experimental setup was conducted using Network Simulator 2 (NS-2) and Python to evaluate energy efficiency, packet delivery ratio (PDR), end-to-end delay, and routing overhead. Comparative scenarios were designed to benchmark the standard protocols against the Q-Learning-enhanced protocols under identical network topologies. Results show that AODV-Q achieves an 84.4% PDR, compared to 39.2% for Q-routing and 33.6% for Q-energy, and reduces energy consumption by 42.3% relative to Q-routing. Baseline protocols reached a network lifetime of 300 s, while the Q-learning variants maintained operation for 223-241 s with better energy fairness. This study presents a novel energy-efficient routing framework that integrates reinforcement learning into reactive protocols. The proposed method is scalable, context-aware, and suitable for long-term MANET deployments in dynamic environments.
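The abstract describes route selection driven by a Q-learning update whose reward combines residual energy, link stability, and delay. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the weights (w_e, w_s, w_d), the learning parameters, the normalized input ranges, and the tabular (node, next-hop) state-action model are all assumptions made for demonstration.

```python
# Hedged sketch of Q-learning-based next-hop selection with a composite
# reward, as described in the abstract. All parameter values are
# illustrative assumptions, not the paper's actual configuration.

def reward(residual_energy, link_stability, delay,
           w_e=0.4, w_s=0.4, w_d=0.2):
    """Composite reward: higher residual energy and link stability raise
    the reward; higher delay lowers it. Inputs assumed normalized to [0, 1]."""
    return w_e * residual_energy + w_s * link_stability - w_d * delay

def q_update(Q, node, next_hop, r, best_next_q, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    key = (node, next_hop)
    old = Q.get(key, 0.0)
    Q[key] = old + alpha * (r + gamma * best_next_q - old)
    return Q[key]

# Example: node "A" evaluating two candidate next hops toward a destination.
# Hop "B" has high residual energy and stability with low delay; hop "C"
# is energy-depleted and slow, so it earns a lower Q-value.
Q = {}
q_update(Q, "A", "B", reward(0.9, 0.8, 0.1), best_next_q=0.0)
q_update(Q, "A", "C", reward(0.3, 0.5, 0.6), best_next_q=0.0)
best = max(["B", "C"], key=lambda h: Q[("A", h)])  # greedy route choice
```

In a multipath setting, a node could retain several high-Q next hops rather than only the greedy maximum, spreading traffic to balance load across paths.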
Keywords—Mobile Ad Hoc Networks (MANETs), adaptive Q-learning, energy efficiency, multipath routing, reactive protocols, Ad hoc On-Demand Distance Vector (AODV), Dynamic Source Routing (DSR)
Cite: Deden Ardiansyah, Mochamad Agung Wibowo, Mustafid, and Teddy Mantoro, "Adaptive Q-Learning-Based Energy-Efficient Multipath Routing for AODV and DSR in MANETs," Journal of Advances in Computer Networks, vol. 14, no. 1, pp. 11-18, 2026.
Copyright © 2026 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).