JACN 2024 Vol.12(1): 1-7
doi: 10.18178/jacn.2024.12.1.288
Dynamic Adjustment of Exploration Rate for PPO Algorithm to Relief Rapid Decline of Episode Rewards
Shingchern D. You 1, Chao-Wei Ku 1,2, and Chien-Hung Liu 1
1. Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan
2. Taiwan Semiconductor Manufacturing Co., Taiwan
Email: scyou@ntut.edu.tw (S.D.Y.); weichaoku@gmail.com (C.W.K.); cliu@ntut.edu.tw (C.H.L.)
*Corresponding author
Manuscript received May 20, 2023, revised July 27, 2023; accepted August 30, 2023; published February 2, 2024.
Abstract—Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm. We observe that, in some environments, the agent may repeatedly select actions in a fixed sequence, leading to a rapid decline in episode rewards. Afterward, the reward remains very low for a prolonged training period, reducing training efficiency. In this paper, we propose an approach that dynamically adjusts the coefficient of the entropy term in the objective function of the PPO algorithm to encourage the agent to explore. Our experimental results show that the proposed algorithm effectively alleviates this detrimental rapid decline of episode rewards.
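The core idea in the abstract, raising the entropy (exploration) weight in the PPO objective when episode rewards collapse, can be illustrated with a minimal sketch. This is not the authors' implementation; the trigger threshold, boosted coefficient, and decay schedule (`drop_ratio`, `boosted_coef`, `decay`) are assumed values for illustration only.

```python
# Minimal sketch (assumed, not the paper's method): dynamically raise the
# entropy coefficient in the PPO loss when episode rewards drop sharply,
# then let it decay back toward its base value.

class EntropyCoefScheduler:
    def __init__(self, base_coef=0.01, boosted_coef=0.1, drop_ratio=0.5, decay=0.995):
        self.base_coef = base_coef        # default entropy weight in the PPO loss
        self.boosted_coef = boosted_coef  # raised weight used after a reward collapse
        self.drop_ratio = drop_ratio      # fraction of best reward that triggers the boost
        self.decay = decay                # per-update decay back toward the base value
        self.coef = base_coef
        self.best_reward = None

    def update(self, mean_episode_reward):
        """Return the entropy coefficient to use for the next PPO update."""
        if self.best_reward is None:
            self.best_reward = mean_episode_reward
        self.best_reward = max(self.best_reward, mean_episode_reward)
        # Boost exploration when the reward falls well below the best seen so far.
        if self.best_reward > 0 and mean_episode_reward < self.drop_ratio * self.best_reward:
            self.coef = self.boosted_coef
        else:
            # Otherwise decay smoothly back toward the base coefficient.
            self.coef = self.base_coef + (self.coef - self.base_coef) * self.decay
        return self.coef

# Illustrative use inside a PPO training loop (names are hypothetical):
#   coef = scheduler.update(mean_episode_reward)
#   loss = policy_loss + value_coef * value_loss - coef * entropy
```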
Keywords—entropy, Proximal Policy Optimization (PPO), exploration rate, reinforcement learning
Cite: Shingchern D. You, Chao-Wei Ku, and Chien-Hung Liu, "Dynamic Adjustment of Exploration Rate for PPO Algorithm to Relief Rapid Decline of Episode Rewards," Journal of Advances in Computer Networks vol. 12, no. 1, pp. 1-7, 2024.
Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.