• Feb 07, 2023 News! JACN will adopt an article-by-article workflow. The benefit of this workflow is that a delay with one article does not hold up the entire issue: once a paper enters production, it is published online shortly afterward.
  • May 30, 2022 News! JACN Vol. 10, No. 1 has been published online.
  • Dec 24, 2021 News! Volume 9, No. 1 has been indexed by EI (Inspec)!
General Information
    • ISSN: 1793-8244 (Print)
    • Abbreviated Title: J. Adv. Comput. Netw.
    • Frequency: Semiannual
    • DOI: 10.18178/JACN
    • Editor-in-Chief: Professor Haklin Kimm
    • Managing Editor: Ms. Alyssa Rainsford
    • Abstracting/Indexing: EBSCO, ProQuest, and Google Scholar
    • E-mail: editor@jacn.net
    • APC: 500 USD
Editor-in-Chief
Professor Haklin Kimm
East Stroudsburg University, USA
I am pleased to take on the position of Editor-in-Chief of JACN. We encourage authors to submit papers on all aspects of computer networks.

JACN 2024 Vol.12(1): 1-7
doi: 10.18178/jacn.2024.12.1.288

Dynamic Adjustment of Exploration Rate for PPO Algorithm to Relief Rapid Decline of Episode Rewards

Shingchern D. You 1, Chao-Wei Ku 1,2, and Chien-Hung Liu 1
1. Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan
2. Taiwan Semiconductor Manufacturing Co., Taiwan
Email: scyou@ntut.edu.tw (S.D.Y.); weichaoku@gmail.com (C.W.K.); cliu@ntut.edu.tw (C.H.L.)
*Corresponding author

Manuscript received May 20, 2023, revised July 27, 2023; accepted August 30, 2023; published February 2, 2024.

Abstract—Proximal Policy Optimization (PPO) is a widely used algorithm in reinforcement learning. We observe that in some environments the agent may repeatedly select actions in a fixed sequence, leading to a rapid decline of rewards. The reward then remains very low for a prolonged training period, and training efficiency decreases as a result. In this paper, we propose an approach that dynamically adjusts the coefficient of the entropy term in the objective function of the PPO algorithm to encourage the agent to explore. Our experimental results show that the proposed algorithm is effective in relieving this detrimental rapid decline of episode rewards.
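As a rough, hypothetical sketch of the general idea described in the abstract (not the authors' published rule), the coefficient of the PPO entropy bonus can be raised when the recent mean episode reward drops sharply relative to the preceding window, and otherwise decayed back toward a nominal value. The function name, window size, thresholds, and step sizes below are illustrative assumptions.

    import numpy as np

    def adjust_entropy_coef(coef, episode_rewards, window=10,
                            drop_ratio=0.5, boost_step=0.01,
                            decay=0.99, coef_min=0.001, coef_max=0.1):
        """Heuristically update the entropy coefficient used in the PPO loss.

        Compares the mean reward of the last `window` episodes with the mean
        of the window before it. A sharp drop boosts the coefficient (more
        exploration); otherwise it decays slowly toward coef_min. All
        thresholds and step sizes here are illustrative, not the paper's values.
        """
        if len(episode_rewards) < 2 * window:
            return coef
        prev = np.mean(episode_rewards[-2 * window:-window])
        curr = np.mean(episode_rewards[-window:])
        if prev > 0 and curr < drop_ratio * prev:   # rapid decline detected
            return min(coef + boost_step, coef_max)
        return max(coef * decay, coef_min)

    # The updated coefficient scales the entropy bonus in the usual PPO objective:
    #   loss = -(clipped surrogate) + c_vf * value_loss - coef * policy_entropy

In practice such a check would run once per training iteration, with the returned coefficient fed back into the loss computation of whichever PPO implementation is in use.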

Keywords—entropy, Proximal Policy Optimization (PPO), exploration rate, reinforcement learning


Cite: Shingchern D. You, Chao-Wei Ku, and Chien-Hung Liu, "Dynamic Adjustment of Exploration Rate for PPO Algorithm to Relief Rapid Decline of Episode Rewards," Journal of Advances in Computer Networks vol. 12, no. 1, pp. 1-7, 2024.

Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).