Autonomous reinforcement learning with experience replay

Paweł Wawrzyński, Ajay Kumar Tanwani

Abstract

This paper addresses the efficiency and autonomy required to make reinforcement learning suitable for real-life control tasks. It presents a real-time reinforcement learning algorithm that repeatedly adjusts the control policy using previously collected samples and autonomously estimates appropriate step-sizes for the learning updates. The algorithm is based on actor–critic with experience replay, whose step-sizes are determined on-line by an enhanced fixed point algorithm for on-line neural network training. An experimental study with a simulated octopus arm and half-cheetah demonstrates that the proposed algorithm can solve difficult learning control problems autonomously within a reasonably short time.
Authors: Paweł Wawrzyński (FEIT / AK, The Institute of Control and Computation Engineering); Ajay Kumar Tanwani (Swiss Federal Institute of Technology, Lausanne)
Journal series: Neural Networks, ISSN 0893-6080
Issue year: 2013
Vol: 41
Pages: 156–167
Keywords in English: Actor–critic, Autonomous learning, Reinforcement learning, Step-size estimation
ASJC Classification: 1702 Artificial Intelligence; 2805 Cognitive Neuroscience
DOI: 10.1016/j.neunet.2012.11.007
URL: http://www.sciencedirect.com/science/article/pii/S0893608012002936
Language: English (en)
Score (nominal): 30
Score source: journalList
Score: Ministerial score = 30.0, 01-09-2020, ArticleFromJournal
Ministerial score (2013-2016) = 30.0, 01-09-2020, ArticleFromJournal
Publication indicators: WoS Citations = 16; Scopus Citations = 25; GS Citations = 49.0; Scopus SNIP (Source Normalised Impact per Paper): 2013 = 2.008; WoS Impact Factor: 2013 = 2.076 (2-year), 2013 = 2.516 (5-year)
Citation count*: 49 (2020-09-23)
* The presented citation count is obtained by analyzing information available on the Internet and is close to the number calculated by the Publish or Perish system.