Can reinforcement learning adapt seamlessly to the complexities of continuous environments? This research introduces a novel framework for reinforcement learning (RL) that operates directly in continuous time and space, eliminating the need to discretize time, states, or actions in advance. This is crucial for tasks where precision and real-time adaptability are paramount. The core of the method lies in minimizing a continuous-time form of the temporal difference (TD) error, derived from the Hamilton-Jacobi-Bellman (HJB) equation. The authors develop update methods based on the backward Euler approximation and exponential eligibility traces, drawing parallels with conventional algorithms such as residual gradient and TD(λ). They also formulate two policy improvement approaches: a continuous actor-critic method and a value-gradient-based greedy policy. These algorithms provide practical tools for control and optimization problems posed in continuous time. Simulations on pendulum swing-up and cart-pole swing-up tasks show that the proposed algorithms learn these tasks effectively, with the value-gradient-based policy combined with a learned dynamics model requiring the fewest trials. The results suggest potential applications in robotics, autonomous systems, and other fields requiring precise control and real-time adaptation, and they pave the way for more efficient and robust RL solutions in complex, continuous environments.
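As a rough sketch of the underlying idea (the notation here, with τ as the reward discounting time constant, x the state, u the control, r the reward, and V the value function, is standard for this line of work and is assumed rather than quoted from the summary above), the continuous-time TD error can be read as the residual of the self-consistency condition that the HJB formulation imposes on the value function:

    V(x(t)) = \int_t^{\infty} e^{-(s-t)/\tau} \, r(x(s), u(s)) \, ds
    \frac{d}{dt} V(x(t)) = \frac{1}{\tau} V(x(t)) - r(x(t), u(t))
    \delta(t) = r(t) - \frac{1}{\tau} V(t) + \dot{V}(t)

Learning drives δ(t) toward zero; the backward Euler approximation supplies a finite-difference estimate of the time derivative of V, and exponential eligibility traces spread the resulting error signal over recently visited states, which is what yields the analogies to residual gradient and TD(λ) updates.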
Published in Neural Computation, a journal covering computational and mathematical approaches to understanding the brain and nervous system, this paper fits squarely within the journal's coverage of learning algorithms, reinforcement learning in particular. By developing methods applicable to continuous-time dynamical systems, the work addresses key challenges at the intersection of neural computation and control.