Is the natural gradient the key to efficient learning in complex systems? This research explores the advantages of using the natural gradient, rather than the ordinary gradient, for learning in parameter spaces with an underlying Riemannian structure. In such spaces the ordinary gradient does not point in the steepest direction of the loss function, whereas the natural gradient does. Information geometry is used to derive the natural gradient in several settings: the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind source deconvolution), giving the method a theoretical foundation. The dynamics of natural gradient learning are analyzed, and an adaptive scheme for updating the learning rate is proposed. The analysis shows that natural gradient online learning is Fisher efficient: asymptotically it performs as well as the optimal batch estimate of the parameters. This suggests that the plateau phenomenon often observed in backpropagation learning may be mitigated by using the natural gradient. The research offers a valuable approach for improving the efficiency of learning in complex systems.
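To make the core update concrete, the sketch below applies the standard natural-gradient rule, in which the ordinary gradient is premultiplied by the inverse of the Fisher information metric, to a one-parameter toy model (online estimation of a Gaussian mean with known variance). The model, variable names, and 1/t learning-rate schedule are illustrative choices for this sketch, not details taken from the paper.

```python
# Minimal sketch (not code from the paper): online natural-gradient ascent
# on the log-likelihood of a Gaussian with known variance. The Fisher
# information for this model is 1/sigma^2, so the natural gradient rescales
# the ordinary gradient by sigma^2. With a 1/t learning rate the update
# reproduces the running sample mean, i.e. the efficient batch estimator,
# which illustrates the Fisher-efficiency claim in a trivial setting.
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
true_mu = 3.0
samples = rng.normal(true_mu, sigma, size=10_000)

mu = 0.0                              # initial parameter estimate
for t, x in enumerate(samples, start=1):
    grad = (x - mu) / sigma**2        # ordinary gradient of the log-likelihood
    fisher = 1.0 / sigma**2           # Fisher information for this model
    nat_grad = grad / fisher          # natural gradient = G^{-1} * gradient
    eta = 1.0 / t                     # 1/t learning-rate schedule
    mu += eta * nat_grad              # online natural-gradient ascent step

print(f"estimate: {mu:.4f}  (batch sample mean: {samples.mean():.4f})")
```

The example is deliberately simple: here the Fisher metric is constant, so the natural gradient is just a rescaled ordinary gradient. In the settings the paper actually treats, such as perceptrons and matrices for blind separation, the metric varies with the parameters, and computing or approximating its inverse is precisely what the information-geometric analysis provides.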
Published in Neural Computation, this paper addresses a core topic in neural networks and machine learning. The journal focuses on computational and theoretical aspects of brain function and intelligent systems, making this exploration of the natural gradient a natural fit. The research contributes to the ongoing development of more efficient learning algorithms, a central theme of the journal.