Gradient Descent is a Technique for Learning to Learn

Authors

  • Taposh Kumar Neogy, IBA, Rajshahi
  • Naresh Babu Bynagari, Keypixel Software Solutions

DOI:

https://doi.org/10.18034/ajhal.v5i2.578

Keywords:

Gradient descent, Long Short-Term Memory (LSTM), Gauss-Newton matrix, Machine learning, Recurrent neural network (RNN)

Abstract

In machine learning, the transition from hand-designed features to learned features has been a huge success. Nevertheless, optimization algorithms are still designed by hand. In this study, we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn automatically to exploit structure in the problems of interest. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and they also generalize well to new tasks with similar structure. We demonstrate this on a variety of tasks, including simple convex problems, training neural networks, and styling images with neural art.
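
To make the recasting concrete, the sketch below is a minimal, illustrative example and not the authors' implementation: the paper meta-trains an LSTM that emits coordinatewise parameter updates by backpropagating through the unrolled optimization, whereas here the update rule is reduced to a two-parameter momentum rule and meta-training is done by random search over a family of random quadratic problems. The names sample_problem, run_optimizer, and meta_train are assumptions introduced only for this sketch.

```python
# Minimal sketch (not the authors' code): optimizer design as a learning problem.
# The "optimizer" is a parameterized coordinatewise update rule whose own
# parameters are chosen by meta-training over a distribution of problems.
import numpy as np

rng = np.random.default_rng(0)

def sample_problem(dim=10):
    """Sample a random quadratic optimizee f(theta) = 0.5 * ||A theta - b||^2."""
    A = rng.normal(size=(dim, dim)) / np.sqrt(dim)
    b = rng.normal(size=dim)
    grad = lambda theta: A.T @ (A @ theta - b)
    loss = lambda theta: 0.5 * np.sum((A @ theta - b) ** 2)
    return grad, loss

def run_optimizer(params, grad, loss, steps=50, dim=10):
    """Unroll a parameterized update rule (here: momentum SGD) for a few steps."""
    log_lr, momentum = params
    theta = np.zeros(dim)
    velocity = np.zeros(dim)
    for _ in range(steps):
        g = grad(theta)
        velocity = momentum * velocity + g
        theta = theta - np.exp(log_lr) * velocity
    return loss(theta)

def meta_train(n_candidates=200, n_problems=20):
    """'Learning to learn': pick the optimizer's own parameters so that the
    final loss, averaged over sampled training problems, is as small as possible."""
    problems = [sample_problem() for _ in range(n_problems)]
    best_params, best_score = (np.log(1e-3), 0.0), np.inf
    for _ in range(n_candidates):
        candidate = (rng.uniform(-8.0, 0.0), rng.uniform(0.0, 0.99))
        with np.errstate(over="ignore", invalid="ignore"):  # divergent candidates are simply discarded
            score = np.mean([run_optimizer(candidate, g, l) for g, l in problems])
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params

learned = meta_train()
grad, loss = sample_problem()  # a fresh problem with the same structure
print("learned optimizer, final loss:", run_optimizer(learned, grad, loss))
print("hand-designed SGD, final loss:", run_optimizer((np.log(1e-3), 0.0), grad, loss))
```

Because the learned rule is fit to the problem distribution, it typically reaches a lower final loss on a fresh problem of the same family than the fixed hand-designed baseline; the paper's LSTM plays the same role, but with a recurrent state that lets the update depend on the optimization trajectory.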

 


Author Biographies

  • Taposh Kumar Neogy, IBA, Rajshahi

    Assistant Professor of Accounting, Department of Business Administration, Institute of Business Administration (IBA), Rajshahi (under National University), BANGLADESH

  • Naresh Babu Bynagari, Keypixel Software Solutions

    Android Developer, Keypixel Software Solutions, 777 Washington Rd, Parlin, NJ 08859, Middlesex, USA


Published

2018-11-30

Issue

Vol. 5 No. 2 (2018)

Section

Peer-reviewed Article

How to Cite

Neogy, T. K., & Bynagari, N. B. (2018). Gradient Descent is a Technique for Learning to Learn. Asian Journal of Humanity, Art and Literature, 5(2), 145-156. https://doi.org/10.18034/ajhal.v5i2.578