Hf-Rnn Supp

Learning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials
Contents
1 Pseudo-code for the damped Gauss-Newton vector product
2 Details of the pathological synthetic problems
2.1 The addition, multiplication, and XOR problem
2.2 The temporal order problem
2.3 The 3-bit temporal order problem
2.4 The random permutation problem
2.5 Noiseless memorization
3 Details of the natural problems
3.1 The bouncing balls problem
3.2 The MIDI dataset
3.3 The speech dataset
1 Pseudo-code for the damped Gauss-Newton vector product
Algorithm 1 Computation of the matrix-vector product of the structurally-damped Gauss-Newton matrix with the vector $v$, for the case when $e$ is the tanh non-linearity, $g$ is the logistic sigmoid, and $D$ and $L$ are the corresponding matching loss functions. The notation reflects the “convex approximation” interpretation of the GN matrix, so that we are applying the $R$ operator to the forwards-backwards pass through the linearized and structurally damped objective $\tilde{k}$, and the desired matrix-vector product is given by $R\{d\tilde{k}/d\theta\}$. All derivatives are implicitly evaluated at $\theta = \theta_n$. The previously defined parameter symbols $W_{ph}$, $W_{hx}$, $W_{hh}$, $b_h$, $b_p$, $b_h^{init}$ correspond to the parameter vector $\theta_n$ if they have no superscript and to the input parameter vector $v$ if they have the ‘v’ superscript.
The $Rz$ notation follows Pearlmutter [1994], and for the purposes of reading the pseudo-code can be interpreted as merely defining a new symbol. We assume that intermediate quantities of the network (e.g. $h_i$) have already been computed (from $\theta_n$).
The operator
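To make the forwards-backwards structure described in the caption concrete, the following is a minimal NumPy sketch of a plain Gauss-Newton vector product for a small tanh RNN with logistic-sigmoid outputs. It is an illustration under stated assumptions, not the algorithm from the supplement: structural damping and any other damping terms are omitted, the initial hidden state is assumed to equal $b_h^{init}$ directly, and the helper names (`unpack`, `forward`, `gn_vec`) and the flat parameter layout are hypothetical. The R-forward pass computes $Jv$, the matching-loss curvature for the sigmoid is the diagonal $p(1-p)$, and an ordinary backward pass applies $J^\top$.

```python
import numpy as np

def unpack(theta, nx, nh, ny):
    """Split a flat vector into (W_hx, W_hh, W_ph, b_h, b_p, b_init). Layout is an assumption."""
    shapes = [(nh, nx), (nh, nh), (ny, nh), (nh,), (ny,), (nh,)]
    out, k = [], 0
    for shape in shapes:
        n = int(np.prod(shape))
        out.append(theta[k:k + n].reshape(shape))
        k += n
    return out

def forward(theta, xs, nh, ny):
    """Run the RNN; return hidden states h_0..h_T and output pre-activations s_1..s_T."""
    W_hx, W_hh, W_ph, b_h, b_p, b_init = unpack(theta, xs.shape[1], nh, ny)
    hs, ss = [b_init], []  # assumption: h_0 is the learned initial state itself
    for x in xs:
        hs.append(np.tanh(W_hx @ x + W_hh @ hs[-1] + b_h))
        ss.append(W_ph @ hs[-1] + b_p)
    return hs, np.array(ss)

def gn_vec(theta, v, xs, nh, ny):
    """Undamped Gauss-Newton product G v = J^T H_L J v, with J = ds/dtheta."""
    nx = xs.shape[1]
    W_hx, W_hh, W_ph, b_h, b_p, b_init = unpack(theta, nx, nh, ny)
    vW_hx, vW_hh, vW_ph, vb_h, vb_p, vb_init = unpack(v, nx, nh, ny)
    hs, ss = forward(theta, xs, nh, ny)

    # R-forward pass: directional derivatives of h_i and s_i along v.
    Rh, Rs = [vb_init], []
    for i, x in enumerate(xs):
        Rt = vW_hx @ x + vW_hh @ hs[i] + W_hh @ Rh[i] + vb_h
        Rh.append((1.0 - hs[i + 1] ** 2) * Rt)           # tanh'(t) = 1 - h^2
        Rs.append(vW_ph @ hs[i + 1] + W_ph @ Rh[i + 1] + vb_p)

    # Curvature of the matching loss for sigmoid outputs: diag(p * (1 - p)).
    p = 1.0 / (1.0 + np.exp(-ss))
    RDs = p * (1.0 - p) * np.array(Rs)

    # Backward pass through the network, which applies J^T to RDs.
    g = [np.zeros_like(a) for a in (W_hx, W_hh, W_ph, b_h, b_p, b_init)]
    dh = np.zeros(nh)                                    # carry from later time steps
    for i in reversed(range(len(xs))):
        g[2] += np.outer(RDs[i], hs[i + 1])
        g[4] += RDs[i]
        dh = dh + W_ph.T @ RDs[i]
        dt = (1.0 - hs[i + 1] ** 2) * dh
        g[0] += np.outer(dt, xs[i])
        g[1] += np.outer(dt, hs[i])
        g[3] += dt
        dh = W_hh.T @ dt
    g[5] += dh                                           # contribution to the initial state
    return np.concatenate([a.ravel() for a in g])

# Small usage example with arbitrary sizes and random data.
rng = np.random.default_rng(0)
nx, nh, ny, T = 2, 3, 2, 4
n = nh * nx + nh * nh + ny * nh + nh + ny + nh
theta = 0.5 * rng.standard_normal(n)
v = rng.standard_normal(n)
xs = rng.standard_normal((T, nx))
Gv = gn_vec(theta, v, xs, nh, ny)
quad = float(v @ Gv)  # non-negative, since the Gauss-Newton matrix is positive semi-definite
```

The product costs roughly one extra forward pass and one backward pass, and the matrix itself is never formed, which is what makes it usable inside a conjugate-gradient solver.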



References
J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, and N.L. Dahlgren. DARPA TIMIT: Acoustic-phonetic Continuous Speech Corpus CD-ROM. US Dept. of Commerce, National Institute of Standards and Technology, 1993.
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 1997.
J. Martens. Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
B.A. Pearlmutter. Fast exact multiplication by the Hessian. Neural Computation, 1994.
