# Efficient Halley's method for nonlinear solving

## Introduction

Say we have a system of $n$ equations in $n$ unknowns, $f(x)=0$, where $f:\mathbb R^n\to\mathbb R^n$ is sufficiently smooth.

Given an initial guess $x_0$, Newton's method finds a solution by iterating

$$x_{i+1}=x_i-J(x_i)^{-1}f(x_i)$$

where $J(x_i)$ is the Jacobian of $f$ at $x_i$. This method converges quadratically.
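
To get a feel for this, consider a one-dimensional illustration with $f(x)=x^2-2$ and $x_0=1$, so that $J(x)=f'(x)=2x$:

$$x_1 = 1-\frac{1^2-2}{2\cdot 1} = 1.5,\qquad x_2 = 1.5-\frac{1.5^2-2}{2\cdot 1.5} \approx 1.4167,\qquad x_3 \approx 1.4142,$$

so the number of correct digits of $\sqrt 2\approx 1.41421$ roughly doubles at every iteration.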

We can make it converge faster using higher-order derivative information. For example, Halley's method iterates as

$$x_{i+1}=x_i-(a_i\odot a_i)\oslash(a_i-b_i/2)$$

where the vector multiplication and division $\odot,\oslash$ are defined element-wise, and the terms $a_i$ and $b_i$ are defined by the linear systems $J(x_i)a_i = f(x_i)$ and $J(x_i)b_i = H(x_i)a_ia_i$, with $H$ the Hessian of $f$.

Halley's method is known to converge cubically, which is faster than Newton's method. Here, we demonstrate that with TaylorDiff.jl you can compute the Hessian-vector-vector product $H(x_i)a_ia_i$ very efficiently, such that Halley's method is almost as cheap as Newton's method per iteration.
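
On the same one-dimensional illustration $f(x)=x^2-2$, $x_0=1$ (where $J=f'=2x$ and $H=f''=2$), a single Halley step gives

$$a_0 = \frac{f(x_0)}{f'(x_0)} = -\frac12,\qquad b_0 = \frac{f''(x_0)\,a_0^2}{f'(x_0)} = \frac14,\qquad x_1 = x_0-\frac{a_0^2}{a_0-b_0/2} = 1-\frac{0.25}{-0.625} = 1.4,$$

already noticeably closer to $\sqrt 2\approx 1.41421$ than the Newton step $x_1=1.5$ above.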

## Implementation

We first define the two iteration schemes mentioned above:

```@example 1
using TaylorDiff, LinearAlgebra
import ForwardDiff

function newton(f, x, p; tol = 1e-12, maxiter = 100)
    fp = Base.Fix2(f, p)
    for i in 1:maxiter
        fx = fp(x)
        error = norm(fx)
        println("Iteration $i: x = $x, f(x) = $fx, error = $error")
        error < tol && return
        # Newton step: solve J(x) a = f(x), then update x in place
        J = ForwardDiff.jacobian(fp, x)
        a = J \ fx
        @. x -= a
    end
end

function halley(f, x, p; tol = 1e-12, maxiter = 100)
    fp = Base.Fix2(f, p)
    for i in 1:maxiter
        fx = fp(x)
        error = norm(fx)
        println("Iteration $i: x = $x, f(x) = $fx, error = $error")
        error < tol && return
        # Halley step: solve J(x) a = f(x) and J(x) b = H(x) a a,
        # then apply the element-wise update x -= (a * a) / (a - b / 2)
        J = ForwardDiff.jacobian(fp, x)
        a = J \ fx
        hvvp = derivative(fp, x, a, Val(2))  # Hessian-vector-vector product H(x) a a
        b = J \ hvvp
        @. x -= (a * a) / (a - b / 2)
    end
end
```

Note that in Halley's method, the Hessian-vector-vector product is computed with `derivative(fp, x, a, Val(2))`. Asymptotically, this is guaranteed to take only about twice the time of evaluating `fp(x)` itself.
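
As a quick cross-check of that call (a sketch, not part of the original example; the function `g` and the points `x`, `a` below are made up for illustration), the same quantity can be computed with nested `ForwardDiff.jacobian` calls, assuming `derivative(g, x, a, Val(2))` indeed returns the second-order directional derivative $H(x)\,a\,a$:

```julia
using TaylorDiff
import ForwardDiff

# Same shape of problem as the test function below: g(x) = x .* x .- p with p fixed
g(x) = x .* x .- [2.0, 2.0]

x = [1.0, 1.0]
a = [0.5, 0.5]

# Second-order directional derivative computed by TaylorDiff
hvvp_taylor = derivative(g, x, a, Val(2))

# The same Hessian-vector-vector product via nested Jacobians:
# the Jacobian of y -> J(y) * a, evaluated at x and applied to a, equals H(x) a a
hvvp_nested = ForwardDiff.jacobian(y -> ForwardDiff.jacobian(g, y) * a, x) * a

hvvp_taylor ≈ hvvp_nested  # expected to be true under the assumption above
```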

Now we define a test function:

```@example 1
f(x, p) = x .* x - p
```
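
Since `f(x, p) = x .* x - p`, the root with positive entries is just the element-wise square root of `p`; for the parameters used below that is

```julia
sqrt.([2.0, 2.0])  # ≈ [1.41421, 1.41421], the solution both methods should converge to
```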

Newton's method takes 6 iterations to converge:

```@example 1
newton(f, [1., 1.], [2., 2.])
```

Halley's method, on the other hand, takes only 4 iterations to converge:

```@example 1
halley(f, [1., 1.], [2., 2.])
```