Commit 2ff9304

Small upgrades to training docs (#2331)
1 parent 5e80211 commit 2ff9304

File tree

2 files changed (+14 −6 lines)


docs/src/training/reference.md

Lines changed: 7 additions & 5 deletions
@@ -10,10 +10,6 @@ Because of this:
 * Flux defines its own version of `setup` which checks this assumption.
   (Using instead `Optimisers.setup` will also work, they return the same thing.)

-The new implementation of rules such as Adam in the Optimisers is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
-The available rules are listed the [optimisation rules](@ref man-optimisers) page here;
-see the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.
-
 ```@docs
 Flux.Train.setup
 Flux.Train.train!(loss, model, data, state; cb)
@@ -47,10 +43,16 @@ Flux 0.13 and 0.14 are the transitional versions which support both; Flux 0.15 w
 The blue-green boxes in the [training section](@ref man-training) describe
 the changes needed to upgrade old code.

+The available rules are listed on the [optimisation rules](@ref man-optimisers) page.
+
+!!! compat "Old & new rules"
+    The new implementation of rules such as Adam in Optimisers.jl is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` still returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
+
 For full details on the interface for implicit-style optimisers, see the [Flux 0.13.6 manual](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+See the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.

 !!! compat "Flux ≤ 0.12"
-    Earlier versions of Flux exported `params`, thus allowing unqualified `params(model)`
+    Much earlier versions of Flux exported `params`, thus allowing unqualified `params(model)`
     after `using Flux`. This conflicted with too many other packages, and was removed in Flux 0.13.
     If you get an error `UndefVarError: params not defined`, this probably means that you are
     following code for Flux 0.12 or earlier on a more recent version.
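A minimal sketch of the `setup` behaviour described above; the model and the learning rate `0.01` here are arbitrary placeholders, not part of the commit:

```julia
using Flux
import Optimisers   # the package providing the new-style rules

# Hypothetical two-layer model, only to give `setup` something to walk over.
model = Chain(Dense(2 => 4, relu), Dense(4 => 1))

# Flux's own `setup` checks the model and returns a tree of optimiser states.
state = Flux.setup(Adam(0.01), model)

# `Optimisers.setup` with the new-style rule returns the same state tree.
state_new = Optimisers.setup(Optimisers.Adam(0.01), model)

# In Flux 0.14, `Flux.Adam()` is still the old-style rule (a subtype of
# `Flux.Optimise.AbstractOptimiser`), but as noted above, `Flux.setup`
# silently translates it to its `Optimisers.Adam` counterpart.
```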

docs/src/training/training.md

Lines changed: 7 additions & 1 deletion
@@ -225,6 +225,9 @@ callback API. Here is an example, in which it may be helpful to note:
   returns the value of the function, for logging or diagnostic use.
 * Logging or printing is best done outside of the `gradient` call,
   as there is no need to differentiate these commands.
+* To use `result` for logging purposes, you could change the `do` block to end with
+  `return my_loss(result, label), result`, i.e. make the function passed to `withgradient`
+  return a tuple. The first element is always the loss.
 * Julia's `break` and `continue` keywords let you exit from parts of the loop.

 ```julia
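A minimal sketch of the tuple-return pattern described in the added bullet; the model, data, and `my_loss` below are hypothetical stand-ins:

```julia
using Flux

# Stand-in model, data and loss, only to show the shape of the pattern.
model = Dense(2 => 1)
input = rand(Float32, 2)
label = rand(Float32, 1)
my_loss(ŷ, y) = sum(abs2, ŷ .- y)

out = Flux.withgradient(model) do m
    result = m(input)                        # forward pass, kept for logging
    return my_loss(result, label), result    # tuple: the first element is the loss
end

loss, result = out.val    # the whole tuple comes back as `val`
grads = out.grad          # the gradient is taken of the first element only
```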
@@ -319,9 +322,12 @@ The first, [`WeightDecay`](@ref Flux.WeightDecay) adds `0.42` times original par
 matching the gradient of the penalty above (with the same, unrealistically large, constant).
 After that, in either case, [`Adam`](@ref Flux.Adam) computes the final update.

+The same trick works for *L₁ regularisation* (also called Lasso), where the penalty is
+`pen_l1(x::AbstractArray) = sum(abs, x)` instead. This is implemented by `SignDecay(0.42)`.
+
 The same `OptimiserChain` mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref Flux.Optimise.ClipValue) or [`ClipNorm`](@ref Flux.Optimise.ClipNorm).

-Besides L2 / weight decay, another common and quite different kind of regularisation is
+Besides L1 / L2 / weight decay, another common and quite different kind of regularisation is
 provided by the [`Dropout`](@ref Flux.Dropout) layer. This turns off some outputs of the
 previous layer during training.
 It should switch automatically, but see [`trainmode!`](@ref Flux.trainmode!) / [`testmode!`](@ref Flux.testmode!) to manually enable or disable this layer.
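A rough sketch of the chained rules this hunk refers to; the constant `0.42` matches the surrounding docs, while the model, `Adam(0.1)`, and `ClipNorm(1.0)` are arbitrary values chosen for illustration:

```julia
using Flux
import Optimisers   # provides OptimiserChain, WeightDecay, SignDecay, ClipNorm

# Hypothetical model; only the chained rules matter here.
model = Chain(Dense(2 => 4, relu), Dense(4 => 1))

# L2 / weight decay applied before Adam, matching the `pen_l2` penalty above:
opt_state = Flux.setup(
    Optimisers.OptimiserChain(Optimisers.WeightDecay(0.42), Optimisers.Adam(0.1)), model)

# L1 / Lasso regularisation instead swaps in SignDecay with the same constant:
opt_state_l1 = Flux.setup(
    Optimisers.OptimiserChain(Optimisers.SignDecay(0.42), Optimisers.Adam(0.1)), model)

# The same chaining can clip gradients before the update rule sees them:
opt_state_clip = Flux.setup(
    Optimisers.OptimiserChain(Optimisers.ClipNorm(1.0), Optimisers.Adam(0.1)), model)
```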

0 commit comments
