docs/src/training/reference.md (+7 -5)
@@ -10,10 +10,6 @@ Because of this:
* Flux defines its own version of `setup` which checks this assumption.
  (Using instead `Optimisers.setup` will also work, they return the same thing.)

- The new implementation of rules such as Adam in Optimisers.jl is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
- The available rules are listed on the [optimisation rules](@ref man-optimisers) page;
- see the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.
-
```@docs
Flux.Train.setup
Flux.Train.train!(loss, model, data, state; cb)
@@ -47,10 +43,16 @@ Flux 0.13 and 0.14 are the transitional versions which support both; Flux 0.15 w
The blue-green boxes in the [training section](@ref man-training) describe
the changes needed to upgrade old code.

+ The available rules are listed on the [optimisation rules](@ref man-optimisers) page.
+
+ !!! compat "Old & new rules"
+     The new implementation of rules such as Adam in Optimisers.jl is quite different from the old one in `Flux.Optimise`. In Flux 0.14, `Flux.Adam()` still returns the old one, with supertype `Flux.Optimise.AbstractOptimiser`, but `setup` will silently translate it to its new counterpart.
+
For full details on the interface for implicit-style optimisers, see the [Flux 0.13.6 manual](https://fluxml.ai/Flux.jl/v0.13.6/training/training/).
+ See the [Optimisers documentation](https://fluxml.ai/Optimisers.jl/dev/) for details on how the new rules work.

!!! compat "Flux ≤ 0.12"
-     Earlier versions of Flux exported `params`, thus allowing unqualified `params(model)`
+     Much earlier versions of Flux exported `params`, thus allowing unqualified `params(model)`
      after `using Flux`. This conflicted with too many other packages, and was removed in Flux 0.13.
      If you get an error `UndefVarError: params not defined`, this probably means that you are
      following code for Flux 0.12 or earlier on a more recent version.
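
For orientation, the explicit `setup` / `train!` pairing documented in this file might be used roughly as in the sketch below; the model, data and loss function are stand-ins chosen only to make it self-contained:

```julia
using Flux

# Stand-in model and a one-batch dataset, just to make the sketch runnable:
model = Chain(Dense(2 => 8, relu), Dense(8 => 1))
data  = [(randn(Float32, 2, 16), randn(Float32, 1, 16))]

# Flux.setup checks that the rule suits this model and returns the optimiser state.
# Passing the old-style Flux.Adam() here would be translated to the new rule silently.
opt_state = Flux.setup(Adam(0.01), model)

# Explicit-style train!: the function receives the model itself, then one data point.
Flux.train!(model, data, opt_state) do m, x, y
    Flux.Losses.mse(m(x), y)
end
```

Calling `Optimisers.setup` instead of `Flux.setup` would return the same state, as noted above.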
docs/src/training/training.md (+7 -1)
@@ -225,6 +225,9 @@ callback API. Here is an example, in which it may be helpful to note:
  returns the value of the function, for logging or diagnostic use.
* Logging or printing is best done outside of the `gradient` call,
  as there is no need to differentiate these commands.
+ * To use `result` for logging purposes, you could change the `do` block to end with
+   `return my_loss(result, label), result`, i.e. make the function passed to `withgradient`
+   return a tuple. The first element is always the loss.
* Julia's `break` and `continue` keywords let you exit from parts of the loop.

```julia
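# A rough sketch of such a loop, with the tuple-returning do block described above.
# The names `my_loss`, `model`, `opt_state` and `train_set` are assumed here, not
# taken from the manual's own listing.
for (i, (input, label)) in enumerate(train_set)
    # The function given to withgradient returns a tuple: only the first element
    # (the loss) is differentiated; the rest comes back in `val` for logging.
    (; val, grad) = Flux.withgradient(model) do m
        result = m(input)
        return my_loss(result, label), result
    end
    loss, result = val
    i % 10 == 0 && @info "step $i" loss
    Flux.update!(opt_state, model, grad[1])
end
```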
@@ -319,9 +322,12 @@ The first, [`WeightDecay`](@ref Flux.WeightDecay) adds `0.42` times original par
matching the gradient of the penalty above (with the same, unrealistically large, constant).
After that, in either case, [`Adam`](@ref Flux.Adam) computes the final update.

+ The same trick works for *L₁ regularisation* (also called Lasso), where the penalty is
+ `pen_l1(x::AbstractArray) = sum(abs, x)` instead. This is implemented by `SignDecay(0.42)`.
+
The same `OptimiserChain` mechanism can be used for other purposes, such as gradient clipping with [`ClipGrad`](@ref Flux.Optimise.ClipValue) or [`ClipNorm`](@ref Flux.Optimise.ClipNorm).
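
As a sketch of how such chains might be built: the `0.42`, `Adam(0.1)` and clipping rules come from the surrounding text, the model is a stand-in, and the rules are written with an explicit `Optimisers.` prefix only to keep the example unambiguous (Flux's `setup` accepts Optimisers.jl rules directly):

```julia
using Flux
import Optimisers   # rules qualified below to avoid clashing with Flux's exported names

model = Chain(Dense(4 => 4, relu), Dense(4 => 1))   # a stand-in model

# L2 / weight decay folded into the optimiser, followed by Adam:
rule_l2 = Optimisers.OptimiserChain(Optimisers.WeightDecay(0.42), Optimisers.Adam(0.1))
opt_state = Flux.setup(rule_l2, model)

# The L1 / Lasso version swaps in SignDecay at the same position:
rule_l1 = Optimisers.OptimiserChain(Optimisers.SignDecay(0.42), Optimisers.Adam(0.1))
opt_state = Flux.setup(rule_l1, model)

# Gradient clipping chains the same way, here clipping the norm before Adam sees it:
rule_clip = Optimisers.OptimiserChain(Optimisers.ClipNorm(1.0), Optimisers.Adam(0.1))
opt_state = Flux.setup(rule_clip, model)
```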

- Besides L2 / weight decay, another common and quite different kind of regularisation is
+ Besides L1 / L2 / weight decay, another common and quite different kind of regularisation is
provided by the [`Dropout`](@ref Flux.Dropout) layer. This turns off some outputs of the
previous layer during training.
It should switch automatically, but see [`trainmode!`](@ref Flux.trainmode!) / [`testmode!`](@ref Flux.testmode!) to manually enable or disable this layer.
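
A small sketch of that manual switch; the layer sizes here are arbitrary:

```julia
using Flux

m = Chain(Dense(2 => 3, relu), Dropout(0.4), Dense(3 => 1))

trainmode!(m)            # force Dropout to be active, e.g. inside a custom loop
m(rand(Float32, 2))      # some outputs of the first Dense layer are zeroed

testmode!(m)             # force it off, e.g. while measuring accuracy
m(rand(Float32, 2))      # deterministic: nothing is dropped
```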