Julia versions tested on CI:

```yaml
- 'nightly'
- '1.4'
- '1'  # automatically expands to the latest stable 1.x release of Julia
```
Tullio is a very flexible einsum macro. It understands many array operations written in index notation.
Used by itself the macro writes ordinary nested loops much like [`Einsum.@einsum`](https://github.com/ahwillia/Einsum.jl).
One difference is that it can parse more expressions (such as the convolution `M`, and worse).
Another is that it will use multi-threading (via [`Threads.@spawn`](https://julialang.org/blog/2019/07/multithreading/)) and recursive tiling, on large enough arrays.
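As a minimal sketch of the notation (the array names here are ours, not from the package's own examples):

```julia
using Tullio

A = rand(3, 4); B = rand(4, 5)
@tullio C[i, k] := A[i, j] * B[j, k]    # := makes a new array; the repeated index j is summed over

v = rand(10); w = rand(3)
@tullio conv[x] := v[x + i - 1] * w[i]  # a 1-D convolution; the range of x is inferred from v and w
```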
But it also co-operates with various other packages, provided they are loaded before the macro is called:
* It uses [`LoopVectorization.@avx`](https://github.com/chriselrod/LoopVectorization.jl) to speed many things up. (Disable with `avx=false`.) On a good day this will match the speed of OpenBLAS for matrix multiplication.
The expression need not be just one line, for example:
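A simplified sketch of such a multi-line expression (not the package's original example; the arrays, locals, and ranges here are ours):

```julia
using Tullio
mat = rand(10, 10)

@tullio out[x, y] := begin
        a = x + 1        # local assignments: a and b are not summed over
        b = y + 2
        mat[a, b]
    end (x in 1:8, y in 1:8)   # the ranges of x, y must be given explicitly
```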
Here the macro cannot infer the range of the output's indices `x,y`, so they must be provided explicitly.
(If writing into an existing array, with `out[x,y] = begin ...` or `+=`, then ranges would be taken from there.)
Because it sees assignment being made, it does not attempt to sum over `a,b`, and it assumes that indices could go out of bounds, so it does not add `@inbounds` for you.
(Although in fact `mod(x+a) == mod(x+a, axes(mat,1))` is safe.)
It will also not be able to take a symbolic derivative, but dual numbers will work fine.
Pipe operators `|>` or `<|` indicate functions to be performed *outside* the sum, for example:
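A log-sum-exp sketch of this (the matrix `mat` is our assumption): `log` is applied to the completed sum over `i`, not inside it.

```julia
using Tullio
mat = rand(5, 7)
@tullio lse[j] := log <| exp(mat[i, j])   # log of (sum over i of exp), one entry per column
```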
* `threads=false` turns off threading, while `threads=64^3` sets a threshold size at which to divide the work (replacing the macro's best guess).
* `avx=false` turns off the use of `LoopVectorization`, while `avx=4` inserts `@avx unroll=4 for i in ...`.
* `grad=false` turns off gradient calculation, and `grad=Dual` switches it to use `ForwardDiff` (which must be loaded).
* `nograd=A` turns off the gradient calculation just for `A`, and `nograd=(A,B,C)` does this for several arrays.
* `tensor=false` turns off the use of `TensorOperations`.
* Assignment `xi = ...` removes `xi` from the list of indices: its range is not calculated, and it will not be summed over. It also disables `@inbounds`, since this is now up to you.
* `verbose=true` prints things like the index ranges inferred, and gradient calculations. `verbose=2` prints absolutely everything.
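A sketch of passing such keyword options (the arrays here are our assumption; options are written before the expression, as in the default-settings line below):

```julia
using Tullio
A = rand(20, 30); B = rand(30, 40)

# plain sequential loops: no threads, no LoopVectorization
@tullio threads=false avx=false C[i, k] := A[i, j] * B[j, k]
```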
The default setting is:
Implicit:
* Indices without shifts must have the same range everywhere they appear, but those with shifts (even `A[i+0]`) run over the intersection of possible ranges.
* Shifted output indices must start at 1, unless `OffsetArrays` is visible in the calling module.
* The use of `@avx`, and the calculation of gradients, are switched off by sufficiently complex syntax (such as arrays of arrays).
* Gradient hooks are attached for any or all of `ReverseDiff`, `Tracker` & `Zygote`. These packages need not be loaded when the macro is run.
* Gradients are only defined for reductions over `(+)` (default) and `min`, `max`.
* GPU kernels are only constructed when both `KernelAbstractions` and `CUDA` are visible. The default `cuda=256` is passed to `kernel(CUDA(), 256)`.
* The CPU kernels from `KernelAbstractions` are called only when `threads=false`; they are not at present very fast, but perhaps useful for testing.
Extras:
* `A[i] := i^2  (i in 1:10)` is how you specify a range for indices when this can't be inferred.
* `A[i] := B[i, $col] - C[i, 2]` is how you fix one index to a constant (to prevent `col` being summed over).
* `A[i] := $d * B[i]` is the preferred way to include other constants. Note that no gradient is calculated for `d`.
* Within indexing, `A[mod(i), clamp(j)]` both maps `i` & `j` to lie within `axes(A)`, and disables inference of their ranges from `A`.
* Similarly, `A[pad(i,3)]` extends the range of `i`, inserting zeros outside of `A`. Instead of zero, `pad=NaN` uses this value as padding. The implementation of this (and `mod`, `clamp`) is not very fast at present.
* On the left, when making a new array, an underscore like `A[i+_] :=` inserts whatever shift is needed to make `A` one-based.
* `Tullio.@printgrad (x+y)*log(x/z) x y z` prints out how symbolic derivatives will be done.
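Sketches of a few of these extras (the arrays and constants here are ours):

```julia
using Tullio

@tullio A[i] := i^2  (i in 1:10)     # explicit range, since nothing else fixes i

B = rand(10, 5); col = 3; d = 2.5
@tullio C[i] := $d * B[i, $col]      # $ interpolates constants; col is fixed, not summed over
```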
</details>
<details><summary><b>Internals</b></summary>
```julia
function ∇act!(::Type, ΔC, ΔA, ΔB, C, A, B, ax_i, ax_j, ax_k, keep)
    # ... (body elided)
end
```
Writing `@tullio verbose=2` will print all of these functions out.
Scalar reductions, such as `@tullio s := A[i,j] * log(B[j,i])`, are slightly different in that the `act!` function simply returns the sum, i.e. the variable `acc` above.
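A runnable sketch of that scalar reduction (the arrays are ours):

```julia
using Tullio
A = rand(3, 4); B = rand(4, 3)
@tullio s := A[i, j] * log(B[j, i])   # s is a scalar: both i and j are summed over
```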
Back-end friends & relatives:
* [LoopVectorization.jl](https://github.com/chriselrod/LoopVectorization.jl) is used here, if available.
* [Gaius.jl](https://github.com/MasonProtter/Gaius.jl) and [PaddedMatrices.jl](https://github.com/chriselrod/PaddedMatrices.jl) build on that.
Front-end near-lookalikes:
Things you can't run:
* [Tortilla.jl](https://www.youtube.com/watch?v=Rp7sTl9oPNI) seems to exist, publicly, only in this very nice talk.
* [ArrayMeta.jl](https://github.com/shashi/ArrayMeta.jl) was a Julia 0.5 take on some of this.