Indeterminacy for multiple transforms #688
Replies: 10 comments
-
We should try with debug=2 to see if there is any difference in the parameters chosen.
-
Right. For 2.4.0rc1, we get one set of parameters, while 2.3.0 gives us another. So the main difference seems to be the automatic choice of upsampfac; indeed, manually overriding upsampfac confirms this.
-
I tuned the heuristic because it is faster. According to the error analysis and #677 it should work.
-
Dear Joakim,

So, I examined this fascinating failure pretty closely. I do not consider it a failure of finufft to provide outputs within the requested accuracy, so I don't think there is anything that actually has to be fixed, as I will explain. But it raises a new type of concern when using upsampfac=1.25 at eps<1e-7, where ns=13...15, close to its max.

What your test code demands is that a multithreaded ntrans=6 produce a "close" answer to doing those transforms separately. (In particular, you've tested index [1,1,0] of the (2,3,1)-shaped set of 6.) Now, this relative change (due to arithmetic order, not tolerance) is about 5e4 times epsmach, which is admittedly a large prefactor: it turns out that upsampfac=1.25 amplifies this prefactor here more than usf=2.0, which seems to max out at a prefactor of <10, i.e., 15 digits are stable. I believe the cause is larger amplification in the deconv step; I will look into that right now.

I do propose to add an alert in the docs. I also propose to tweak Marco's heuristics to push usf=2.0 to be used for eps<=3e-8, so we avoid the ns>12 cases with usf=1.25, at least as defaults. I also propose that you change your unit tests in ASPIRE, since you're testing for a feature that is not promised: repeatability to around 11 digits of doing a transform with different threading, when the tolerance is only 8 digits.

Do you agree? Any suggestions to the above, before we release 2.4.0 asap?

Best wishes, Alex
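The arithmetic-order effect described here can be seen without finufft at all: summing the same floating-point numbers in two different orders (as different thread partitions effectively do) gives answers that differ by a small multiple of epsmach. A minimal standalone Python sketch; the 1e5-term sum is an arbitrary stand-in for the spread/FFT arithmetic, not finufft code:

```python
import random

# Sum the same values in two different orders, mimicking how a different
# thread partition reorders the floating-point additions inside a transform.
random.seed(0)
vals = [random.random() for _ in range(100_000)]

s_fwd = sum(vals)             # one accumulation order
s_rev = sum(reversed(vals))   # another accumulation order
rel = abs(s_fwd - s_rev) / s_fwd

eps_mach = 2.0 ** -52
print(rel / eps_mach)  # prefactor: how many units of epsmach the orders differ by
```

The two sums agree to roughly machine precision times a modest prefactor; it is this prefactor that the deconvolution step can then amplify.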
-
Let me add that rounding error due to the condition number of the problem (which the different arithmetic order exposes) should also scale like max(N1,N2)·epsmach, i.e. N1=512 in this case. This would lead one to expect about 13 digits of stability, best case. (Why usf=2.0 is better than this, I don't know.) So 11 digits does not feel so bad after all.
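As a quick sanity check of that estimate (a back-of-envelope computation; N1=512 is from the report, N2=512 is an assumption):

```python
import math

eps_mach = 2.0 ** -52
N1, N2 = 512, 512          # N1=512 from the report; N2 is an assumption here
bound = max(N1, N2) * eps_mach
digits = -math.log10(bound)
print(bound, digits)       # roughly 1e-13, i.e. about 13 digits of best-case stability
```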
-
Indeed, I just checked the maximum amplification factor relative to the zero frequency, which is called r_{dyn} in the paper (see eqns (3.21) and (4.5)), with a Matlab verification. Whether this is a problem I don't know. usf=5/4 is very useful for speed, so I'll add docs re repeatability, and tweak heuristics.hpp to avoid ns>12 by default.
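For readers without Matlab, a rough standalone sketch of this kind of check is below. It assumes the exp-sqrt ("ES") kernel exp(beta*(sqrt(1-z^2)-1)), a beta ≈ 0.976·π·ns·(1−1/(2σ)) choice, and takes the amplification as the ratio of the kernel's Fourier transform at zero frequency to its value at the largest deconvolved frequency; the constants, scalings, and quadrature here are assumptions of this sketch, not finufft's actual code:

```python
import math

def phihat(xi, beta, n=4000):
    # Fourier transform of the ES kernel exp(beta*(sqrt(1-z^2)-1)) on [-1,1],
    # computed by plain trapezoid quadrature (the integrand is even and smooth
    # inside the interval, and tiny at the endpoints).
    h = 2.0 / n
    total = 0.0
    for j in range(n + 1):
        z = -1.0 + j * h
        w = 0.5 if j in (0, n) else 1.0
        total += w * math.exp(beta * (math.sqrt(max(0.0, 1.0 - z * z)) - 1.0)) * math.cos(xi * z)
    return total * h

def r_dyn(sigma, ns):
    # Deconvolution amplification relative to the zero frequency: ratio of the
    # kernel transform at xi=0 to its value at the largest frequency kept
    # after upsampling by sigma (assumed scaling xi_max = pi*ns/(2*sigma)).
    beta = 0.976 * math.pi * ns * (1.0 - 1.0 / (2.0 * sigma))
    xi_max = math.pi * ns / (2.0 * sigma)
    return phihat(0.0, beta) / phihat(xi_max, beta)

print(r_dyn(2.0, 13))    # modest prefactor for sigma=2
print(r_dyn(1.25, 13))   # much larger prefactor for sigma=5/4
```

With ns=13, the σ=5/4 factor comes out roughly two orders of magnitude larger than the σ=2 one.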
-
I have pushed this to the docs. But I'm not sure heuristics.hpp needs tweaking. Await Joakim's input. Until tomorrow, Alex
-
Hi Alex,

Thanks for taking the time to go through this. Let me see if I've got this correctly:

1. The API does not guarantee that the same transform gives the same output up to machine epsilon. I know we had discussed this at some point, but I wasn't sure what the outcome was. The guarantee is only with respect to the eps parameter times the input l1-norm, got it.
2. Somehow, the different upsampling factor gives a larger error due to arithmetic order (why this depends on the usf, I don't quite understand here). Since the heuristics for picking the usf have changed, we see a different behavior in 2.4.0.
3. What I don't quite understand is how this error (due to arithmetic order) can be guaranteed to be below the overall eps error (well, I guess we don't make any strict guarantees, but still, how do we know that this won't get too big)?

Overall, it seems reasonable here to adjust our tolerances in ASPIRE given that we are not guaranteed machine epsilon accuracy with repeated transforms.
-
With regard to your question:

> 3. What I don't quite understand is how this error (due to arithmetic order) can be guaranteed to be below the overall eps error (well, I guess we don't make any strict guarantees, but still, how do we know that this won't get too big)?

Recall that r_{dyn} sets the amplification of rounding error differences that one would expect due to reordering (repeatability under different threading). So the repeatability variation should be O(r_{dyn} eps_{mach}) or possibly O(r_{dyn} eps_{mach} max(N_i)); I don't know which yet. r_{dyn} < 10 for usf=2.0, but r_{dyn} < 1e3 for usf=5/4. That addresses your question #2 too.

The only reason this is less than ``eps`` (the tol param), in float64 at least, is that ns gets too big for usf=5/4 for ``eps`` < 1e-9, so we always switch to usf=2 for such ``eps``.
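Plugging in these numbers gives a quick back-of-envelope check (assuming r_{dyn} ≈ 1e3 for usf=5/4 and max(N_i) = 512 from the failing case):

```python
eps_mach = 2.0 ** -52
r_dyn = 1e3     # rough upper bound for usf=5/4 (from the discussion)
N = 512         # max(N_i) in the failing test

b_small = r_dyn * eps_mach       # O(r_dyn eps_mach)
b_large = r_dyn * eps_mach * N   # O(r_dyn eps_mach max(N_i))
print(b_small, b_large)
```

The second estimate is about 1e-10, i.e. roughly the ~11 digits of repeatability observed in the failing test.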
Are you ok with us releasing now? Best, Alex
-
Ah I see. That makes sense to me then. Yes I'm fine with releasing.
-
When applying a 2D type 1 transform, we get slightly different results when computing multiple transforms at the same time compared to one transform at a time. The following script reproduces the error in 2.4.0rc1 but gives the expected results in 2.3.0:

data.zip

Specifically, for 2.3.0, we always pass the `allclose` check and the maximum error is between zero and `1e-10`, while for 2.4.0rc1, `allclose` usually fails (non-deterministically) with a maximum error around `1e-6`. The particular layout of the data seems important here. If I get rid of the reshapes, the errors drop somewhat (enough to not trigger the `allclose` check, but still higher compared to 2.3.0).
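For context, the comparison pattern at issue looks like the following standalone stand-in, which uses NumPy's batched FFT in place of finufft (the sizes are placeholders, not those of the report). On equispaced data the two paths agree essentially exactly; the point of this thread is that a multithreaded NUFFT reorders its arithmetic and need not:

```python
import numpy as np

rng = np.random.default_rng(0)
N1, N2, ntrans = 64, 64, 6        # placeholder sizes, not the report's
sig = rng.standard_normal((ntrans, N1, N2)) + 1j * rng.standard_normal((ntrans, N1, N2))

# All transforms in one batched call vs. one transform at a time.
batched = np.fft.fft2(sig, axes=(-2, -1))
looped = np.stack([np.fft.fft2(s) for s in sig])

err = np.max(np.abs(batched - looped)) / np.max(np.abs(looped))
print(err)
assert np.allclose(batched, looped, atol=1e-10)   # the kind of check ASPIRE makes
```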