add ordinary least square with lapack backend #212

kchanqvq · 2025-06-16T08:40:12Z

Hi! While using MAGICL, I found that ordinary least squares (OLS), a quite common use case, is missing, so I took a shot at implementing it. This is my first PR, and I'm happy to discuss improvements so that it meets the standard for inclusion!

BTW, while working with the source, I found some other things that could be improved, e.g.:

The info output from LAPACK is ignored; it could be used for better error reporting.

Is there interest in working on these? Where is the right place to discuss them?

stylewarning · 2025-06-16T14:12:09Z

Lots of interest! You could make an issue with the feature you're interested in implementing.

stylewarning · 2025-06-16T14:13:12Z

I will review this soon. Thanks!

kchanqvq · 2025-06-17T01:29:39Z

RFC: should the second value of ols be r^2 (the coefficient of determination) instead? That would be more useful in practice.

Maybe it should also append an all-ones column to the input and return three values: A, b, and r^2.

YarinHeffes

This is a great addition! Most of my comments are nits.

Two more nits:

1. Replace usage of "least square" with "least squares" in docstrings.
2. Consider renaming ols to be more readable, e.g. least-squares-solve to match linear-solve.

YarinHeffes · 2025-06-17T23:00:49Z

src/extensions/lapack/lapack-templates.lisp

+            (b (make-array (* ldb nrhs) :element-type ',type))
+            (s (make-array (min m n) :element-type ',type))
+            (rcond (or rcond
+                       (* (max m n)


Why do we multiply by (max m n) here?

kchanqvq · 2025-06-18

This is cargo-culted from NumPy. I assume it's an agreed-upon sane default?

YarinHeffes · 2025-06-18

Got it, it just wasn't immediately obvious to me why not to use -1 as a default. Looks good.
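For context on the default discussed in this thread: NumPy's lstsq documents its default cutoff as machine epsilon times max(M, N), while LAPACK's *gelsd treats a negative RCOND as "use machine precision". A minimal Python sketch of the NumPy-style default follows; default_rcond is a hypothetical helper name and double precision is assumed.

```python
import sys


def default_rcond(m: int, n: int) -> float:
    """NumPy-style default cutoff for an m-by-n problem (double precision assumed)."""
    # Singular values below rcond * (largest singular value) are treated as zero
    # when the effective rank is determined.
    return max(m, n) * sys.float_info.epsilon


print(default_rcond(100, 3))
```

Scaling by the larger dimension makes the cutoff slightly looser for bigger problems, which is presumably the motivation for preferring it over a bare machine-epsilon threshold.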


YarinHeffes · 2025-06-17T23:27:29Z

src/high-level/matrix.lisp

+(defun ols (a b &optional rcond)
+  "Attempt to solve the ordinary least square problem argmin_X ||B-AX||^2. Returns X and sum of squared residuals."


I recommend explaining rcond in the docstring here or removing rcond from the arglist (just for this function, keep it in lsd regardless)


YarinHeffes · 2025-06-17T23:38:34Z

> The info output from LAPACK is ignored; it could be used for better error reporting.

I agree, I encourage you to open an issue to continue the discussion!


YarinHeffes · 2025-06-17T23:43:54Z

> RFC: should the second value of ols be r^2 (the coefficient of determination) instead? That would be more useful in practice.
>
> Maybe it should also append an all-ones column to the input and return three values: A, b, and r^2.

This would be useful. At minimum, I think it should be documented that r is only returned when b has extra rows, or we can add a keyword arg &key get-residual-squares-p. What do you think?

kchanqvq · 2025-06-18T05:05:18Z

> This would be useful. At minimum, I think it should be documented that r is only returned when b has extra rows, or we can add a keyword arg &key get-residual-squares-p. What do you think?

Currently r is always returned; when b doesn't have extra rows, it is 0.0. I just realized this is wrong if A is rank deficient. I will handle and test this in the next version! We might have to return NIL in these cases, as gelsd requires both m>=n and rank=n for the sum of squared residuals to be available.
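For reference, the condition above matches the LAPACK documentation for *gelsd: when m >= n and RANK = n, the residual sum of squares for the solution in column j is the sum of squares of rows n+1 through m of that column of B on exit. In symbols, writing \tilde{B} for the overwritten B (a notation introduced here, not taken from the PR):

$$
\| b_j - A x_j \|_2^2 \;=\; \sum_{i=n+1}^{m} \tilde{B}_{ij}^{\,2},
\qquad \text{valid only if } m \ge n \text{ and } \mathrm{RANK} = n .
$$

When either condition fails, those trailing rows carry no such meaning, which is why returning them unconditionally is incorrect.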

> 2. Consider renaming ols to be more readable, e.g. least-squares-solve to match linear-solve.

Originally I wanted an ordinary linear regression routine, but I guess that should live in a higher-level library than magicl itself, with magicl just providing the lower-level least-squares routine?

IMO least-squares-solve is a bit wordy. How about least-squares?

stylewarning · 2025-06-18T14:24:35Z

least-squares is fine by me

kchanqvq · 2025-06-18T15:37:52Z

> I just realized this is wrong if A is rank deficient. I will handle and test this in the next version! We might have to return NIL in these cases, as gelsd requires both m>=n and rank=n for the sum of squared residuals to be available.

Hmm, it seems that I need to get the rank OUT argument of *gelsd for this to work, but the generated CFFI bindings only allow passing in a value. Is this problem similar to why we currently don't get info out? Any ideas on how to work around this? Help wanted!

kchanqvq · 2025-06-18T15:50:52Z

> Is this problem similar to why we currently don't get info out

How about changing generate-interface.lisp so that the generated functions return the values of all non-array arguments as multiple values? All Fortran functions return :void, so this shouldn't conflict with anything existing. E.g., %gelsd would return (values m n nrhs lda ldb rcond rank lwork info) at the end.

YarinHeffes · 2025-06-18T19:31:22Z

Regarding the name, it was just a suggestion on my part; least-squares seems good to me.

The only change request that I think needs to be addressed for this PR is the documentation for that function, which is meant to be the high-level "user-friendly" function. Specifically, it should at least be clearer under what circumstances residual squares are returned. I think it would be even more user-friendly to have a boolean keyword flag for the user to request the residual squares.

Regarding handling errors and edge cases, I was tinkering with this earlier today and made a draft of how this could be addressed based on your suggestion; I also described two alternatives in the PR description. I'm not sure whether that needs to be incorporated into this PR or provided in a follow-up PR.

kchanqvq · 2025-06-18T19:41:14Z

> The only change request that I think needs to be addressed for this PR is the documentation for that function, which is meant to be the high-level "user-friendly" function. Specifically, it should at least be clearer under what circumstances residual squares are returned. I think it would be even more user-friendly to have a boolean keyword flag for the user to request the residual squares.
>
> Regarding handling errors and edge cases, I was tinkering with this earlier today and made a draft of how this could be addressed based on your suggestion; I also described two alternatives in the PR description. I'm not sure whether that needs to be incorporated into this PR or provided in a follow-up PR.

I think we have to get the rank OUT argument to do any of this reliably. If A is rank deficient but M>N, there are still values in the higher rows of b; they're just gibberish.

Or let's just not return the sum of residuals in this PR for now?

stylewarning · 2025-06-19T16:08:35Z

@kchanqvq For a specific function I would just write the interface manually. Solving the problem more generally in the generated CFFI code would be interesting though.

kchanqvq · 2025-06-19T19:12:33Z

I removed the second (residual) return value for now; we can add it back later once we figure out OUT arguments (users can compute it themselves in the meantime). @stylewarning @YarinHeffes I think the PR is ready for another review!
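As a language-neutral illustration of computing the residual by hand, here is a minimal Python sketch that evaluates the sum of squared residuals per right-hand-side column directly from A, X, and B. It uses plain nested lists rather than MAGICL tensors, and residual_sums_of_squares is a hypothetical helper name, not part of this PR.

```python
def residual_sums_of_squares(a, x, b):
    """Return [sum_i (B - A X)[i][j]**2 for each column j] for nested-list matrices."""
    nrhs = len(b[0])
    totals = [0.0] * nrhs
    for a_row, b_row in zip(a, b):
        for j in range(nrhs):
            # Predicted value (A X)[i][j] for this row and right-hand side.
            predicted = sum(a_ik * x[k][j] for k, a_ik in enumerate(a_row))
            totals[j] += (b_row[j] - predicted) ** 2
    return totals


a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [[1.0], [2.0]]
b = [[1.0], [2.0], [4.0]]
print(residual_sums_of_squares(a, x, b))
```

Unlike reading the trailing rows of the overwritten B, this direct computation does not depend on A having full column rank.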

kchanqvq · 2025-06-30T18:59:27Z

This seems to have been hanging for a while; can someone help get it going? ❤️

kchanqvq force-pushed the master branch 3 times, most recently from 7ace529 to f04c9a0, on June 16, 2025 13:12

YarinHeffes suggested changes Jun 17, 2025


YarinHeffes reviewed Jun 17, 2025


kchanqvq force-pushed the master branch from f04c9a0 to bc803bf on June 18, 2025 04:51

add ordinary least square with lapack backend

b14c5f9

kchanqvq force-pushed the master branch from bc803bf to b14c5f9 on June 19, 2025 19:10

