Augmented Lagrangian Learning Method #2
Conversation
Awesome work! I have a few comments, nothing major.
Project.toml
Outdated
[deps]
BatchNLPKernels = "7145f916-0e30-4c9d-93a2-b32b6056125d"
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
ExaModels = "1037b233-b668-4ce9-9b63-f9f681f55dd2"
Lux = "b2108857-7c20-44ae-9111-449ecde12c47"
LuxCUDA = "d0bbae9a-e099-4d5b-a835-1c6931763bda"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Do we need all of these? In particular CUDA, LuxCUDA, ExaModels?
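If CUDA, LuxCUDA, and ExaModels are only exercised by the test suite, one option is to demote them to test-only dependencies. A sketch, assuming the package code itself never imports them:

[deps]
BatchNLPKernels = "7145f916-0e30-4c9d-93a2-b32b6056125d"
Lux = "b2108857-7c20-44ae-9111-449ecde12c47"
Optimisers = "3bd65402-5787-11e9-1adc-39752487f4e2"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[extras]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
ExaModels = "1037b233-b668-4ce9-9b63-f9f681f55dd2"
LuxCUDA = "d0bbae9a-e099-4d5b-a835-1c6931763bda"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["CUDA", "ExaModels", "LuxCUDA", "Test"]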
test/runtests.jl
Outdated
train_state_dual,
data,
stopping_criteria = [validation_testset],
)
src/L2OALM.jl
Outdated
Keywords:
- `max_dual`: Maximum value for the target dual variables.
"""
function LagrangianDualLoss(num_equal::Int; max_dual = 1e6)
Just a note: we should probably eventually have a somewhat standard interface for L2OMethods and their hyperparameters, e.g.
struct ALMMethod <: AbstractL2OMethod
bm::BatchModel
max_dual::Float64
ρ_init::Float64
end
or
struct ALMMethod <: AbstractL2OMethod
bm::BatchModel
hyperparameters::Dict{Symbol,Any}
end
Ideally that would also help to clean up things like
Lines 196 to 197 in 0797bb7:
hpm_primal[:ρ] = min(hpm_primal[:ρmax], hpm_primal[:ρ] * hpm_primal[:α])
hpm_dual[:ρ] = hpm_primal[:ρ] # Ensure dual model uses the same ρ
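With the typed-struct variant, that update could collapse to something like this (a sketch with hypothetical field names, assuming the primal and dual loops share one method object):

method.ρ = min(method.ρmax, method.ρ * method.α)  # single source of truth for ρ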
# sketch; the concrete types are illustrative Lux v1 names
mutable struct PrimalDualTrainer
    primal_model::Lux.AbstractLuxLayer
    primal_training_state::Lux.Training.TrainState
    dual_model::Lux.AbstractLuxLayer
    dual_training_state::Lux.Training.TrainState
    data::DataLoader  # e.g. MLUtils.DataLoader
end
nvar = model.meta.nvar
ncon = model.meta.ncon
nθ = length(model.θ)
This is such a common thing that BNK should probably have a field for nθ and expose a frontend like num_parameters, num_variables, and num_constraints.
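A minimal sketch of what those accessors could look like, assuming BatchModel stores the underlying model with NLPModels-style meta and an ExaModels parameter vector θ (the field and function names are hypothetical):

num_variables(bm::BNK.BatchModel) = bm.model.meta.nvar    # hypothetical accessor
num_constraints(bm::BNK.BatchModel) = bm.model.meta.ncon  # hypothetical accessor
num_parameters(bm::BNK.BatchModel) = length(bm.model.θ)   # hypothetical accessor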
test/runtests.jl
Outdated
gh_bound = gh_test[1:end-num_equal, :]
gh_equal = gh_test[end-num_equal+1:end, :]
dual_hat_bound = dual_hat[1:end-num_equal, :]
dual_hat_equal = dual_hat[end-num_equal+1:end, :]
This is another obvious thing BNK should have -- functions that help you deal with indices
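For instance, a sketch of hypothetical helpers that split constraint rows into bound and equality blocks:

# hypothetical index helpers that BNK could export
bound_rows(nrows, num_equal)    = 1:(nrows - num_equal)
equality_rows(nrows, num_equal) = (nrows - num_equal + 1):nrows

gh_bound = gh_test[bound_rows(size(gh_test, 1), num_equal), :]
gh_equal = gh_test[equality_rows(size(gh_test, 1), num_equal), :]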
src/L2OALM.jl
Outdated
Dict{Symbol,Any}(
    :ρ => 1.0,
    :ρmax => 1e6,
    :τ => 0.8,
    :α => 10.0,
    :max_violation => 0.0,
),
Does Lux let you make these structs instead?
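Nothing in Lux forces a Dict here; a plain Julia struct with keyword defaults would work. A sketch with a hypothetical name:

Base.@kwdef mutable struct ALMPrimalHyperparameters
    ρ::Float64 = 1.0
    ρmax::Float64 = 1e6
    τ::Float64 = 0.8
    α::Float64 = 10.0
    max_violation::Float64 = 0.0
end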
src/L2OALM.jl
Outdated
Default function that reconciles the state of the dual model after processing a batch of data.
This function computes the mean dual loss from the batch states.
"""
function _reconcile_alm_dual_state(batch_states::Vector{NamedTuple})
Suggested change:
- function _reconcile_alm_dual_state(batch_states::Vector{NamedTuple})
+ function _reconcile_dual_state(batch_states::Vector{NamedTuple})
The alm prefix can be dropped since ALM is this whole repo 😄 (this needs updates everywhere else too: the primal version, update_rho, etc. Let me know if you agree and I can add that commit.)
src/L2OALM.jl
Outdated
function _default_dual_loop(num_equal::Int)
    return TrainingStepLoop(
        LagrangianDualLoss(num_equal),
        [(iter, current_state, hpm) -> iter >= 100 ? true : false],
        Dict{Symbol,Any}(:max_dual => 1e6, :ρ => 1.0),
        [],
        _reconcile_alm_dual_state,
        _pre_hook_dual,
    )
end
How about exposing the hyperparameters as kwargs here? Same for the primal one.
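Something along these lines (a sketch; the keyword names and the max_iter default are illustrative):

function _default_dual_loop(num_equal::Int; max_dual = 1e6, ρ = 1.0, max_iter = 100)
    return TrainingStepLoop(
        LagrangianDualLoss(num_equal; max_dual),
        [(iter, current_state, hpm) -> iter >= max_iter],
        Dict{Symbol,Any}(:max_dual => max_dual, :ρ => ρ),
        [],
        _reconcile_alm_dual_state,
        _pre_hook_dual,
    )
end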
src/L2OALM.jl
Outdated
stopping_criterion(
    iter_primal,
    current_state_primal,
    training_step_loop_primal.hyperparameters,
Is this some standard Lux API? Our stopping criteria need neither the state nor the hyperparameters.
src/L2OALM.jl
Outdated
function _pre_hook_primal(
    θ,
    primal_model,
    train_state_primal,
    dual_model,
    train_state_dual,
    bm,
)
    # Forward pass for dual model
    dual_hat_k, _ = dual_model(θ, train_state_dual.parameters, train_state_dual.states)

    return (dual_hat_k,)
end
Is there some guidance for what to put in a "pre-hook" vs a "loss"? Do they get treated differently somehow?
Keep the hooks, but move the primal and dual evaluation inside the loop with ChainRulesCore.ignore_derivatives.
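A minimal sketch of that idea, assuming a Lux-style loss signature (model, ps, st, data) -> (loss, st, stats); the augmented_lagrangian helper is hypothetical:

using ChainRulesCore: ignore_derivatives

function primal_loss(primal_model, ps, st, (θ, bm, dual_model, train_state_dual))
    # evaluate the dual network without differentiating through it
    dual_hat_k = ignore_derivatives() do
        first(dual_model(θ, train_state_dual.parameters, train_state_dual.states))
    end
    x_hat, st_ = primal_model(θ, ps, st)
    loss = augmented_lagrangian(bm, x_hat, dual_hat_k)  # hypothetical ALM objective
    return loss, st_, (;)
end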
src/L2OALM.jl
Outdated
mutable struct TrainingStepLoop
    loss_fn::Function
    stopping_criteria::Vector{Function}
    hyperparameters::Dict{Symbol,Any}
    parameter_update_fns::Vector{Function}
    reconcile_state::Function
    pre_hook::Function
end
Why Vector{Function} for parameter_update_fns and stopping_criteria? I think only one is ever used.
I see that for the dual case there is no parameter_update_fn; I guess (x...) -> nothing can work there.
Θ_train = randn(T, nθ, dataset_size) |> dev_gpu
Θ_test = randn(T, nθ, dataset_size) |> dev_gpu

primal_model = feed_forward_builder(nθ, nvar, [320, 320])
Not sure where, but we should eventually have some magic for this... something like L2ONN.feed_forward(bm, input=:all_params, output=:all_vars, hidden_sizes=[320,320])
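A sketch of what that builder could look like, assuming the hypothetical num_parameters/num_variables accessors from above and Lux's Dense/Chain layers:

# hypothetical builder: maps all problem parameters to all primal variables
function feed_forward(bm; hidden_sizes = [320, 320], act = Lux.relu)
    sizes = [num_parameters(bm); hidden_sizes; num_variables(bm)]
    layers = [
        Lux.Dense(sizes[i] => sizes[i+1], i == length(sizes) - 1 ? identity : act)
        for i in 1:length(sizes)-1
    ]
    return Lux.Chain(layers...)
end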
Suggested change:
- bm_train = BNK.BatchModel(model, batch_size, config = BNK.BatchModelConfig(:full))
- bm_test = BNK.BatchModel(model, dataset_size, config = BNK.BatchModelConfig(:full))
+ bm_train = BNK.BatchModel(model, batch_size, config = BNK.BatchModelConfig(:viol_grad))
+ bm_test = BNK.BatchModel(model, dataset_size, config = BNK.BatchModelConfig(:viol_grad))
viol_grad suffices here, and avoids allocating storage for the Jacobian products and the Hessian.
Project.toml
Outdated
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[sources]
BatchNLPKernels = {url = "https://github.com/klamike/BatchNLPKernels.jl"}
Suggested change:
- BatchNLPKernels = {url = "https://github.com/klamike/BatchNLPKernels.jl"}
+ BatchNLPKernels = {url = "https://github.com/LearningToOptimize/BatchNLPKernels.jl"}
Adds Augmented Lagrangian Primal-Dual Learning Method