
Conversation

demroz
Contributor

@demroz demroz commented Aug 16, 2025

This pull request introduces a new library for reverse-mode automatic differentiation. It's a tape-based reverse-mode autodiff: the idea is to build a computational graph and then call backward on the entire graph to compute all the gradients at once.
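In a nutshell, usage looks roughly like this (a rough sketch rather than a verbatim example; in particular, assigning the expression directly to an rvar on the marked line is assumed to evaluate it onto the tape):

#include <boost/math/differentiation/autodiff_reverse.hpp>
#include <iostream>

namespace rdiff = boost::math::differentiation::reverse_mode;

int main()
{
    using rvar = rdiff::rvar<double, 1>;

    rvar x = 2.0;        // leaves are recorded on the active tape
    rvar y = 3.0;

    rvar f = x * y + x;  // (assumed) assigning the expression evaluates it onto the tape

    f.backward();        // one backward pass propagates adjoints through the whole graph

    std::cout << "df/dx = " << x.adjoint() << "\n";  // expect y + 1 = 4
    std::cout << "df/dy = " << y.adjoint() << "\n";  // expect x     = 2
    return 0;
}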

Currently it supports all the basic operations (+, -, *, /), everything listed in the conceptual requirements for real number types, and Boost calls to erf, erfc, erf_inv and erfc_inv. Everything is tested up to the 4th derivative. The list of tests:
test_reverse_mode_autodiff_basic_math_ops.cpp
test_reverse_mode_autodiff_comparison_operators.cpp
test_reverse_mode_autodiff_constructors.cpp
test_reverse_mode_autodiff_error_functions.cpp
test_reverse_mode_autodiff_flat_linear_allocator.cpp
test_reverse_mode_autodiff_stl_support.cpp

There are also two examples in the example directory:
reverse_mode_linear_regression_example.cpp -> simple linear regression that demonstrates how this library can be used for optimization

autodiff_reverse_black_scholes.cpp -> a rewrite of the forward mode equivalent.

Important notes

  1. The intent of the library is to be an engine for gradient-based optimization. (In the future I'd be interested in adding some gradient-based optimizers to Boost if there is interest.) I wrote it with first derivatives in mind. Although everything is tested up to the 4th derivative, the library is intended to be used primarily for calculating first-order derivatives. It's best to use forward-mode autodiff if very high orders are needed.
  2. The library is based on expression templates. This means that although my autodiff variable rvar does satisfy all the requirements of a real type, it doesn't necessarily play nicely with Boost special functions. For example:
rvar<double, 1> x = make_rvar<double, 1>(1.0);
rvar<double, 1> y = make_rvar<double, 1>(1.0);
rvar<double, 1> z = make_rvar<double, 1>(1.0);
auto f = x + y * z;

f in this case is not actually of type rvar, but add_expr<rvar, mult_expr<rvar, rvar>>.
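Concretely (a rough sketch of the consequence; the workaround on the last line assumes that assigning an expression to an rvar evaluates it, which is the usual expression-template behavior):

auto f = x + y * z;              // decltype(f) is add_expr<rvar, mult_expr<rvar, rvar>>, not rvar

// auto g = boost::math::tgamma(f);  // may fail to compile or take an unintended code path,
//                                   // because the special functions expect a concrete number type

rvar<double, 1> w = x + y * z;   // assumed workaround: materialize the expression into an rvar first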

  3. reverse_mode_autodiff_memory_management.hpp has a flat_linear_allocator class that's essentially a specialized memory arena. It uses placement new for memory allocations; this is a deliberate design choice. The flat_linear_allocator destructor explicitly calls the destructors of the individual elements, so explicit calls to delete shouldn't be needed here.
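For illustration, the pattern is roughly the following (just a sketch of the idea, not the actual flat_linear_allocator code):

#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

template <class T>
class arena_sketch
{
    std::unique_ptr<std::byte[]> storage_;
    std::size_t                  capacity_;
    std::size_t                  count_ = 0;

public:
    explicit arena_sketch(std::size_t capacity)
        : storage_(new std::byte[capacity * sizeof(T)]), capacity_(capacity) {}

    arena_sketch(const arena_sketch&)            = delete;
    arena_sketch& operator=(const arena_sketch&) = delete;

    template <class... Args>
    T* emplace(Args&&... args)
    {
        assert(count_ < capacity_);
        // placement new: construct in pre-allocated storage, no per-node heap allocation
        T* p = ::new (storage_.get() + count_ * sizeof(T)) T(std::forward<Args>(args)...);
        ++count_;
        return p;
    }

    ~arena_sketch()
    {
        // explicitly run the element destructors; no delete on individual elements
        for (std::size_t i = 0; i != count_; ++i)
            reinterpret_cast<T*>(storage_.get() + i * sizeof(T))->~T();
    }
};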

Thank you, and I'm looking forward to your feedback.

@demroz
Contributor Author

demroz commented Aug 24, 2025

@mborland @ckormanyos I wrote the first draft of the docs. It took a bit longer than I thought it would. Please take a look and let me know if there are any changes/additions needed. It's my first time working with the Boost documentation, but building the docs locally seems to change a bunch of HTML files for me. I didn't want to have 500+ files changed in the PR, so I've been doing

git ls-files doc/html | xargs git update-index --assume-unchanged

to ignore these changes, but some still go through. Is there a better way of doing this? Also @jzmaddock thank you for taking a look at the library. I have a few responses to your comments below.

In the comment: "another good example is whenever there is an internal call to a special function doing something like this: sin_pi(2.0*x);", that should never happen, we do test against multiprecision using expression templates, so we should be expression template safe. It's possible that since you're operating at say double precision that you're taking a code branch we haven't tested against expression-templated UDTs yet, but that would be a bug at our end. Being able to use a tool like this against arbitrary special functions seems like a good win to me. But:

Unfortunately this seems to happen quite often. See lines 64, 68, and 215 of bessel.hpp, or 88/89 of detail/bessel_jy_asym.hpp. I'd need to take a more dedicated crack at cyl_bessel_j to see how much change would be needed to get ETs to work with this function.

Mention was made of tgamma(5) resulting in zero derivative: as pointed out, this is down to integer arguments being treated as constant results. It's the correct (and fast) thing to do in general, but obviously breaks a tool like this. The right thing to do here is to specialize the lowest level function that blows up: unchecked_factorial in this case. That function is used all over (not just in tgamma) so fixing that would fix a whole bunch of things in one go. But:

There are many other special cases where we return a constant result: usually this is at the end point of a domain, or else say at x = 0, where the result is known, and might not be easily calculated by the normal methods anyway. Again for normal scalar types it's important to filter those special cases off early, but they will potentially break an autodiff tool. These are scattered all over, and I'm not sure how we fix them. Possibly a trait for detecting types like this, and then some constexpr if's around problematic code.

The special cases that return a constant result are indeed a big problem. I actually have a mechanism in my code for detecting expression types, so I think your solution makes sense.
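Something along these lines, I imagine (purely a sketch; the trait name and the sinc example are hypothetical, not existing Boost.Math code):

#include <cmath>
#include <type_traits>

// hypothetical trait: false by default, specialized to true by an autodiff
// library for its number/expression types
template <class T>
struct is_autodiff_type : std::false_type {};

// sinc(x) = sin(x)/x with the usual scalar shortcut sinc(0) = 1
template <class T>
T sinc(T x)
{
    if constexpr (!is_autodiff_type<T>::value)
    {
        // scalar fast path: filter the special case off early and return a constant
        if (x == T(0))
            return T(1);
    }
    // otherwise fall through to a formulation the autodiff type can differentiate;
    // in real code this branch would likely switch to a series expansion near x == 0
    using std::sin;
    return sin(x) / x;
}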

And finally, running any new number type through our special functions is a really good test of the correctness of the implementation: the type will really get hammered into all kind of corner cases that you wouldn't find otherwise, and has been invaluable in testing our multiprecision types.

Totally agree with you.

@jzmaddock
Collaborator

Unfortunately this seems to happen quite often. See lines 64, 68, and 215 of bessel.hpp, or 88/89 of detail/bessel_jy_asym.hpp. I'd need to take a more dedicated crack at cyl_bessel_j to see how much change would be needed to get ETs to work with this function.

Ah got you, in multiprecision we fix this with:

namespace boost { namespace math {
   namespace tools {

      template <class T>
      struct promote_arg;

      template <class tag, class A1, class A2, class A3, class A4>
      struct promote_arg<boost::multiprecision::detail::expression<tag, A1, A2, A3, A4> >
      {
         using type = typename boost::multiprecision::detail::expression<tag, A1, A2, A3, A4>::result_type;
      };
}}}

So that if any special function is called with an expression template, then it (internally/magically) converts that to the "real" type. This means that in user code, if someone writes:

tgamma(a+b);

then it all still works, even though a+b may be an expression template.

@demroz
Contributor Author

demroz commented Aug 25, 2025

@ckormanyos Some good news on the expression template on/off macro. I have a working version under the add_no_expression_template_support branch. With the change

#define BOOST_MATH_ET_OFF
#include <boost/math/differentiation/autodiff_reverse.hpp>
#include <boost/math/special_functions/bessel.hpp>
#include <boost/math/special_functions/gamma.hpp>
#include <iostream>

namespace rdiff = boost::math::differentiation::reverse_mode;

int main()
{
    constexpr std::size_t N = 1;
    using rvar              = rdiff::rvar<double, N>;
    double x                = 2.1;
    rvar   x_ad             = x;

    auto g = boost::math::tgamma(x_ad + x_ad);
    auto j = boost::math::cyl_bessel_j(0.25, x_ad + x_ad / 2);

    std::cout << "tgamma(x + x) = " << g << "\n";
    std::cout << "J_0.25(x) = " << j << "\n";

    auto &tape = rdiff::get_active_tape<double, 1>();

    g.backward();
    std::cout << "d/dx tgamma(x+x), autodiff = " << x_ad.adjoint()
              << " expected = " << 2 * boost::math::tgamma(2 * x) * boost::math::digamma(2 * x)
              << std::endl;

    tape.zero_grad();
    j.backward();

    std::cout << "d/dx J_0.25(x+x/2), autodiff = " << x_ad.adjoint() << " expected = "
              << 3.0 / 4.0
                     * (boost::math::cyl_bessel_j(-0.75, 3. * x / 2.)
                        - boost::math::cyl_bessel_j(1.25, 3. * x / 2.))
              << std::endl;

    return 0;
}

produces the expected gradients. tgamma on integer values is still problematic, however. I'll submit a different PR with this change after the docs are approved.

@ckormanyos
Member

ckormanyos commented Aug 25, 2025

@ckormanyos Some good news on the expression template on/off macro. I have a working version under the add_no_expression_template_support branch.

Wow this seems very positive. I've also followed your discussion with John. I am of the opinion that, maybe down the road, we will get most or at least some of specfun working with both ET on as well as off.

With the change

BOOST_MATH_ET_OFF

Hmmm I was wondering if you would like to use a PP-definition that has one more word in it. Maybe someday, there will be several kinds of ETs in Math. What do you think about BOOST_MATH_DIFFERENTIATION_ET_OFF or ...ON?

@demroz
Contributor Author

demroz commented Aug 25, 2025

@ckormanyos Some good news on the expression template on/off macro. I have a working version under the add_no_expression_template_support branch.

Wow this seems very positive. I've also followed your discussion with John. I am of the opinion that, maybe down the road, we will get most or at least some of specfun working with both ET on as well as off.

I think that promote_args is key for this to work. I tried something like that earlier this morning, and it seemed to fix some things, but the compiler still complains in various places.

With the change

BOOST_MATH_ET_OFF

Hmmm I was wondering if you would like to use a PP-definition that has one more word in it. Maybe someday, there will be several kinds of ETs in Math. What do you think about BOOST_MATH_DIFFERENTIATION_ET_OFF or ...ON?

I'm fine with that. Would BOOST_MATH_REVERSE_MODE_AD_ET_OFF be better? or BOOST_MATH_REVERSE_MODE_AD_USE_ET that defaults to 1?

@ckormanyos
Member

ckormanyos commented Aug 25, 2025

I'm fine with that. Would BOOST_MATH_REVERSE_MODE_AD_ET_OFF be better? or BOOST_MATH_REVERSE_MODE_AD_USE_ET that defaults to 1?

That's a tough question. In Multiprecision, we use a templated enumeration type. In other words, one of the template parameters to the number type is an enumeration value that is either et_off or et_on. We let our clients decide (when defining the type) if they want ET on or OFF. And the clients can even mix code with types having ET on and/or off.

Do you think it might be better to use a template parameter? It could default to either ET on or off, but that way, things might be more flexible or similar to the Multiprecision way?

In multiprecision, we default to et_on I believe.
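Roughly like this; the Multiprecision declarations below are the existing ones (simplified), while the rvar analogue is only a hypothetical sketch, not something this PR proposes:

// Boost.Multiprecision (simplified):
namespace boost { namespace multiprecision {

   enum expression_template_option { et_off = 0, et_on = 1 };

   template <class Backend, expression_template_option ExpressionTemplates = et_on>
   class number;

}}

// Hypothetical analogue for the reverse-mode type:
// template <class T, std::size_t Order, expression_template_option ET = et_on>
// class rvar;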

@demroz
Contributor Author

demroz commented Aug 25, 2025

I'm fine with that. Would BOOST_MATH_REVERSE_MODE_AD_ET_OFF be better? or BOOST_MATH_REVERSE_MODE_AD_USE_ET that defaults to 1?

That's a tough question. In Multiprecision, we use a templated enumeration type. In other words, one of the template parameters to the number type is an enumeration value that is either et_off or et_on. We let our clients decide (when defining the type) if they want ET on or OFF. And the clients can even mix code with types having ET on and/or off.

Do you think it might be better to use a template parameter? It could default to either ET on or off, but that way, things might be more flexible or similar to the Multiprecision way?

In multiprecision, we default to et_on I believe.

This would require a significant restructure of the code, I think. Right now the ET on and ET off options simply decide which header file with function overloads to include.
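i.e. roughly this (the header names below are placeholders, not the real file names):

// in autodiff_reverse.hpp (illustrative only)
#ifdef BOOST_MATH_ET_OFF
#  include <boost/math/differentiation/detail/reverse_mode_overloads_no_et.hpp>  // plain rvar overloads
#else
#  include <boost/math/differentiation/detail/reverse_mode_overloads_et.hpp>     // expression-template overloads
#endif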

@ckormanyos
Member

This would require a significant restructure of the code I think. Right now the et on and et off simply decide which header file with function overloads to include.

That'll work. If you ever decide to change down the road, you can probably still remain compatible by using a default parameter value. So the simpler evolutionary step you suggest seems just fine for the moment.

BOOST_MATH_REVERSE_MODE_AD_ET_OFF be better? or BOOST_MATH_REVERSE_MODE_AD_USE_ET

These seem fine from above, if you're happy with them, I'm happy with them.

@demroz
Contributor Author

demroz commented Aug 25, 2025

@mborland updated the docs based on your suggestions

@demroz
Contributor Author

demroz commented Aug 25, 2025

Is there anything else needed from my end to get this merged?

@ckormanyos
Member

Is there anything else needed from my end to get this merged?

I don't know why it was still necessary to approve your workflow runs (CI/CD on GHA). But I just did that, after overlooking the button for a week or so. I was sure we had approved them initially, but I guess not.

Let's see how it runs and shakes out on GHA. Other than that, from my side (Christopher), it's good to go.

So it might be a good idea to see how the Code Coverage looks. And if we don't like it, we can add tests either prior to or any time after the merge.

Cc: @jzmaddock and @mborland


codecov bot commented Aug 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.10%. Comparing base (7535ee1) to head (2c943fb).
⚠️ Report is 79 commits behind head on develop.


@@             Coverage Diff             @@
##           develop    #1302      +/-   ##
===========================================
+ Coverage    90.47%   95.10%   +4.62%     
===========================================
  Files          191      796     +605     
  Lines        23771    67115   +43344     
===========================================
+ Hits         21507    63830   +42323     
- Misses        2264     3285    +1021     
Files with missing lines Coverage Δ
include/boost/math/special_functions/sin_pi.hpp 100.00% <ø> (+17.39%) ⬆️

... and 667 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data


@mborland
Member

Other than the main CMakeLists.txt inadvertently being deleted, I am good with this PR, assuming it runs green.

@demroz
Contributor Author

demroz commented Aug 25, 2025

Other than the main CMakeLists.txt inadvertently being deleted, I am good with this PR, assuming it runs green.

Strange. I added it back in.

@demroz
Contributor Author

demroz commented Aug 26, 2025

Does it still make sense for the ET on / ET off change to be a separate PR? It's something that I have working. I guess it would be good to compartmentalize the discussion.

@mborland
Member

Does it still make sense for the ET on / ET off change to be a separate PR? It's something that I have working. I guess it would be good to compartmentalize the discussion.

I think so. I'll merge this in and then you can do more targeted feature branches.

@mborland mborland merged commit 92076c6 into boostorg:develop Aug 26, 2025
63 checks passed
@ckormanyos
Member

I'll merge this in and then you can do more targeted feature branches.

Thanks @demroz nice work. I'm excited to see how this area of Math evolves.

Thanks also Matt and John.

@ckormanyos
Member

Hi @demroz, is it appropriate or useful to add a phrase or an extension of this sentence in the main docs?

@demroz
Contributor Author

demroz commented Aug 27, 2025

I'm sorry, I don't understand what you mean exactly. Are you asking if it's appropriate to add a sentence about cost function minimization to the autodiff docs? Or adding a sentence about autodiff to the README file?
