Skip to content

Conversation

@Manangka
Copy link
Contributor

@Manangka Manangka commented Aug 1, 2025

This PR adds several optimizations to improve performance of the UTVD algorithm.

After comparing the new UTVD scheme with the existing TVD scheme it turned out that the performance of the UTVD can be twice as bad as the TVD scheme. After profiling the code using gprof and perf it turned out that the exact same computations of several methods are happening multiple times Those methods are.

  • gradients
  • local extrema
  • node distances

They are computed each time when the flux through a connection is computed. However this isn't nesscary. For instance the node_distance only need to be computed once during the initialization phase and then cached.

For the gradient and local extrema the computation need to happen once every outer iteration. And it can be done in one go for all cells, instead for evaluating it for every connection.

For this an invalidate method has been added to the IInterpolationSchemeInterface which is called every time the adv_fc is called. On invalidation the gradients and local extrema are recomputed.

The performance has been tested using the (gwt-adv-schemes) using a fixed timestep such that all models have the same amount of total time steps

Structured

Block Step Sin2
TVD 8170.00 ms 8527.58 ms 9261.64 ms
UTVD 21615.25 ms 21376.20 ms 20762.04 ms
UTVD-Optimized 10197.65 ms 10571.01 ms 10452.84 ms

Triangle

Block Step Sin2
TVD 12550.05 ms 12802.96 ms 10638.20 ms
UTVD 21527.24 ms 19526.76 ms 13936.71 ms
UTVD-Optimized 13015.30 ms 13352.33 ms 11749.55 ms

Voronoi

Block Step Sin2
TVD 28370.61 ms 30232.90 ms 27591.43 ms
UTVD 37455.10 ms 40452.87 ms 34365.99 ms
UTVD-Optimized 21431.70 ms 20462.69 ms 19549.89 ms

Checklist of items for pull request

  • Replaced section above with description of pull request
  • Closed issue #xxxx
  • Referenced issue or pull request #xxxx
  • Added new test or modified an existing test
  • Ran ruff on new and modified python scripts in .doc, autotests, doc, distribution, pymake, and utils subdirectories.
  • Formatted new and modified Fortran source files with fprettify
  • Added doxygen comments to new and modified procedures
  • Updated meson files, makefiles, and Visual Studio project files for new source files
  • Updated definition files
  • Updated develop.toml with a plain-language description of the bug fix, change, feature; required for changes that may affect users
  • Updated input and output guide
  • Removed checklist items not relevant to this pull request

For additional information see instructions for contributing and instructions for developing.

Manangka added 19 commits June 23, 2025 12:46
# Conflicts:
#	make/makefile
#	msvs/mf6core.vfproj
#	src/Model/TransportModel/InterpolationScheme/AdvSchemeEnum.f90
#	src/Model/TransportModel/InterpolationScheme/CentralDifferenceScheme.f90
#	src/Model/TransportModel/InterpolationScheme/TVDScheme.f90
#	src/Model/TransportModel/tsp-adv.f90
#	src/Utilities/LinearAlgebraUtils.f90
#	src/meson.build
# Conflicts:
#	make/makefile
#	src/Model/TransportModel/tsp-adv.f90
# Conflicts:
#	src/Model/TransportModel/InterpolationScheme/UTVDScheme.f90
@Manangka Manangka marked this pull request as ready for review August 11, 2025 11:53
Copy link
Contributor

@aprovost-usgs aprovost-usgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like some simplification of the new code is possible

type(CoefficientsType) :: coefficients

! Calculate internal domain fluxes and add to matrix_sln and rhs.
call this%face_interpolation%invalidate()
Copy link
Contributor

@aprovost-usgs aprovost-usgs Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was introduced as part of a strategy to calculate the gradients and extrema for all the cells in one go, rather than every time face_interpolation%compute is called for a cell-cell connection. This is currently accomplished by (1) invalidating the cache before entering the connections loop, (2) on the first call to face_interpolation%compute, calculating and caching all the gradients and extrema and revalidating the cache so that (3) on subsequent calls to face_interpolation%compute the cached gradients and extrema can be used (not recalculated).

While there's nothing wrong with this logic, it might be more straightforward to move the calculation of the gradients and extrema out of face_interpolation%compute, i.e., simply call face_interpolation%compute_gradients and face_interpolation%compute_local_extrema (here, in adv_fc) before entering the connections loop. That would eliminate the need for face_interpolation%invalidate and the cache_valid flag and would separate the cell-by-cell calculations from the connection-by-connection calculations. To underscore its more specific role, face_interpolation%compute could be renamed face_interpolation%compute_coefficients.

If you keep the current strategy based on the cache_valid flag, consider either introducing a face_interpolation%validate routine to complement the invalidate routine, or (my preference) eliminating the invalidate routine and setting this%cache_valid=.false. directly in adv_fc and adv_cq, similarly to how you set this%cache_valid=.true. directly in the compute routine.

! rate and has dimensions of L^/T.
nodes = this%dis%nodes

call this%face_interpolation%invalidate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comment in adv_fc above. The same approach would be used here.

real(DP), intent(in), dimension(:), pointer :: phi

this%phi => phi
call this%gradient%set_field(phi)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UTVD is currently the only one of the four interpolation schemes that requires a gradient calculation. As this calculation is expensive, performance has been improved by doing it only when necessary and using cached gradient values otherwise. Recalculation of the gradient must be done before entering the connections loop in adv_fc and adv_cq. Once inside the loop, cached values can be used.

When UTVD is used, recalculation of the gradient is performed during a call (from adv_fc or adv_cq) to the set_field procedure of the UTVD scheme, which in turn calls the set_field procedure of the gradient object. In the case of UTVD, the gradient object is wrapped in a decorator (CachedGradientType) that adds a caching capability. The set_field procedure of the "decorated" gradient object includes a recalculation of all the cell gradient values, which it caches in an array. The get procedure of the decorated gradient object, which is used in UTVD's compute procedure, then simply extracts the required values from the cache.

If the decorator is anticipated to be useful for future coding applications in MF6, this could be an elegant way to provide a caching capability. If not, might it be preferable in the current application to forgo the decorator and have the cached_gradients array (like the cached_node_distance array) reside in UTVD, which could do the recalculation and caching its set_field procedure? (And UTVD's compute procedure would directly draw values from the cache.) In the end, this should be equivalent to what's happening now, but without the decorator as an intermediary. I realize you might have considered this and decided to go with the decorator because of its potential applicability in other contexts.

If the decorator were to be eliminated, could the gradient object reside only in UTVD and be created in that type's constuctor, rather than in the TspAdv constuctor? UTVD would be the only module that needs it -- TspAdv currently needs it only to create it as a LeastSquaresGradient, move it to a CachedGradient, and pass it to UTVD.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aprovost-usgs I considered both cases you mentioned. I first thought about moving the gradient creation and use entirely into the UTVD class. As you said it is only used there so it makes sense it would reside there.

However i think there is much in potential in having the gradient available in other places/packages. Having it in the tsp-adv file makes it easier for others to find and use.

There are 2 use cases in have in mind for which the gradient is needed:

  • Determining the gradient at the cell boundary. In the paper of Darwish there is an equation in which he determines the gradient at a cell face by use cell gradients of the connected cells
  • Implementing other limiters like the Barth-Jespersen limiter. They make use of gradients as well

@Manangka Manangka merged commit 9959bfd into MODFLOW-ORG:develop Sep 1, 2025
20 checks passed
@wpbonelli wpbonelli added this to the 6.7.0 milestone Sep 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants