Approaches to hashing CUE values #3761
Replies: 6 comments
-
I need to put this through more thorough testing, but the following code is an initial approach that appears to work well enough. At its core is a conversion of a `cue.Value` to its syntax tree (via `Value.Syntax`), which is then hashed with `hashstructure`:

```go
package cue_hasher

import (
	"hash/fnv"
	"strconv"

	"cuelang.org/go/cue"
	"github.com/mitchellh/hashstructure/v2"
)

// cueSyntaxOptions control how much of the value is rendered to syntax
// before hashing.
var cueSyntaxOptions = []cue.Option{
	cue.All(),
	cue.Concrete(false),
	cue.DisallowCycles(false),
	cue.Docs(true),
	cue.Hidden(true), // https://github.com/cue-lang/cue/issues/3771
	cue.Raw(),
}

// newHashOptions returns fresh options on every call: the hasher is stateful,
// so sharing one package-level instance between goroutines would race.
func newHashOptions() *hashstructure.HashOptions {
	return &hashstructure.HashOptions{
		IgnoreZeroValue: false,
		Hasher:          fnv.New64a(), // optional: hashstructure falls back to fnv.New64() when nil
	}
}

// Hash returns the hash of val as a hexadecimal string.
func Hash(val cue.Value) (string, error) {
	return HashString(val)
}

// HashString formats the 64-bit hash of val in base 16.
func HashString(val cue.Value) (string, error) {
	valHash, err := HashUint(val)
	if err != nil {
		return "", err
	}
	return strconv.FormatUint(valHash, 16), nil
}

// HashUint renders val to a CUE syntax tree and hashes that tree.
func HashUint(val cue.Value) (uint64, error) {
	syn := val.Syntax(cueSyntaxOptions...)
	return hashstructure.Hash(syn, hashstructure.FormatV2, newHashOptions())
}
```
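The `Hasher` option in the code above picks `fnv.New64a()` (FNV-1a) rather than hashstructure's default `fnv.New64()` (FNV-1). The two are not interchangeable for stability purposes. This stdlib-only sketch compares them directly; `sum64` and `digests` are hypothetical helpers, and a fresh hasher is created per call because a `hash.Hash64` carries internal state:

```go
package main

import (
	"hash"
	"hash/fnv"
)

// sum64 hashes data with a fresh hasher from newHasher; hash.Hash64 values
// carry internal state, so one is created per call rather than shared.
func sum64(newHasher func() hash.Hash64, data []byte) uint64 {
	h := newHasher()
	h.Write(data) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

// digests returns the 64-bit FNV-1 digest (hashstructure's default hasher)
// and the FNV-1a digest (the one selected in the code above) of data.
func digests(data []byte) (fnv1, fnv1a uint64) {
	return sum64(fnv.New64, data), sum64(fnv.New64a, data)
}
```

Both variants start from the same offset basis, so they agree only on empty input. For anything else they disagree, which makes the hasher effectively part of the hash format: switching it later (or relying on a default that might change) would invalidate every previously stored hash.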
-
You may need to think about canonicalization, i.e. equivalent structures that happen to be ordered differently. These will produce different hashes even though the object is the same, somewhat defeating the point of using hashes. I think there has been discussion in this repo about this property under the name topological sorting (?), but I'm not sure what the status is.
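One way to get order independence is to encode values canonically before hashing: sort the fields of every struct by name, and quote names and scalars so the encoding stays unambiguous. The sketch below uses a hypothetical `Field` type instead of `cue.Value`; a real version would walk the CUE value (for example via `Value.Fields`) and would also need to decide how to encode incomplete values and constraints:

```go
package main

import (
	"hash/fnv"
	"sort"
	"strconv"
	"strings"
)

// Field is a hypothetical, simplified stand-in for one struct field; it is not
// a CUE type. Value is either a string scalar or a nested []Field.
type Field struct {
	Name  string
	Value any
}

// encode writes v to b. When canonical is true, the fields of every struct are
// sorted by name first, so reordered but equivalent structs encode identically.
// Quoting names and scalars keeps the encoding unambiguous.
func encode(b *strings.Builder, v any, canonical bool) {
	switch v := v.(type) {
	case string:
		b.WriteString(strconv.Quote(v))
	case []Field:
		fields := append([]Field(nil), v...) // copy: never reorder the caller's slice
		if canonical {
			sort.Slice(fields, func(i, j int) bool { return fields[i].Name < fields[j].Name })
		}
		b.WriteByte('{')
		for _, f := range fields {
			b.WriteString(strconv.Quote(f.Name))
			b.WriteByte(':')
			encode(b, f.Value, canonical)
			b.WriteByte(',')
		}
		b.WriteByte('}')
	default:
		panic("encode: unsupported value type")
	}
}

// Encoding returns the raw (source-order) or canonical encoding of v.
func Encoding(v any, canonical bool) string {
	var b strings.Builder
	encode(&b, v, canonical)
	return b.String()
}

// CanonicalHash hashes the canonical encoding with 64-bit FNV-1a.
func CanonicalHash(v any) uint64 {
	h := fnv.New64a()
	h.Write([]byte(Encoding(v, true)))
	return h.Sum64()
}
```

For example, `{a: "1", b: {c: "2", d: "3"}}` and the same struct written as `{b: {d: "3", c: "2"}, a: "1"}` have different raw encodings but share the canonical encoding `{"a":"1","b":{"c":"2","d":"3",},}`, and therefore the same `CanonicalHash`.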
-
Thanks for the tip about topological sorting. I was able to find this discussion from a few years ago asking for something similar, and found the
-
Thanks for raising this for discussion, @lorrrrrrrenzo, and apologies for the delay in replying here after my initial response on Slack. One question that immediately comes to mind: would you want your hash to be stable with respect to the order of inputs? i.e. that:
should hash to the same value as:
If so, then I think you want to normalise the result via something like sorting of identifiers, rather than toposort.
Another question/thought: are the configurations you are talking about here concrete or incomplete? I'm assuming, given the discussion about
cc @cuematthew for thoughts re toposort and sort fields.
-
Thanks for the response @myitcv. My interest is stability with respect to the resolved value, both concrete and incomplete. In the example you provided, I would like the hash result to be the same in both cases: the order should be immaterial. How they are split between files in a given module should also be immaterial (e.g. a module defined in both
I reached for
When I looked at this weeks ago, effectively sorting the result of
It might be helpful to share that I have in mind how Dhall handles hashing and how it is used within that ecosystem. From their language tour:
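For comparison, Dhall's semantic integrity checks pin an import with `sha256:` followed by the hex SHA-256 of the expression's normalized binary encoding. A minimal Go sketch of the same pin/verify idea, assuming you already have a canonical byte encoding of the CUE value (producing that encoding is the hard part and is not shown). `Pin` and `Verify` are hypothetical names:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// pinPrefix mirrors the "sha256:" prefix used by Dhall integrity checks.
const pinPrefix = "sha256:"

// Pin returns a "sha256:<hex>" tag for an already-canonical encoding.
func Pin(canonical []byte) string {
	sum := sha256.Sum256(canonical)
	return pinPrefix + hex.EncodeToString(sum[:])
}

// Verify reports whether canonical still matches a previously recorded pin.
func Verify(canonical []byte, pin string) bool {
	return Pin(canonical) == pin
}
```

Using a 256-bit cryptographic digest here is a deliberate choice: 64-bit FNV can be adequate for in-memory cache keys, but it is not collision-resistant, which matters once hashes are used for record-keeping or integrity checks.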
-
I think what you actually want is a full lexicographical sort. For that, you need the `CUE_DEBUG=sortfields` environment variable. This completely disregards all ordering information from CUE sources and instead sorts the fields of every struct lexicographically. Toposort, which is currently the default, will indeed be influenced by the order of fields in your source CUE; as such, the output (and hence the hash) will change cosmetically if your CUE source undergoes semantics-preserving changes that reorder fields.
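To illustrate why source-order-sensitive output leaks into the hash, here is a stdlib-only Go sketch. `mergeFirstSeen` is a hypothetical stand-in for any output order that depends on where fields first appear across the inputs; it is not CUE's toposort algorithm. `mergeSorted` mimics the lexicographic ordering described for `CUE_DEBUG=sortfields`, applied only to a flat list of field names, and `render` stands in for the serialized output that would be hashed:

```go
package main

import (
	"sort"
	"strings"
)

// mergeFirstSeen unions field names from several sources (e.g. files),
// keeping the order in which each name first appears.
func mergeFirstSeen(sources ...[]string) []string {
	seen := map[string]bool{}
	var out []string
	for _, src := range sources {
		for _, name := range src {
			if !seen[name] {
				seen[name] = true
				out = append(out, name)
			}
		}
	}
	return out
}

// mergeSorted unions the same names but orders them lexicographically,
// so the result no longer depends on source or file order.
func mergeSorted(sources ...[]string) []string {
	out := mergeFirstSeen(sources...) // fresh slice, safe to sort in place
	sort.Strings(out)
	return out
}

// render is a hypothetical serialization of the field order.
func render(names []string) string {
	return strings.Join(names, ",")
}
```

With sources `{name, version}` and `{deps, name}`, first-seen order yields `name,version,deps` or `deps,name,version` depending on which source comes first, while the sorted merge yields `deps,name,version` either way.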
-
Hello,
I am doing some work around lattice caching and record-keeping, and the ability to generate reproducible hash values for a given CUE value would go a long way towards easing both efforts. My initial research suggests that `hashstructure`, used extensively by Terraform, applied to `cue.Value` or `ast.Expr` objects is a good initial approach.
What are some considerations that I should keep in mind as I begin this investigation, especially concerning hash stability (acknowledging that CUE has not yet reached v1)? I also hope that such a function could be added to the CUE Go package in the future, but that's a future conversation.
Thanks!