Adding functional CompositeLayer #21099
base: master
Conversation
* Fix functional dict inputs to support optional ones
* Add unit test for optional dict inputs
* Fix unit test formatting

(cherry picked from commit 8333ef4)
Codecov Report
Attention: Patch coverage is […]

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #21099      +/-   ##
==========================================
+ Coverage   82.69%   82.72%   +0.02%
==========================================
  Files         564      565       +1
  Lines       54132    54285     +153
  Branches     8411     8438      +27
==========================================
+ Hits        44765    44907     +142
- Misses       7294     7308      +14
+ Partials     2073     2070       -3
```

Flags with carried forward coverage won't be shown.
keras/src/models/cloning.py
Outdated
```python
# A subclassed Functional model is always cloned
# as a vanilla Functional model.
new_model = Functional(cloned_inputs, cloned_outputs,
                       name=model.name)
if model.compiled:
    compiled_config = model.get_compile_config()
    new_model.compile_from_config(compiled_config)
return new_model
```
This piece of code was moved here from the end of `_clone_functional_model` when the function was renamed `_clone_function_object` and repurposed for all functional types.
The test for `functional_like_constructor(model.__class__)` was removed. See implementation note 4.iii.
keras/src/models/model.py
Outdated
```diff
  # This test is permissive. Any argument combination that
  # could be a Functional init is allowed. This test will be
  # followed by an actual call of the Functional constructor
  # so the worst case is that args are not what they should
  # be and the constructor fails with an explicit error message.
  return (
-     (len(args) == 2)
+     (len(args) >= 2)
      or (len(args) == 1 and "outputs" in kwargs)
      or ("inputs" in kwargs and "outputs" in kwargs)
  )
```
```python
def functional_like_constructor(cls):
    # This test is permissive. Any constructor that could be passed
    # inputs and outputs is accepted. This test triggers Functional
    # deserialization when we know we have a functional config so
    # it's OK to try anything that could work.
    init_args = inspect.signature(cls.__init__).parameters
    funct_init_args = (
        ("inputs" in init_args and "outputs" in init_args) or
        ("args" in init_args or "kwargs" in init_args))
    return funct_init_args


def strict_functional_like_constructor(cls):
    # This test is conservative. Only explicit "inputs" and "outputs"
    # arguments, with those names, are accepted. This test triggers
    # Functional serialization and we want to do that in a subclass only
    # when an explicitly functional __init__(inputs, outputs) constructor
    # exists in the subclass.
    init_args = inspect.signature(cls.__init__).parameters
    funct_init_args = ("inputs" in init_args and "outputs" in init_args)
    return funct_init_args
```
Cleanup of the "is functional-like" logic. See implementation note 4.ii
```python
def __init__(self, inputs, outputs, *args, param=1, **kwargs):
    super().__init__(inputs, outputs, *args, **kwargs)
```
The carve-out for functional serialization of subclasses of Functional that have a functional-like constructor was constrained to constructors with explicit inputs and outputs arguments, which are the only ones we can test for. See implementation note 4.ii
```python
# No way to detect that this can be serialized functionally
# since the graph could have been created inside the custom
# __init__ with the same __init__ args.
config = model.get_config()
self.assertFalse(has_functional_config_keys(config))
```
In all other cases of subclassing Functional, functional serialization is NOT triggered, since it is not possible to detect whether the layer graph is created outside of the class (in which case functional serialization would be useful) or inside of `__init__` (in which case regular serialization, which calls `__init__` at the end, is enough).
Here is a demo Colab showing how a user can patch a CompositeLayer LLM to enable LoRA:
I added SVF to the demo Colab. SVF = Singular Value Fine-tuning (explanations here); it's an SVD-based variant of LoRA. It's 36 lines of code in total: 17 for the SVF layer, 7 to patch it into a pretrained model, 12 to patch it out, all done with user-level APIs. (And BTW, the code format failure is not me, it's caused by an OpenVINO installation warning.)
Hey @martin-gorner! Probably someone else should review this in more detail, but I have been doing some thinking on the UX (and only the UX so far). Still mulling, for now just some questions/thoughts...

Is there a performance issue with using a `Model` as a layer? I know that just using today's symbols won't cover all of what you are adding here -- in particular building a functional from an unknown shape. And we should figure that out. But adding a Sequential as a sub-component to a larger model is already a pattern in a lot of Keras code. Are we saying that's bad with this PR, or not really? Also, is there a reason we are pushing sequential and functional style layer construction onto the same symbol?

```python
# A reusable composite layer
class MyCompositeLayer(CompositeLayer):

    @staticmethod
    def my_layer_fn(inputs):
        x = layers.Dense(5)(inputs)
        return layers.Dense(4)(x)

    def __init__(self, **kwargs):
        super().__init__(MyCompositeLayer.my_layer_fn, **kwargs)
```

This feels off for a couple reasons. For one, recommending a `staticmethod` […]. And it's unclear how this would work for a reusable functional component with a lot of config. If you were writing a block layer with 5-10 arguments related to feature sizes, activations, residuals, etc., how would you pass that information to the functional building method? Lastly, I wonder if we'd want more predictability in method names for subclasses of `CompositeLayer`.
Hi Matt,
One downside is that a functional model must have pre-defined inputs, whereas a Keras Layer is allowed to have its input shape deferred until build() is called. This makes using layers much more practical. CompositeLayer retains this property. But apart from that, there is nothing wrong with using a Model as a composite layer today. Initially, it just did not feel very clean, so I set out to try out what a proper CompositeLayer implementation would look like. In the process, I discovered that having a CompositeLayer could also clarify the internal class hierarchy. This is the class hierarchy I am suggesting down the line:
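A rough sketch of the suggested hierarchy, inferred from the implementation notes below (this is not the author's actual diagram, and the exact base-class arrangement is an assumption):

```python
# Sketch only (inferred, base classes are assumptions):
#
#   Function                         shared graph-of-operations machinery
#   Layer
#   └── CompositeLayer               a Layer that owns a Function once built
#       └── Functional (+ Model)     adds Model features: fit(), save(), ...
#           └── Sequential           a linear stack as a special case
```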
Note: this PR does not implement this class hierarchy. It does not change the existing `Functional` class hierarchy. This refactoring could unify Functional and Sequential in a seamless way. The artificial separation between these two classes is something power users building frameworks on top of Keras have complained about in the past. There is also ugly code in a number of places (for example cloning.py) that specifically tests for `Functional` or `Sequential`.
Initially I just had […]
I don't have preconceived ideas about how to package CompositeLayers as a custom class. I'm currently trying to write a full LLM in this style to see what works best. Would you like to suggest something?
In trying to write a full LLM functionally, so as to make LoRA and its modern variants easily implementable by the user, the biggest challenge right now is that it's not possible to have a layer that accepts optional inputs AND is defined in a functional way as a graph of layers (whether the container is Functional or CompositeLayer does not matter). This is different from a graph of layers that can contain custom layers with optional inputs, which is what currently exists in unit tests. This kind of construct is not possible:

```python
x, cache = inputs
if cache is not None:
    outputs = MyCacheLookupLayer()(x, cache)
else:
    outputs = Dense(n)(x)
# then make a functional model/layer from (inputs, outputs)
```

This means that the LLM must implement two separate backbones, one with caches and one without, which introduces complexity. Ideally, the LLM would have a single backbone implementation that could be compiled with or without caches. I don't see how to do this yet with an implementation that is purely functional all the way down to the Dense layers, so that they can easily be swapped/wrapped for their LoRA-ified versions. Any ideas?
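For contrast, here is a minimal sketch of the pattern that does work today: a graph containing a custom layer that handles an optional input inside its own call(). The layer names are made up, and declaring the optional input with `optional=True` on `keras.Input` is an assumption about the current API:

```python
import keras
from keras import layers

class CacheAwareBlock(layers.Layer):
    # The None-handling lives inside call(), at call time,
    # not in the structure of the layer graph itself.
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.dense = layers.Dense(units)

    def call(self, x, cache=None):
        if cache is not None:
            x = x + cache  # stand-in for a real cache lookup
        return self.dense(x)

inputs = keras.Input(shape=(8,))
cache = keras.Input(shape=(8,), optional=True)  # may be None at call time
outputs = CacheAwareBlock(8)(inputs, cache=cache)
model = keras.Model([inputs, cache], outputs)
```

The graph itself never branches on whether the cache is present, which is exactly the limitation described above.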
Yeah this seems like a cool thing to solve. But you wouldn't necessarily need to solve it via a layer/model symbol split I think.
I think this has always been desired actually. If we could make Sequential be a subclass of Functional today we would, but IIRC there might have been some old but common uses of Sequential that made it hard to do that? But if we can make that change in a backwards compat friendly way I think this is a standalone piece of work we could grab any time.
Yeah, agreed, working with caches is a big pain point. I haven't even been worried about "functional all the way down"; I think there's enough issues with […]
Ignoring sequential and whether we'd want a separate layer-not-model solution for now (I think they are largely orthogonal), here's an idea.

```python
class MLP(keras.Functional):
    def __init__(self, inner_units):
        super().__init__()
        self.inner_units = inner_units

    def functional_build(self, inputs):
        features = inputs.shape[-1]
        x = layers.Dense(self.inner_units, activation="relu")(inputs)
        x = layers.Dense(features)(x)
        return x + inputs  # Residual.

    def get_config(self):
        config = super().get_config()
        config.update({"inner_units": self.inner_units})
        return config


inputs = keras.Input(shape=(8,))
symbolic_outputs = MLP(32)(inputs)
real_outputs = MLP(32)(np.ones((16, 4)))
```
Nice things -- this code looks really similar to the current UX with subclassed layers. It also separates config from call graph in different methods, which makes it easier to subclass your subclass of functional (a real problem for KerasHub). It does not support writing "build" logic that is dependent on input shape without making a subclass, which is a key difference from your design I think. That's consistent with the […]
Here you go, LLM, functional all the way down (colab): This is a working example. The backbone is entirely Functional, which means that the layer graph is editable, and the Colab shows two custom LoRA-like techniques. The KV caches and text generation work, as well as training. JIT compilation too. WDYT?

Note: I did not think much about class-packaging the layers in this example. It's mostly just function calling. Your […]
Still thinking and trying to organize my thoughts. In the meantime, I've been playing with many little prototypes to try and make this simpler from the end user side. Code is probably not complete, but that's not really the point. I'm trying to find a good UX. I'm not fully sure what I think of it yet, but food for thought.
I managed to get rid of the double backbone by using optional inputs: functional LLM all the way down with a single backbone. The trick was to put all the […]

Thank you for your colab with UX ideas! I'll send comments tomorrow, it's getting a bit late.
Food for thought regarding functional layer packaging with config etc.: I added backbone saving to the colab. It's done using plain […]

So regarding your experiments, I wonder if packaging a functional layer with config attributes like […]

I'm not saying I have a definitive recommendation though. People will want to give a class name to their backbone implementation etc.
Adding some implementation notes and thoughts: The API that clones the layer graph, called […]
Agreed, this is bad. For KerasHub this is to the point of being something we would never use, I think. There is a lot of stuff attached to model and layer class implementations today. A very non-exhaustive list -- stable diffusion generation subroutines that are not part of the graph, conv net pyramid output helpers, preset and LoRA saving/loading routines (conceptually, load parts of a saved model), VAE helpers, helpers to get certain layer subcomponents without needing the layer name/index. I'm sure some of this stuff could be pulled into the graph, and some could be gotten rid of. But IMO, at the end of the day, attaching functionality to classes is a very normal thing to do that our developers will expect, and any workflow that strips classes back to base classes will be very limited.
This is the meat of the experiments I was running. Going through […] I'm of the opinion that having […]
Introduction
CompositeLayer encapsulates a functional subgraph of layers. It is the equivalent of a Functional or Sequential model but as a lighter-weight component, without the model-specific functionality like .fit() etc.
Apart from offering a useful modeling component, one of the ultimate goals of this functionality is to allow programmatic edits on pre-trained models, like adding LoRA-like weights. The current implementation hard-codes LoRA deep inside the Dense layer. This design does not allow users to change the implementation, which is already significantly behind SOTA.
☆☆☆ New demo Colab ☆☆☆ (old colab here)
The colab shows how to build an LLM functional all the way down, which means that the layer graph is editable. The Colab shows two custom parameter-efficient fine-tuning techniques (LoRA and SVF). KV caches and text generation work, as well as training. JIT compilation too.
The code for LoRA, and for SVF (Singular Value Fine-tuning, an SVD-based variant of LoRA), is shown in the Colab; an illustrative sketch follows below.
And all the code for doing this uses user-level APIs!
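Purely as an illustration of the kind of user-level graph edit described here (this is not the Colab's code; the `LoraDense` wrapper, the rank choice, and the use of `clone_model` with a `clone_function` are assumptions made for the sketch):

```python
import keras
from keras import layers

class LoraDense(layers.Layer):
    """Wraps a frozen pretrained Dense layer with a trainable low-rank update."""

    def __init__(self, inner, rank=8, **kwargs):
        super().__init__(**kwargs)
        self.inner = inner  # the pretrained Dense layer, reused with its weights
        self.rank = rank

    def build(self, input_shape):
        self.inner.trainable = False  # only the adapter weights are trained
        self.lora_a = self.add_weight(
            shape=(input_shape[-1], self.rank), initializer="he_uniform")
        self.lora_b = self.add_weight(
            shape=(self.rank, self.inner.units), initializer="zeros")

    def call(self, x):
        # Original projection plus the low-rank LoRA correction.
        lora = keras.ops.matmul(keras.ops.matmul(x, self.lora_a), self.lora_b)
        return self.inner(x) + lora

def add_lora(layer):
    # Swap every Dense layer for a LoRA-wrapped version; keep others as-is.
    if isinstance(layer, layers.Dense):
        return LoraDense(layer)
    return layer

# Rebuild the (functional) backbone's layer graph with the patch applied:
# lora_backbone = keras.models.clone_model(backbone, clone_function=add_lora)
```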
API

A CompositeLayer can be created either from a list of layers or from a function that defines a graph of layers. There is no constructor similar to a functional `Model(inputs, outputs)`, because inputs and outputs are usually not known when creating a layer. They will be created when the layer is built.
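To make the two construction modes concrete, here is a minimal sketch based on the description above and the examples in this thread; the exact import path and constructor signature are assumptions, not settled API:

```python
import keras
from keras import layers
from keras.layers import CompositeLayer  # import path assumed

# 1. Sequential-style: from a list of layers.
block = CompositeLayer([
    layers.Dense(64, activation="relu"),
    layers.Dense(32),
])

# 2. Functional-style: from a function that defines a graph of layers.
def layer_fn(inputs):
    x = layers.Dense(64, activation="relu")(inputs)
    return layers.Dense(32)(x)

block2 = CompositeLayer(layer_fn)

# No inputs/outputs are needed at construction time; the layer graph is
# created when the layer is built, e.g. on first call with a known shape.
outputs = block2(keras.Input(shape=(16,)))
```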
Implementation notes

1. Functional and CompositeLayer only depend on Function. There is no circular dependency between models and layers.

2. The current implementation is an intermediate step to make reviewing (diffs) easier. It isolates 4 functions in functional.py that are used by both CompositeLayer and Functional:
* `compute_input_spec`
* `run_through_graph_with_training_and_mask`
* `function_from_config`
* `serialize_functional_config`
With this approach, no changes are made to the Functional Model class hierarchy. The next step would be to move these 4 functions to CompositeLayer, then base Functional on CompositeLayer instead of Function. This will also allow the unification of Functional and Sequential models, since both will be based on CompositeLayer and have a Function once build() is called. Code explicitly testing for Functional or Sequential can then be removed throughout the code base and replaced with `isinstance(obj, CompositeLayer) and obj.built`.

3. `plot_model` and `clone_model` functionality were adjusted to work with CompositeLayers. Tests were added for the main edge cases, namely subclasses of CompositeLayer and Functional with the layer graph instantiated inside or outside of the class, in various nesting scenarios, tested for serialization/deserialization and for processing through `clone_model`.

4. Three bug fixes in `Functional` and `clone_model` were needed for the tests to pass:
i. In `Functional`, a list of one element was sometimes returned where a single tensor was expected. `Model.assert_input_compatible` let this through but `Function.assert_input_compatible`, which is stricter, did not. Single tensors are now returned in this case.
ii. The "functional-like constructor" tests in `Functional` were buggy because, surprisingly, `inspect.getfullargspec(Functional.__init__)` returns an empty list instead of the expected `(inputs, outputs)` signature (see this colab).
iii. In `clone_model`, subclasses of Functional are typically cloned as vanilla `Functional`. The same conservative behavior was adopted for `CompositeLayer`. There was a carve-out, however, for subclasses of Functional with a "functional-like" constructor, again with a buggy test. This is a niche use case of the niche `clone_model` functionality, so the carve-out was simply removed for simplicity.

5. Passing a list of inputs to a model expecting a dictionary of inputs seems to be allowed, as long as flattening the dict does not result in reordering. There is an explicit reordering test in `functional._standardize_inputs` (look for "sort"). Changing this in Functional is not possible at this point, but I would consider disallowing it in CompositeLayer. Tests covering this behavior are `functional_test.test_list_input_with_dict_build` and `composite_layer_test.test_list_input_with_dict_build`. (Point for discussion)

6. In functional.py, the serialization and deserialization functions `serialize_functional_config` and `function_from_config` contain an explicit test for the type `Functional` which triggers a node_index adjustment. This was left untouched and not updated for CompositeLayer, as I have not been able to find a situation where this code is useful. A test that triggers this condition was added in `functions_test.py`, but it passes whether the conditional clauses are there or not.

7. Optional inputs are supported, although no new API was added for them. They can be declared by setting a manual input spec on the CompositeLayer (a sketch follows after this list). See `composite_layer_test.py:test_optional_inputs` for a working example. Point for discussion: is this user-friendly enough or is a new API required?
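As a rough guess at what the manual input spec mentioned in note 7 could look like (the authoritative example is `composite_layer_test.py:test_optional_inputs`; the layer names, shapes, and the `optional` flag on `InputSpec` below are assumptions, not the test's actual code):

```python
from keras import layers
from keras.layers import CompositeLayer, InputSpec  # CompositeLayer path assumed

class AddCache(layers.Layer):
    # Inner layer that tolerates a missing optional input at call time.
    def call(self, x, cache=None):
        return x if cache is None else x + cache

def layer_fn(inputs):
    x, cache = inputs                 # `cache` may be None when called
    return layers.Dense(8)(AddCache()(x, cache))

layer = CompositeLayer(layer_fn)
# Declare the second input as optional by setting a manual input spec:
layer.input_spec = [
    InputSpec(shape=(None, 8)),
    InputSpec(shape=(None, 8), optional=True),
]
```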