EXPERIMENTAL Uncertainty loss #56
Open
IMPORTANT: DO NOT MERGE into main branch!
If you want to use this code: to enable the feature, set `model_uncertainty = true` in `config-base.yaml`.

This is an experimental branch where I implemented the loss function from "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?" (Kendall & Gal, 2017). After implementing it, I tried training on a dev dataset and found that the models trained with this method were all worse than a standard CNN.

I struggled to implement this feature in a way that would not make the codebase significantly more complex. Due to limitations in the way Keras handles custom loss functions, the result is still fairly complex, and would not be easy for anyone to maintain years down the line. Given the limited effectiveness of this method and the added code complexity, I have decided not to merge this feature into the main branch. I strongly encourage future developers to avoid merging this feature without taking these considerations into account.
This feature was tested on a binary classification problem (SST open chromatin in mouse). I did not test it on a regression problem, which could yield more promising results.
Briefly, the idea is: in addition to its usual outputs, the network produces an extra value (`sigma`) that represents the network's uncertainty about its output. When `sigma` is large, the output of the network is allowed to differ widely from the target value, but when `sigma` is small, the network is penalized harshly for deviating from the target value.

Implementation (`models.py`): The significant code complexity comes from the choice to append `sigma` at the end of the output units, so that there is still only one output head for the network. For example, a regression model would have outputs `(estimated y value, sigma)`, while a binary classification model would have `(P class 0, P class 1, sigma)`. This choice is ad hoc and a bit unnatural, and I do not want to set a precedent that we can just add output units to support another feature, which would get confusing if we wanted to use that other feature in combination with the uncertainty modeling.

However, this was the only implementation I could come up with inside of Keras. I attempted to add another output head to the model, which would output only `sigma`. But because of the way Keras loss functions are implemented, loss functions operate on each output head individually, so the math above would have become impossible to implement.
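For reference, the regression form of the Kendall & Gal loss can be sketched in plain NumPy. The function name and the choice to predict `s = log(sigma^2)` (the paper's numerically stable parameterization) are illustrative, not the exact code in `models.py`:

```python
import numpy as np

def heteroscedastic_mse(y_true, y_pred, log_var):
    # Learned-uncertainty (aleatoric) loss, regression form.
    # Large log_var down-weights the squared error, so a high-sigma
    # prediction is "allowed" to miss the target; the log_var / 2 term
    # penalizes claiming high uncertainty everywhere.
    precision = np.exp(-log_var)  # 1 / sigma^2
    return np.mean(0.5 * precision * (y_true - y_pred) ** 2
                   + 0.5 * log_var)
```

With `log_var = 0` (i.e. `sigma = 1`), this reduces to ordinary half-MSE.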
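To make the single-head layout concrete, here is a sketch of a loss that slices `sigma` back off the combined `(P class 0, P class 1, sigma)` output. Plain NumPy stands in for Keras tensors, and the simple attenuated cross-entropy is illustrative (the paper uses Monte Carlo sampling over logits for classification); none of these names are the actual code in `models.py`:

```python
import numpy as np

def uncertainty_loss(y_true, y_pred):
    # y_pred is one combined head: (P class 0, P class 1, log sigma^2).
    # A Keras loss only sees a single head's tensor, which is why sigma
    # has to ride along in the same output and be sliced off here.
    probs = y_pred[:, :-1]   # class probabilities
    log_var = y_pred[:, -1]  # appended uncertainty unit
    ce = -np.sum(y_true * np.log(probs + 1e-8), axis=-1)
    # Attenuate the error where sigma is large; penalize large sigma.
    return np.mean(np.exp(-log_var) * ce + 0.5 * log_var)
```

This also illustrates why a separate `sigma`-only head does not work: with two heads, Keras would call one loss on the class probabilities and another on `sigma`, and neither call would see both tensors at once.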