Suppose I want to write a custom optimizer class that conforms to the `tf.keras` API (using TensorFlow version >= 2.0). I am confused about the documented way to do this versus what is done in implementations.
The documentation for `tf.keras.optimizers.Optimizer` states:
### Write a customized optimizer.
If you intend to create your own optimization algorithm, simply inherit from
this class and override the following methods:
- `resource_apply_dense` (update variable given gradient tensor is dense)
- `resource_apply_sparse` (update variable given gradient tensor is sparse)
- `create_slots` (if your optimizer algorithm requires additional variables)
However, the current `tf.keras.optimizers.Optimizer` implementation does not define a `resource_apply_dense` method, but it does define a private-looking `_resource_apply_dense` method stub. Similarly, there are no `resource_apply_sparse` or `create_slots` methods, but there are a `_resource_apply_sparse` method stub and a `_create_slots` method call.
In official `tf.keras.optimizers.Optimizer` subclasses (taking `tf.keras.optimizers.Adam` as an example), there are `_resource_apply_dense`, `_resource_apply_sparse`, and `_create_slots` methods, and there are no such methods without the leading underscore. The same leading-underscore methods appear in slightly-less-official `tf.keras.optimizers.Optimizer` subclasses (e.g., `tfa.optimizers.MovingAverage` from TensorFlow Addons: `_resource_apply_dense`, `_resource_apply_sparse`, `_create_slots`).
Another confounding point for me is that some of the TensorFlow Addons optimizers also override the `apply_gradients` method (e.g., `tfa.optimizers.MovingAverage`), whereas the `tf.keras.optimizers` optimizers do not.
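From my reading of the MovingAverage source, such overrides appear to wrap the base behavior rather than replace it: they call `super().apply_gradients(...)` and then do extra bookkeeping. The sketch below shows that wrapping pattern in plain Python; the classes `Param`, `PlainOptimizer`, and `AveragedOptimizer` are hypothetical stand-ins for illustration only, not TensorFlow APIs.

```python
# Hypothetical stand-ins illustrating the apply_gradients-wrapping
# pattern; none of these classes are part of TensorFlow.

class Param:
    """A mutable scalar parameter standing in for a tf.Variable."""
    def __init__(self, value):
        self.value = value


class PlainOptimizer:
    """Stands in for the base optimizer's apply_gradients."""
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate

    def apply_gradients(self, grads_and_vars):
        # The real base class dispatches to _resource_apply_dense here.
        for grad, var in grads_and_vars:
            var.value -= self.learning_rate * grad


class AveragedOptimizer(PlainOptimizer):
    """Wraps apply_gradients, analogous to tfa.optimizers.MovingAverage."""
    def __init__(self, learning_rate=0.1, decay=0.9):
        super().__init__(learning_rate)
        self.decay = decay
        self.averages = {}  # id(var) -> shadow average of the variable

    def apply_gradients(self, grads_and_vars):
        grads_and_vars = list(grads_and_vars)
        # Delegate the actual parameter update to the parent class...
        super().apply_gradients(grads_and_vars)
        # ...then do extra bookkeeping (here, shadow averaging).
        for _, var in grads_and_vars:
            avg = self.averages.get(id(var), var.value)
            self.averages[id(var)] = (
                self.decay * avg + (1.0 - self.decay) * var.value
            )
```

With `p = Param(1.0)`, calling `AveragedOptimizer().apply_gradients([(0.5, p)])` performs the normal update (`p.value` becomes `0.95`) and then records a shadow average, which is the kind of extra work that seems to motivate overriding `apply_gradients`.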
Moreover, I noticed that the `apply_gradients` method of `tf.keras.optimizers.Optimizer` calls `_create_slots`, but the base `tf.keras.optimizers.Optimizer` class does not have a `_create_slots` method. So, it seems that a `_create_slots` method must be defined in an optimizer subclass if that subclass does not override `apply_gradients`.
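To make the call chain I am inferring concrete, here is a plain-Python mock of the flow as I understand it: `apply_gradients` calls `_create_slots` and then the per-variable `_resource_apply_dense`. Every name here (`Var`, `MockOptimizerBase`, `MockMomentumSGD`) is a hypothetical stand-in for illustration; none of it is TensorFlow code.

```python
# Plain-Python mock of the inferred call chain:
# apply_gradients -> _create_slots -> _resource_apply_dense.

class Var:
    """A mutable scalar standing in for a tf.Variable."""
    def __init__(self, value):
        self.value = value


class MockOptimizerBase:
    """Mimics the base class: it calls _create_slots but does not define it."""

    def __init__(self):
        self._slots = {}  # id(var) -> {slot_name: value}

    def add_slot(self, var, slot_name, initial_value=0.0):
        self._slots.setdefault(id(var), {})[slot_name] = initial_value

    def get_slot(self, var, slot_name):
        return self._slots[id(var)][slot_name]

    def set_slot(self, var, slot_name, value):
        self._slots[id(var)][slot_name] = value

    def apply_gradients(self, grads_and_vars):
        grads_and_vars = list(grads_and_vars)
        # The base class calls _create_slots here, which is why a
        # subclass that does not override apply_gradients must define it.
        self._create_slots([var for _, var in grads_and_vars])
        for grad, var in grads_and_vars:
            self._resource_apply_dense(grad, var)


class MockMomentumSGD(MockOptimizerBase):
    """Overrides only the underscore methods, as Adam and friends do."""

    def __init__(self, learning_rate=0.1, momentum=0.9):
        super().__init__()
        self.learning_rate = learning_rate
        self.momentum = momentum

    def _create_slots(self, var_list):
        for var in var_list:
            self.add_slot(var, "momentum", 0.0)

    def _resource_apply_dense(self, grad, var):
        m = self.momentum * self.get_slot(var, "momentum") + grad
        self.set_slot(var, "momentum", m)
        var.value -= self.learning_rate * m
```

With `v = Var(1.0)`, calling `MockMomentumSGD().apply_gradients([(0.5, v)])` first creates the momentum slot and then applies one update, leaving `v.value == 0.95`.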
Questions
What is the correct way to subclass a `tf.keras.optimizers.Optimizer`? Specifically,
- Does the `tf.keras.optimizers.Optimizer` documentation quoted at the top simply mean to override the leading-underscore versions of the methods it mentions (e.g., `_resource_apply_dense` instead of `resource_apply_dense`)? If so, are there any API guarantees that these private-looking methods will not change their behavior in future versions of TensorFlow? What are the signatures of these methods?
- When would one override `apply_gradients` in addition to the `_resource_apply_[dense|sparse]` methods?
Edit: I have opened an issue on GitHub: #36449.