cl-waffe2/optimizers

[class] AbstractOptimizer

AbstractTensors created with :requires-grad=t obtain their gradients when (backward (build toplevel)) is called. AbstractOptimizer is a class that minimizes the value of toplevel using (grad tensor). In cl-waffe2, one AbstractOptimizer is initialized for each AbstractTensor to be optimized.

A new AbstractOptimizer is created with the function (name tensor &rest constructor-args), for example (Adam (parameter (randn (list 3 3))) :lr 1e-3) to create a new Adam optimizer, and it is tied to the tensor with (hook-optimizer! tensor abstract-optimizer). Users can define their own optimizer algorithms with the defoptimizer macro. Optimizing a tied tensor is performed by calling the (step-optimize optimizer) method, and the parameter being optimized can be accessed with the read-parameter method.

Example: hooking and calling the optimizer tied to a tensor.

(let ((a (parameter (randn `(3 3)))))
  (hook-optimizer! a (Adam a))
  (call-optimizer! a))
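
For reference, a complete optimization step could look like the following sketch. The loss is only illustrative, and the sketch assumes the standard cl-waffe2 operations parameter, randn, !sum, !mul, build, forward, and backward are in scope:

;; A minimal sketch of one optimization step (illustrative loss; assumes the
;; standard cl-waffe2 API: parameter, randn, !sum, !mul, build, forward, backward).
(let* ((a     (parameter (randn `(3 3))))
       (model (build (!sum (!mul a a)))))
  (hook-optimizer! a (Adam a :lr 1e-3)) ;; tie an Adam optimizer to the parameter
  (forward model)                       ;; compute the toplevel value
  (backward model)                      ;; fill (grad a)
  (call-optimizer! a))                  ;; apply one Adam update to a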

Tips: Customized Printing

By default, an AbstractOptimizer is displayed in your terminal like this:

(Adam (parameter (randn `(3 3))))
;; <AbstractOptimizer: ADAM( ) -> TID11256>
;;                          ^ You're allowed to insert something

The cl-waffe2/vm.nodes:on-print-object method can be specialized to customize how an AbstractOptimizer is displayed:

(defmethod cl-waffe2/vm.nodes:on-print-object ((opt Adam) stream)
  (format stream "lr=~a eps=~a beta1=~a beta2=~a N=~a"
          (lr-of opt)
          (eps-of opt)
          (beta1-of opt)
          (beta2-of opt)
          (adam-n opt)))

Do not insert a newline here, because the AbstractOptimizer is also displayed when printing an AbstractTensor with a hooked optimizer.
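
With this method defined, the same optimizer would print roughly as follows (the values and the tensor id shown are illustrative):

(Adam (parameter (randn `(3 3))))
;; <AbstractOptimizer: ADAM(lr=0.001 eps=1.0e-7 beta1=0.9 beta2=0.999 N=0) -> TID11256>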

See also: defoptimizer, read-parameter, step-optimize.

[macro] defoptimizer

The defoptimizer macro defines a user-defined optimizer class, which is a subclass of AbstractOptimizer. One instance of the class is created per parameter to be optimized, and its step-optimize method is called each time an optimization step is performed.

Input

param The tensor to be optimized. It is stored in the param slot automatically and can be accessed with the read-parameter method.

Example

We use defmodel and defmodel-as here because the update formulae of optimizers can be expressed as a Composite and compiled into functions, reducing compilation time.

(defoptimizer (SGD (self param &key (lr 1e-3))
               :slots ((lr :initarg :lr :reader sgd-lr))))

(defmodel (SGD-Compute-Form (self)
           :where (Param[~] Grad[~] Lr[scal] -> Param[~] where scal = 1)
           :documentation "Param_New <- Param - Grad * Lr"
           :on-call-> ((self param grad lr)
                       (declare (ignore self))
                       (A-=B param (!mul lr grad)))))

(defmodel-as (SGD-Compute-Form) :named step-sgd)

(defmethod step-optimize ((optimizer SGD))
  (let* ((lr    (make-tensor (sgd-lr optimizer)))
         (param (read-parameter optimizer))
         (grad  (grad param)))
    (step-sgd param grad lr)))
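
A rough usage sketch of the SGD defined above; it assumes (grad w) has already been filled by a backward pass:

;; Hook the user-defined SGD to a parameter and apply one update.
(let* ((w   (parameter (randn `(10 10))))
       (opt (SGD w :lr 1e-2)))
  (hook-optimizer! w opt)
  ;; ... run forward/backward here to fill (grad w) ...
  (call-optimizer! w)) ;; invokes step-optimize on the hooked SGD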

[optimizer] SGD

Initializer

(SGD param &KEY (LR 0.001))

Description

Implements a simple SGD:

Param_{new} \gets Param - Param_{grad} \times lr

Inputs

lr [single-float] learning rate.

[optimizer] ADAM

Initializer

(ADAM param &KEY (LR 0.001) (EPS 1.0e-7) (BETA1 0.9) (BETA2 0.999))

Description

Implements the Adam algorithm. See the original paper (Kingma & Ba, 2014) for the detailed algorithm.

Inputs

lr [single-float] learning rate.
eps [single-float] a small constant added for numerical stability.
beta1 [single-float] exponential decay rate of the first-moment estimate.
beta2 [single-float] exponential decay rate of the second-moment estimate.
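
As a brief sketch, an Adam optimizer can be constructed and stepped like this; the hyperparameter values shown are the defaults from the initializer above, and (grad w) is assumed to have been filled by a prior backward pass:

;; Sketch: construct Adam for a parameter and apply one update step.
(let* ((w   (parameter (randn `(3 3))))
       (opt (Adam w :lr 0.001 :eps 1.0e-7 :beta1 0.9 :beta2 0.999)))
  (hook-optimizer! w opt)
  (read-parameter opt) ;; => w
  (step-optimize opt)) ;; one Adam update applied to w, using (grad w)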