cl-waffe

Tutorials

Introducing WaffeTensor

Most deep learning frameworks have their own data structures for storing matrices, such as PyTorch's Tensor and Chainer's Variable. In cl-waffe, this role is played by WaffeTensor, which is defined with Common Lisp's defstruct.

⚠️ There is no guarantee that this design is technically mature.

What can WaffeTensor do?

Internally, every matrix created by cl-waffe is an mgl-mat matrix (mgl-mat:mat), which can be accessed with the accessor (data tensor).

REPL:

CL-WAFFE> (setq x (!randn `(3 3))) ; WaffeTensor
#Const(((0.050... 1.007... 0.258...)        
                 ...
        (-0.39... 0.869... -0.55...)) :dtype :float :shape (3 3) :backward NIL)
CL-WAFFE> (data x) ;mgl-mat:mat
#<MAT 3x3 AB #2A((0.050437 1.0072675 0.25835297)
                 (1.703179 -0.53816134 0.09240111)
                 (-0.39267328 0.8698013 -0.55995613))>

In the same way, a WaffeTensor can store a scalar object.

REPL:

CL-WAFFE> (setq x (const 1.0)) ; WaffeTensor
#Const(1.0 :dtype SINGLE-FLOAT :backward NIL)
CL-WAFFE> (data x) ; single-float
1.0

That is, one of the main roles of WaffeTensor is to be a wrapper for multiple data structures.
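As a rough illustration of the wrapper idea, here is a minimal sketch in plain Common Lisp. It is not cl-waffe's actual definition: the names tiny-tensor, wrap and tiny-shape are hypothetical, and a plain array stands in for an mgl-mat matrix.

```lisp
;; Hypothetical sketch: one struct type wraps either a scalar or a matrix.
;; None of these names come from cl-waffe.
(defstruct (tiny-tensor (:constructor %make-tiny-tensor))
  (data nil)       ; a real number or an array
  (backward nil))  ; would hold the node that produced this tensor

(defun wrap (value)
  "Wrap VALUE (a real number or an array) in a TINY-TENSOR."
  (check-type value (or real array))
  (%make-tiny-tensor :data value))

(defun tiny-shape (tensor)
  "Return the shape of TENSOR as a list; a scalar has the empty shape."
  (let ((data (tiny-tensor-data tensor)))
    (if (arrayp data)
        (array-dimensions data)
        '())))
```

With this sketch, code that only needs the shape or the raw data does not have to care whether it was handed a scalar or a matrix.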

You may well feel that a WaffeTensor would be redundant if it were only a wrapper. Of course, WaffeTensor also has the following roles:

To Store Computation Nodes

Operations performed via cl-waffe create computation nodes. The set of operations can be extended with the defnode and call macros, described in the defnode and call section.

Input

CL-WAFFE>
(let ((a (const 1.0))
      (b (const 1.0)))
  (!add a b))
Output
#Const(2.0 :dtype SINGLE-FLOAT :backward <Node: ADDTENSOR{W893}>)

When gradients are not required (e.g. at prediction time), the macro (with-no-grad) is useful.

with-no-grad(&body body)
Inside this macro, the parameter *no-grad* becomes t, which means that some operations are forcibly skipped (e.g. save-for-backward and building computation nodes).
(with-no-grad
  (call (model) x))

Input

CL-WAFFE>
(with-no-grad
    (let ((a (const 1.0))
	  (b (const 1.0)))
      (!add a b)))
Output
#Const(2.0 :dtype SINGLE-FLOAT :backward NIL)
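One plausible way to build such a switch is a special variable that is rebound for the duration of the body. The sketch below is an assumption about the general technique, not cl-waffe's source: *tiny-no-grad*, with-tiny-no-grad and tiny-add are hypothetical names.

```lisp
;; Hypothetical sketch of a no-grad switch built on a special variable.
(defvar *tiny-no-grad* nil
  "When true, operations skip recording their computation node.")

(defmacro with-tiny-no-grad (&body body)
  "Evaluate BODY with node recording disabled."
  `(let ((*tiny-no-grad* t))
     ,@body))

(defun tiny-add (x y)
  "Return two values: the sum, and the recorded node (NIL under no-grad)."
  (values (+ x y)
          (unless *tiny-no-grad*
            (list :add x y))))
```

Because the binding is dynamic, every operation called anywhere inside the body sees the flag, and the previous value is restored when the body exits.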

To Store Gradients

WaffeTensors created by the (parameter tensor) macro possess gradients, which are computed by calling `(backward out)` and can then be read with (grad tensor).

parameter(tensor)

Creates a new tensor (new-tensor) from the given tensor (old-tensor), which is a const or a tensor.

The new tensor can hold gradients.

Expected usage:

(setq my-param (parameter (!mul 0.01 (!randn `(10 10)))))

Note that the computation node held by old-tensor will be lost. Only the tensor's data and backend are inherited.

Input
Tensor (as usual, defined by (const)(sysconst)(tensor))
Output
Tensor (as usual, defined by (tensor))

backward(tensor)

Computes backpropagation by traversing the tensor's computation nodes.

The parameters of the model, that is, tensors defined by (tensor) or to which (parameter tensor) has been applied, store their gradients in the grad slot.

Note that tensor must have the shape `(1) or be a single value. Otherwise an error occurs.

While the backward pass is being computed, no new backward nodes are created (*no-grad* automatically becomes t).

Input
WaffeTensor
Output
NIL

REPL:

CL-WAFFE> (setq a (parameter (!randn `(3 3))))
#Parameter{((-1.07... -1.93... -0.07...)            
                         ...
            (1.353... 0.451... 2.473...)) :dtype :float :shape (3 3) :backward NIL}
CL-WAFFE> (setq b (parameter (!randn `(3 3))))
#Parameter{((0.234... 0.449... -1.02...)            
                         ...
            (-0.42... -1.63... -0.34...)) :dtype :float :shape (3 3) :backward NIL}
CL-WAFFE> (setq c (parameter (!randn `(3 3))))
#Parameter{((0.157... 1.040... -0.84...)            
                         ...
            (1.850... -0.26... -0.24...)) :dtype :float :shape (3 3) :backward NIL}
CL-WAFFE> (setq z (!sum (!add (!mul a b) c))) ; computes z = a*b + c, then reduces it with !sum
#Const(-0.5249139 :dtype SINGLE-FLOAT :backward <Node: SUMUPTENSOR{W903}>)
CL-WAFFE> (backward z)
NIL
CL-WAFFE> (grad a)
#<MAT 3x3 B #2A((0.026024515 0.04989684 -0.11357514)
                (-0.07813747 -0.032786068 -0.11216043)
                (-0.047159225 -0.18221794 -0.038357873))>
CL-WAFFE> (grad b)
#<MAT 3x3 B #2A((-0.11956648 -0.21451499 -0.008029957)
                (0.14240001 0.11439725 0.002615907)
                (0.15042241 0.050139852 0.27483448))>
CL-WAFFE> (grad c)
#<MAT 3x3 BF #2A((0.11111111 0.11111111 0.11111111)
                 (0.11111111 0.11111111 0.11111111)
                 (0.11111111 0.11111111 0.11111111))>
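To make the idea of traversing computation nodes concrete, here is a scalar-only sketch of reverse-mode differentiation in plain Common Lisp. It is a toy under stated assumptions, not cl-waffe's algorithm: scalar-var, v*, v+ and backprop are hypothetical names, and the recursion walks every path separately, which is fine for tiny graphs but wasteful for large ones.

```lisp
;; Hypothetical scalar-only sketch of reverse-mode differentiation.
(defstruct scalar-var
  (value 0.0)
  (grad 0.0)      ; accumulated gradient, filled in by BACKPROP
  (parents '()))  ; list of (parent . local-derivative) pairs

(defun v* (a b)
  "Multiply two vars, recording d(out)/da = b and d(out)/db = a."
  (make-scalar-var :value (* (scalar-var-value a) (scalar-var-value b))
                   :parents (list (cons a (scalar-var-value b))
                                  (cons b (scalar-var-value a)))))

(defun v+ (a b)
  "Add two vars; both local derivatives are 1."
  (make-scalar-var :value (+ (scalar-var-value a) (scalar-var-value b))
                   :parents (list (cons a 1.0) (cons b 1.0))))

(defun backprop (out &optional (upstream 1.0))
  "Accumulate d(OUT)/d(var) into the GRAD slot of every reachable var."
  (incf (scalar-var-grad out) upstream)
  (dolist (edge (scalar-var-parents out))
    (backprop (car edge) (* upstream (cdr edge)))))
```

For z = a*b + c with a = 2, b = 3 and c = 4, calling (backprop z) leaves 3 in a's grad slot, 2 in b's and 1 in c's, which mirrors the way (backward z) fills the grad slots in the transcript above.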

with-verbose(&body body)
When (backward out) is called inside the (with-verbose &body body) macro, cl-waffe displays how the computation nodes are traced. This is helpful for debugging.

To Distinguish Which Tensors Require Gradients

WaffeTensors that require gradients are represented by (parameter tensor), while those that don't are represented by (const). Computation nodes that have no parameters at the destination of backpropagation do not need to keep a copy for gradient computation during forward propagation, nor do they need to perform backpropagation in the first place. WaffeTensor determines this dynamically during forward propagation.
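A minimal sketch of how such a decision can be made during the forward pass follows. It assumes a simple per-tensor flag; the names flagged and flagged-mul are hypothetical and do not correspond to cl-waffe internals.

```lisp
;; Hypothetical sketch: decide during the forward pass whether inputs
;; must be kept for a later backward pass.
(defstruct (flagged (:constructor flagged (value &optional requires-grad)))
  value
  requires-grad
  saved)  ; inputs kept for the backward pass, only when needed

(defun flagged-mul (a b)
  "Multiply A and B; save the inputs only if either one needs a gradient."
  (let ((needs-grad (or (flagged-requires-grad a)
                        (flagged-requires-grad b)))
        (out (flagged (* (flagged-value a) (flagged-value b)))))
    (when needs-grad
      (setf (flagged-requires-grad out) t
            (flagged-saved out) (list a b)))
    out))
```

Multiplying two constants produces a result with nothing saved, so a graph built only from constants costs no extra memory and is never visited by a backward pass.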

To Store Lazy-Evaluated Objects

When getting started with cl-waffe, you may notice that some operators, such as !transpose, create lazily evaluated tensors.

REPL:

CL-WAFFE> (!transpose (!randn `(3 1)))
#Const(<Transposed Tensor> :shape (1 3) :backward <Node: TRANSPOSETENSOR{W906}>)

They behave as if they were normal tensors (in fact, !shape, !dims, etc. work as usual), but they aren't evaluated until (value tensor) is called.

REPL:

CL-WAFFE> (setq transpose (!transpose (!randn `(3 1))))
#Const(<Transposed Tensor> :shape (1 3) :backward <Node: TRANSPOSETENSOR{W907}>)
CL-WAFFE> (value transpose)
#<MAT 1x3 B #2A((-2.362661 -1.4510747 -0.88706297))>
CL-WAFFE> transpose
#Const(((-2.36... -1.45... -0.88...)) :dtype :float :shape (1 3) :backward <Node: TRANSPOSETENSOR{W907}>)

This property helps to reduce the cost of calling !transpose before !matmul.
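The general technique can be sketched in plain Common Lisp as follows. This is an illustration under assumptions, not how cl-waffe represents transposed tensors: lazy-transpose, lazy-shape and force-value are hypothetical names, and a plain 2D array stands in for an mgl-mat matrix.

```lisp
;; Hypothetical sketch of a lazily evaluated transpose.
(defstruct lazy-transpose
  source        ; the original 2D array
  (cache nil))  ; the transposed array, once it has been computed

(defun lazy-shape (lazy)
  "Shape of the transposed result, available without computing it."
  (reverse (array-dimensions (lazy-transpose-source lazy))))

(defun force-value (lazy)
  "Compute the transposed array on first use and cache it."
  (or (lazy-transpose-cache lazy)
      (setf (lazy-transpose-cache lazy)
            (let* ((src (lazy-transpose-source lazy))
                   (rows (array-dimension src 0))
                   (cols (array-dimension src 1))
                   (out (make-array (list cols rows))))
              (dotimes (i rows out)
                (dotimes (j cols)
                  (setf (aref out j i) (aref src i j))))))))
```

In a design like this, a consumer that understands the lazy wrapper could read the source array with swapped indices and never call force-value at all, which is presumably where the saving before a matrix multiplication comes from.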

Parameter and Const

There are two types of WaffeTensor: parameter and constant. A parameter receives a gradient when (backward out) is called, whereas a constant doesn't.

Initialize Constants

cl-waffe provides various ways to initialize constants. For example, `!randn` initializes a new tensor of the given dims by sampling from the standard normal distribution (mean=0.0, stdev=1.0), and !beta samples from the beta distribution with the given alpha and beta.

REPL:

CL-WAFFE> (!randn `(10 10))
#Const(((-1.20... 0.160... ~ -0.68... 1.776...)        
                 ...
        (0.137... 0.582... ~ 1.254... 0.590...)) :dtype :float :shape (10 10) :backward NIL)
CL-WAFFE> (!beta `(10 10) 2.0 1.0)
#Const(((0.787... 0.993... ~ 0.601... 0.962...)        
                 ...
        (0.980... 0.505... ~ 0.553... 0.657...)) :dtype :float :shape (10 10) :backward NIL)

The WaffeTensors obtained from the standard initializers are constants. More generally, cl-waffe provides the constructor (const value), which coerces the given value to an appropriate type. In this example, an mgl-mat matrix is obtained from a simple-array.

REPL:

CL-WAFFE> (const (make-array `(3 3)))
#Const(((0.0 0.0 0.0)        
                 ...
        (0.0 0.0 0.0)) :dtype :float :shape (3 3) :backward NIL)

Initialize Parameter

Parameters are initialized via the macro (parameter tensor), which turns the given tensor into a parameter.

REPL:

CL-WAFFE> (parameter (!randn `(10 10)))
#Parameter{((-0.41... 0.890... ~ 1.851... -0.73...)            
                         ...
            (-1.29... -1.27... ~ -1.20... -2.28...)) :dtype :float :shape (10 10) :backward NIL}

Parameter vs Constant

Their expected usage is:

Constant
Datasets, temporary results of calculations, and parameters that don't need to be optimized.
Parameter
Trainable variables, to be optimized by optimizers defined with defoptimizer.

defnode and call

defnode(name initializer-arguments &key parameters (disassemble-forward nil) forward-declaim forward (disassemble-backward nil) backward-declaim backward (document "An node, defined by cl-waffe."))

Defines computation nodes in a format that cl-waffe can handle.

Note: the data structures that can be used as arguments and return values must be one of the following:

  1. WaffeTensor
  2. A flat (1D) list in which each element is a WaffeTensor

Be aware that you can't use (values x y ...).

name
The node's name. The constructor and the structure are named after this argument.
initializer-arguments
The arguments the constructor takes.
parameters
The parameters this node has, initialized using initializer-arguments.
disassemble-forward
When t, the disassembly of the forward slot is displayed when this node is compiled.
forward-declaim
The declaim form for the forward function. Note that the first argument is the structure itself, and the :forward keyword in this declaim is replaced by the forward function's name.
forward
The definition of forward.
disassemble-backward
When t, the disassembly of the backward slot is displayed when this node is compiled.
backward-declaim
The declaim form for the backward function. Note that the first argument is the structure itself, and the :backward keyword in this declaim is replaced by the backward function's name.
backward
The definition of backward.

The macros defnode and call serve as key components of cl-waffe. When designing deep learning models, object-oriented programming can lead to more concise descriptions. Common Lisp has a powerful object system in CLOS and Closer-MOP, but I think its computational speed depends strongly on which Common Lisp implementation is used (e.g. SBCL, Clozure CL). Thus, by defining computation nodes with only defstruct and defun and wrapping them with the macros (defnode) and (call), I have reduced the overhead associated with this process. This example shows how to define a ScalarAdd node.

Input

CL-WAFFE>
(defnode ScalarAdd ()
  :disassemble-forward t
  :forward-declaim (declaim (ftype (function (ScalarAdd waffetensor waffetensor) waffetensor) :forward))
  :forward ((x y)
	    (let ((x (data x))
		  (y (data y)))
	      (declare (type single-float x y))
	      (const (+ x y))))
  :disassemble-backward t
  :backward-declaim (declaim (ftype (function (ScalarAdd waffetensor) list) :backward))
  :backward ((dy)(list dy dy)))
Output
NIL

Through this macro, these structures and functions are defined:

  1. The structure, ScalarAdd
  2. The constructor function, (ScalarAdd)
  3. The function (call-scalaradd-forward-mgl self x y), where self is a ScalarAdd structure
  4. The function (call-scalaradd-backward-mgl self dy), where self is a ScalarAdd structure
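To show how a single macro can produce a structure, a constructor and a plain function from one definition, here is a deliberately small sketch. It is not the defnode implementation: define-tiny-node and the generated name pattern call-<name>-forward are hypothetical, and the sketch ignores backward functions, parameters and backends.

```lisp
;; Hypothetical sketch: a macro that expands into DEFSTRUCT plus DEFUN.
(defmacro define-tiny-node (name (&rest forward-args) &body forward-body)
  "Define a struct NAME, its constructor (NAME), and CALL-<NAME>-FORWARD."
  (let ((forward-name
          (intern (format nil "CALL-~A-FORWARD" (symbol-name name)))))
    `(progn
       ;; The structure and a zero-argument constructor of the same name.
       (defstruct (,name (:constructor ,name ())))
       ;; An ordinary function, so the compiler is free to inline it.
       (declaim (inline ,forward-name))
       (defun ,forward-name (self ,@forward-args)
         (declare (ignorable self))
         ,@forward-body)
       ',name)))

(define-tiny-node tiny-scalar-add (x y)
  (+ x y))
```

After this definition, (call-tiny-scalar-add-forward (tiny-scalar-add) 1.0 1.0) evaluates to 2.0 without any generic-function dispatch, which is the property the struct-and-function design is after.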

Setting :disassemble-forward or :disassemble-backward to t prints the disassembly of :forward or :backward respectively (only the essential parts). Judging from the result below, the code seems to be optimized well enough.

; disassembly for #:|nodedebug9718|
; Size: 148 bytes. Origin: #x540A110F                         ; #:|nodedebug9718|
; 0F:       498B4510         MOV RAX, [R13+16]                ; thread.binding-stack-pointer
; 13:       488945F8         MOV [RBP-8], RAX
; 17:       4883EC10         SUB RSP, 16
; 1B:       488B55F0         MOV RDX, [RBP-16]
; 1F:       B902000000       MOV ECX, 2
; 24:       48892C24         MOV [RSP], RBP
; 28:       488BEC           MOV RBP, RSP
; 2B:       B802AC3650       MOV EAX, #x5036AC02              ; #<FDEFN DATA>
; 30:       FFD0             CALL RAX
; 32:       480F42E3         CMOVB RSP, RBX
; 36:       4C8BC2           MOV R8, RDX
; 39:       4C8945E0         MOV [RBP-32], R8
; 3D:       4883EC10         SUB RSP, 16
; 41:       488B55E8         MOV RDX, [RBP-24]
; 45:       B902000000       MOV ECX, 2
; 4A:       48892C24         MOV [RSP], RBP
; 4E:       488BEC           MOV RBP, RSP
; 51:       B802AC3650       MOV EAX, #x5036AC02              ; #<FDEFN DATA>
; 56:       FFD0             CALL RAX
; 58:       480F42E3         CMOVB RSP, RBX
; 5C:       4C8B45E0         MOV R8, [RBP-32]
; 60:       4180F819         CMP R8B, 25
; 64:       7538             JNE L1
; 66:       66490F6ED0       MOVQ XMM2, R8
; 6B:       0FC6D2FD         SHUFPS XMM2, XMM2, #4r3331
; 6F:       80FA19           CMP DL, 25
; 72:       7403             JEQ L0
; 74:       CC51             INT3 81                          ; OBJECT-NOT-SINGLE-FLOAT-ERROR
; 76:       08               BYTE #X08                        ; RDX(d)
; 77: L0:   66480F6ECA       MOVQ XMM1, RDX
; 7C:       0FC6C9FD         SHUFPS XMM1, XMM1, #4r3331
; 80:       F30F58CA         ADDSS XMM1, XMM2
; 84:       660F7ECA         MOVD EDX, XMM1
; 88:       48C1E220         SHL RDX, 32
; 8C:       80CA19           OR DL, 25
; 8F:       B902000000       MOV ECX, 2
; 94:       FF7508           PUSH QWORD PTR [RBP+8]
; 97:       B802DD3650       MOV EAX, #x5036DD02              ; #<FDEFN CONST>
; 9C:       FFE0             JMP RAX
; 9E: L1:   CC51             INT3 81                          ; OBJECT-NOT-SINGLE-FLOAT-ERROR
; A0:       20               BYTE #X20                        ; R8(d)
; A1:       CC10             INT3 16                          ; Invalid argument count trap

; disassembly for #:|nodedebug9739|
; Size: 84 bytes. Origin: #x541BA04C                          ; #:|nodedebug9739|
; 4C:       498B4510         MOV RAX, [R13+16]                ; thread.binding-stack-pointer
; 50:       488945F8         MOV [RBP-8], RAX
; 54:       4D896D28         MOV [R13+40], R13                ; thread.pseudo-atomic-bits
; 58:       498B5558         MOV RDX, [R13+88]                ; thread.cons-tlab
; 5C:       488D4220         LEA RAX, [RDX+32]
; 60:       493B4560         CMP RAX, [R13+96]
; 64:       772E             JNBE L2
; 66:       49894558         MOV [R13+88], RAX                ; thread.cons-tlab
; 6A: L0:   48893A           MOV [RDX], RDI
; 6D:       48897A10         MOV [RDX+16], RDI
; 71:       48C7421817010050 MOV QWORD PTR [RDX+24], #x50000117  ; NIL
; 79:       488D4217         LEA RAX, [RDX+23]
; 7D:       48894208         MOV [RDX+8], RAX
; 81:       80CA07           OR DL, 7
; 84:       4D316D28         XOR [R13+40], R13                ; thread.pseudo-atomic-bits
; 88:       7402             JEQ L1
; 8A:       CC09             INT3 9                           ; pending interrupt trap
; 8C: L1:   488BE5           MOV RSP, RBP
; 8F:       F8               CLC
; 90:       5D               POP RBP
; 91:       C3               RET
; 92:       CC10             INT3 16                          ; Invalid argument count trap
; 94: L2:   6A20             PUSH 32
; 96:       FF142528050050   CALL [#x50000528]                ; #x52A005B0: LIST-ALLOC-TRAMP
; 9D:       5A               POP RDX
; 9E:       EBCA             JMP L0

call(model &rest inputs &aux (features (model-inlineable-p model)))
Calls the given model's forward slot with inputs.

Nodes defined by this macro work like CLOS classes, and they can have :parameters. However, what makes defnode distinct from them is the small call overhead:

REPL:

CL-WAFFE> (time (call (ScalarAdd)(const 1.0)(const 1.0)))
#Const(2.0 :dtype SINGLE-FLOAT :backward <Node: SCALARADD{W924}>)
CL-WAFFE> (time (+ 1.0 1.0))
2.0
Evaluation took:
  0.000 seconds of real time
  0.000005 seconds of total run time (0.000005 user, 0.000000 system)
  100.00% CPU
  11,084 processor cycles
  0 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.000001 seconds of total run time (0.000000 user, 0.000001 system)
  100.00% CPU
  422 processor cycles
  0 bytes consed

Nodes called via the macro (call) are fully inlined (similar to the CL libraries inlined-generic-function and static-dispatch). Considering that ScalarAdd builds a computation node in addition to adding its arguments, this overhead is small enough. Here's how I achieve this behaviour:

REPL:

CL-WAFFE> (macroexpand `(call (ScalarAdd)(const 1.0)(const 1.0)))
(LOCALLY
 (DECLARE (OPTIMIZE (SPEED 3)(SAFETY 1))
          (INLINE call-scalaradd-forward-mgl))
 (call-scalaradd-forward-mgl (SCALARADD)(CONST 1.0)(CONST 1.0)))

The function call-scalaradd-forward-mgl is declared inline. This is possible because (call) can detect the type of the node at compile time. This leads to one of cl-waffe's key properties: it is easy to optimise. Functions built with defnode and call are optimized like this:

Input

CL-WAFFE>
(defun sadd (x y)
    (declare (optimize (speed 3)(safety 0))
             (type single-float x y))
        (call (ScalarAdd)(const x)(const y)))
Output
SADD
(disassemble #'sadd)

; disassembly for SADD
; Size: 943 bytes. Origin: #x541AFCAE                         ; SADD
; AFCAE:       488975F0         MOV [RBP-16], RSI
; AFCB2:       4883EC10         SUB RSP, 16
.
.
(Omitted)

We get a large disassembly, which means that everything, including the parts that build computation nodes, has been inlined. In other words, the optimization of the sadd function is working properly. Note that when the type of the given node can't be determined at compile time, call behaves differently:

Input

CL-WAFFE>
(let ((node (ScalarAdd)))
    (macroexpand `(call node (const 1.0)(const 1.0))))
Output
(LET* ((MODEL NODE)(INPUTS (LIST (CONST 1.0)(CONST 1.0))))
  (IF (TYPEP MODEL 'MODEL-LIST)
      (PROGN
       (SETQ MODEL (NTH (DATA (CAR INPUTS))(MODEL-LIST-MLIST MODEL)))
       (SETQ INPUTS (CDR INPUTS))
       (ASSERT (NOT (TYPEP MODEL 'MODEL-LIST)) NIL
               "cl-waffe.call: Assertion failed because model-list can't posses model-list as a element.")))
  (LOCALLY
   (DECLARE (OPTIMIZE (SPEED 3))
            (MAYBE-INLINE CALL-INLINED-FORWARD))
   (APPLY #'CALL-INLINED-FORWARD MODEL INPUTS)))

The expanded form is slightly more complicated. The most important part is (APPLY #'CALL-INLINED-FORWARD MODEL INPUTS). In short, call-inlined-forward looks like:

(defun call-inlined-forward (model &rest inputs)
    (typecase model
        (addtensor (call-addtensor-forward-mgl ...))
        (scalaradd (call-scalaradd-forward-mgl ...))
        (T ; ... On the first attempt, redefine call-inlined-forward and try again
        )))

This is a simplification that may be slightly misleading, but it is the simplest way to show the idea. Of course, these calls are inlined. call-inlined-forward is automatically redefined when:

  1. The new backend is defined.
  2. The node you specified doesn't match any nodes.

That is, there is no need to pay attention to when they are inlined.
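The two paths, a direct call when the node type is visible in the source form and a typecase fallback otherwise, can be sketched together in plain Common Lisp. Everything here is hypothetical (node-a, node-b, forward-a, forward-b, dispatch-forward, tiny-call), and the fixed table inside the macro stands in for whatever registry the real implementation maintains.

```lisp
;; Hypothetical sketch: compile-time selection with a run-time fallback.
(defstruct (node-a (:constructor node-a ())))
(defstruct (node-b (:constructor node-b ())))

(defun forward-a (self x) (declare (ignore self)) (+ x 1))
(defun forward-b (self x) (declare (ignore self)) (* x 2))

(defun dispatch-forward (model &rest inputs)
  "Run-time fallback: pick the forward function from the type of MODEL."
  (etypecase model
    (node-a (apply #'forward-a model inputs))
    (node-b (apply #'forward-b model inputs))))

(defmacro tiny-call (model &rest inputs)
  "Expand to a direct call when MODEL is a literal constructor form."
  (let ((known (and (consp model)
                    (cdr (assoc (car model)
                                '((node-a . forward-a)
                                  (node-b . forward-b)))))))
    (if known
        `(,known ,model ,@inputs)               ; type known at expansion time
        `(dispatch-forward ,model ,@inputs))))  ; type only known at run time
```

Here (tiny-call (node-a) 1) expands to (forward-a (node-a) 1), whereas (tiny-call n 5) with a variable n expands to (dispatch-forward n 5) and is resolved by the typecase.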

Input

CL-WAFFE>(let ((node (ScalarAdd)))
    (time (call node (const 1.0)(const 1.0))))
Output
#Const(2.0 :dtype SINGLE-FLOAT :backward <Node: SCALARADD{W926}>)
Evaluation took:
  0.000 seconds of real time
  0.000005 seconds of total run time (0.000005 user, 0.000000 system)
  100.00% CPU
  10,502 processor cycles
  0 bytes consed

It works the same as in the first example, and the overhead is small enough. (P.S.: I was told that SBCL cannot optimize a CASE form that is several thousand lines long. The concern is that the more nodes are defined in cl-waffe, the lower the performance becomes. In my own benchmarks it performed well enough on the second call, but if it turns out to be slow, I know how to make it faster.)

By the way, defnode's forward slot can take &rest arguments. However, (call) is a macro, so it can't be used with apply. Is there no way to call a node with &rest arguments? There is: get-forward-caller and get-backward-caller return the function object itself. In cl-waffe's implementation, !concatenate takes &rest arguments.

get-forward-caller(model)
Returns the given node (model/node/optimizer)'s forward slot, which is callable with funcall/apply.

get-backward-caller(model)
Returns the given node (model/node/optimizer)'s backward slot, which is callable with funcall/apply.

(defun !concatenate (axis &rest tensors)
  (declare (optimize (speed 3))
	   (type fixnum axis))
  (let* ((node (ConcatenateTensorNode axis))
	 (caller (get-forward-caller node)))
    (apply caller node tensors)))

Writing Node Extensions

You may notice that the functions generated by defnode have the suffix mgl. This indicates the backend cl-waffe uses (mgl = mgl-mat).

If the existing implementations of nodes aren't suitable for your use case, you can replace them. cl-waffe provides a mechanism for managing these additional implementations, which I call backends. For example, you can replace my broadcasting implementation with a faster one. Let's create a double-float version of ScalarAdd.

Input

CL-WAFFE>
(define-node-extension ScalarAdd
	     :backend :double-float
	     :forward-declaim (declaim (ftype (function (ScalarAdd waffetensor waffetensor) waffetensor) :forward))
	     :forward ((x y)
	    (let ((x (data x))
		  (y (data y)))
	      (declare (type double-float x y))
	      (const (+ x y))))
	     :backward-declaim (declaim (ftype (function (ScalarAdd waffetensor) list) :backward))
	     :backward ((dy)(list dy dy)))
Output
NIL

You will then receive this output:

[INFO] Inlining call-forward... Total Features: 64
To disable this, set cl-waffe:*ignore-inlining-info* t

[INFO] Inlining call-backward... Total Features: 64
To disable this, set cl-waffe:*ignore-inlining-info* t

That's all. The backends you define can be switched via the (with-backend backend-name &body body) macro. Let's check how call expands under it.

with-backend(backend &body body)

Switches a backend.

See also: define-node-extension

REPL:

CL-WAFFE> 
(with-backend :double-float
    (macroexpand `(call (ScalarAdd)(const 1.0d0)(const 1.0d0))))
(LOCALLY
 (DECLARE (OPTIMIZE (SPEED 3)(SAFETY 1))
          (INLINE call-scalaradd-forward-double-float
           call-scalaradd-forward-mgl))
 (CASE *DEFAULT-BACKEND*
   (DOUBLE-FLOAT
    (call-scalaradd-forward-double-float (SCALARADD)(CONST 1.0d0)
                                         (CONST 1.0d0)))
   (MGL (call-scalaradd-forward-mgl (SCALARADD)(CONST 1.0d0)(CONST 1.0d0)))
   (T (call-scalaradd-forward-mgl (SCALARADD)(CONST 1.0d0)(CONST 1.0d0)))))

An additional case form is generated, which dispatches on *default-backend*.
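The same pattern, a special variable selecting among implementations with a fallback to the default, can be written in a few lines of plain Common Lisp. This is only a sketch of the dispatch idea: *tiny-backend*, with-tiny-backend, add-single, add-double and backend-add are hypothetical names.

```lisp
;; Hypothetical sketch of backend switching through a special variable.
(defvar *tiny-backend* :mgl
  "Stand-in for the currently selected backend.")

(defmacro with-tiny-backend (backend &body body)
  "Evaluate BODY with *TINY-BACKEND* bound to BACKEND."
  `(let ((*tiny-backend* ,backend))
     ,@body))

(defun add-single (x y)
  (+ (coerce x 'single-float) (coerce y 'single-float)))

(defun add-double (x y)
  (+ (coerce x 'double-float) (coerce y 'double-float)))

(defun backend-add (x y)
  "Pick an implementation from *TINY-BACKEND*, falling back to the default."
  (case *tiny-backend*
    (:double-float (add-double x y))
    (t (add-single x y))))
```

An unknown backend name falls through to the default branch, just as the T clause in the expansion above falls back to the mgl implementation.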

REPL:

CL-WAFFE> 
(with-backend :double-float
    (time (call (scalarAdd)(const 1.0d0)(const 1.0d0))))

#Const(2.0d0 :dtype DOUBLE-FLOAT :backward <Node: SCALARADD{W931}>)
Evaluation took:
  0.000 seconds of real time
  0.000005 seconds of total run time (0.000005 user, 0.000000 system)
  100.00% CPU
  9,814 processor cycles
  0 bytes consed

Adding new backends to cl-waffe is painless!

MNIST Example

Using the features introduced so far, we can train an MLP model on the MNIST dataset. In practice, a few more features are needed to keep the code simple: defmodel and deftrainer.

Define your model

REPL:

CL-WAFFE> 
(defmodel MLP (activation)
  :parameters ((layer1   (cl-waffe.nn:denselayer (* 28 28) 512 T activation))
	       (layer2   (cl-waffe.nn:denselayer 512 256 T activation))
	       (layer3   (cl-waffe.nn:linearlayer 256 10 T)))
  :forward ((x)
	    (with-calling-layers x
	      (layer1 x)
 	      (layer2 x)
	      (layer3 x))))
NIL
CL-WAFFE> (MLP :relu)
<Model: MLP{W937}(
    <Model: LAYER1 -> DENSELAYER{W938} ...>
    <Model: LAYER2 -> DENSELAYER{W941} ...>
    <Model: LAYER3 -> LINEARLAYER{W944} ...>
)>
CL-WAFFE> (with-output-to-string (out)
    (print-model (MLP :relu) out))
––– <Model MLP{W945}>
––––––– <MLP's LAYER1 = DENSELAYER{W946}>
        |-ACTIVATION-|
        |___RELU_____|
––––––––––– <DENSELAYER's LAYER = LINEARLAYER{W947}>
            |––slot––|–––shape–––|–trainable–|
             WEIGHT -> (784 512)       O
              BIAS  ->  (1 512)        O
––––––– <MLP's LAYER2 = DENSELAYER{W949}>
        |-ACTIVATION-|
        |___RELU_____|
––––––––––– <DENSELAYER's LAYER = LINEARLAYER{W950}>
            |––slot––|–––shape–––|–trainable–|
             WEIGHT -> (512 256)       O
              BIAS  ->  (1 256)        O
––––––– <MLP's LAYER3 = LINEARLAYER{W952}>
        |––slot––|––shape–––|–trainable–|
         WEIGHT -> (256 10)       O
          BIAS  ->  (1 10)        O

 -(+) Total Param: 0

Define your trainer

REPL:

CL-WAFFE> 
(deftrainer MLPTrainer (activation lr)
  :model          (MLP activation)
  :optimizer      cl-waffe.optimizers:Adam
  :optimizer-args (:lr lr)
  :step-model ((x y)
	       (zero-grad)
	       (let ((out (cl-waffe.nn:softmax-cross-entropy (call (model) x) y)))
		 (backward out)
		 (update)
		 out))
 :predict ((x)(call (model) x)))
NIL
CL-WAFFE> (setq trainer (MLPTrainer :relu 1e-3))
<Trainer: MLPTRAINER()>
CL-WAFFE> (slot-value trainer 'cl-waffe::optimizer)
<Optimizer: ADAM{W965}
    Param: #<GENERAL-HASH-TABLE :TEST EQL :COUNT 6 :WEAKNESS :VALUE {100EAEF273}>
    LR : 0.001
    Param: #<HASH-TABLE :TEST EQL :COUNT 0 {100EAEF363}>
    Param: #<HASH-TABLE :TEST EQL :COUNT 0 {100EAEF403}>
    N : 0
    EPSILON : 1.0e-7
    BETA1 : 0.9
    BETA2 : 0.999
    [Total Param]: 535818
>
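The order of operations inside :step-model (zero the gradients, run the forward pass, compute the loss, backpropagate, update) is the usual training step. The sketch below reproduces that order for a one-parameter model in plain Common Lisp. It makes several simplifying assumptions: tiny-trainer and tiny-step are hypothetical names, the gradient is written by hand instead of coming from backward, the loss is a squared error rather than softmax cross entropy, and plain gradient descent stands in for Adam.

```lisp
;; Hypothetical sketch of one training step for the model out = weight * x
;; with a squared-error loss and plain gradient descent.
(defstruct tiny-trainer
  (weight 0.0)  ; the single trainable parameter
  (grad 0.0)    ; gradient of the loss with respect to WEIGHT
  (lr 0.1))     ; learning rate

(defun tiny-step (trainer x y)
  "Run one step on the pair (X, Y); return the loss before the update."
  (setf (tiny-trainer-grad trainer) 0.0)              ; zero-grad
  (let* ((out (* (tiny-trainer-weight trainer) x))    ; forward
         (err (- out y))
         (loss (* err err)))
    (setf (tiny-trainer-grad trainer) (* 2.0 err x))  ; backward, by hand
    (decf (tiny-trainer-weight trainer)               ; update
          (* (tiny-trainer-lr trainer)
             (tiny-trainer-grad trainer)))
    loss))
```

Calling tiny-step repeatedly on the same pair makes the returned loss shrink, which is the behaviour one would expect from each call to the trainer's step function during training.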

(This section is still in progress. However, here is an MLP model that can achieve 98% validation accuracy: fnn.lisp.) If you have cloned the cl-waffe repository, the Lakefile is available:

$ lake example:install # Install training dataset
$ lake example:mnist # Start training. (batch-size=100)