Tutorials
Introducing WaffeTensor
Most deep learning frameworks have their own data structures to store matrices, such as PyTorch's Tensor and Chainer's Variable. In cl-waffe, WaffeTensor is available, defined by Common Lisp's defstruct.
⚠️ There is no guarantee that this design is technically mature.
What can WaffeTensor do?
Internally, all matrices created by cl-waffe are of type mgl-mat, and they are accessed with the accessor (data tensor).
REPL:
CL-WAFFE> (setq x (!randn `(3 3))) ; WaffeTensor
#Const(((0.050... 1.007... 0.258...)
...
(-0.39... 0.869... -0.55...)) :dtype :float :shape (3 3) :backward NIL)
CL-WAFFE> (data x) ;mgl-mat:mat
#<MAT 3x3 AB #2A((0.050437 1.0072675 0.25835297)
(1.703179 -0.53816134 0.09240111)
(-0.39267328 0.8698013 -0.55995613))>
In the same way, WaffeTensor can store a scalar object.
REPL:
CL-WAFFE> (setq x (const 1.0)) ; WaffeTensor
#Const(1.0 :dtype SINGLE-FLOAT :backward NIL)
CL-WAFFE> (data x) ; single-float
1.0
That is, one of the main roles of WaffeTensor is to be a wrapper for multiple data structures.
You may well feel that it is redundant for WaffeTensor to be only a wrapper. Of course, WaffeTensor also has these roles:
To Store Computation Nodes
Operations performed via cl-waffe create computation nodes. All of them can be extended with the defnode and call macros, described in the defnode and call section.
Input
CL-WAFFE>
(let ((a (const 1.0))
(b (const 1.0)))
(!add a b))
Output
#Const(2.0 :dtype SINGLE-FLOAT :backward <Node: ADDTENSOR{W893}>)
When gradients are not required (e.g. for prediction), the macro (with-no-grad) is useful.
with-no-grad
(&body body)
(with-no-grad
(call (model) x))
Input
CL-WAFFE>
(with-no-grad
(let ((a (const 1.0))
(b (const 1.0)))
(!add a b)))
Output
#Const(2.0 :dtype SINGLE-FLOAT :backward NIL)
To Store Gradients
WaffeTensors created by the (parameter tensor) macro possess gradients, which are computed by calling `(backward out)` and read with (grad tensor).
parameter
(tensor)
Creates a new tensor from an old tensor, which is either a const or a tensor.
The new tensor can hold gradients.
Expected usage is like:
(setq my-param (parameter (!mul 0.01 (!randn `(10 10)))))
Note that the computation node held by the old tensor will be lost. Only the tensor's data and backend are carried over.
- Input
- Tensor (as usual, defined by (const)(sysconst)(tensor))
- Output
- Tensor (as usual, defined by (tensor))
backward
(tensor)
Computes back propagation by traversing the tensor's computation nodes.
Model parameters defined by (tensor), or those to which (parameter tensor) has been applied, store the gradient in their grad slot.
Note that the tensor must have the shape `(1) or be a single value. Otherwise an error occurs.
While backward is being computed, no new backward nodes are created. (*no-grad* automatically becomes t)
- Input
- WaffeTensor
- Output
- NIL
REPL:
CL-WAFFE> (setq a (parameter (!randn `(3 3))))
#Parameter{((-1.07... -1.93... -0.07...)
...
(1.353... 0.451... 2.473...)) :dtype :float :shape (3 3) :backward NIL}
CL-WAFFE> (setq b (parameter (!randn `(3 3))))
#Parameter{((0.234... 0.449... -1.02...)
...
(-0.42... -1.63... -0.34...)) :dtype :float :shape (3 3) :backward NIL}
CL-WAFFE> (setq c (parameter (!randn `(3 3))))
#Parameter{((0.157... 1.040... -0.84...)
...
(1.850... -0.26... -0.24...)) :dtype :float :shape (3 3) :backward NIL}
CL-WAFFE> (setq z (!sum (!add (!mul a b) c))) ; computes z = a*b + c, then sums it up.
#Const(-0.5249139 :dtype SINGLE-FLOAT :backward <Node: SUMUPTENSOR{W903}>)
CL-WAFFE> (backward z)
NIL
CL-WAFFE> (grad a)
#<MAT 3x3 B #2A((0.026024515 0.04989684 -0.11357514)
(-0.07813747 -0.032786068 -0.11216043)
(-0.047159225 -0.18221794 -0.038357873))>
CL-WAFFE> (grad b)
#<MAT 3x3 B #2A((-0.11956648 -0.21451499 -0.008029957)
(0.14240001 0.11439725 0.002615907)
(0.15042241 0.050139852 0.27483448))>
CL-WAFFE> (grad c)
#<MAT 3x3 BF #2A((0.11111111 0.11111111 0.11111111)
(0.11111111 0.11111111 0.11111111)
(0.11111111 0.11111111 0.11111111))>
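As a reading aid for the session above, the plain derivatives of the expression in the comment can be written out as follows. This is ordinary calculus on the formula z = sum(a*b + c) with an element-wise product, not a description of cl-waffe's internals.

```latex
z = \sum_{i,j} \left( a_{ij}\, b_{ij} + c_{ij} \right), \qquad
\frac{\partial z}{\partial a_{ij}} = b_{ij}, \quad
\frac{\partial z}{\partial b_{ij}} = a_{ij}, \quad
\frac{\partial z}{\partial c_{ij}} = 1
```

In the printed output, (grad c) is 0.1111... = 1/9 everywhere, and the first entry of (grad a), 0.026024515, is about 0.234.../9, i.e. the matching entry of b divided by 9. The printed gradients therefore appear to be the derivatives above scaled by 1/N, with N = 9 elements. Where that factor is applied is not stated in this section, so treat it as an observation from the output rather than documented behaviour.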
with-verbose
(&body body)
(backward out), when called inside the (with-verbose &body body) macro, displays how the computation nodes are traced. This is helpful for debugging.
To Distinguish Which Tensors Require Gradients
WaffeTensors that require gradients are represented by (parameter tensor); those that don't are represented by (const). Computation nodes that have no parameters at the destination of back propagation need neither keep a copy for gradient computation during forward propagation nor perform back propagation in the first place. WaffeTensor determines this dynamically during forward propagation.
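The idea of deciding this during the forward pass can be sketched in plain Common Lisp. The following is a toy model, not cl-waffe's implementation: toy-tensor, make-toy and toy-mul are hypothetical names, and the "saved" slot stands in for whatever copy a real node would keep for its backward pass.

```lisp
;; Toy model only: none of these names belong to cl-waffe.
(defstruct (toy-tensor (:constructor make-toy (data &key requires-grad saved)))
  data            ; the wrapped number
  requires-grad   ; T when a gradient must eventually reach this tensor
  saved)          ; inputs kept for backward, or NIL when they are not needed

(defun toy-mul (a b)
  "Multiply two toy tensors, keeping the inputs only if a gradient is needed."
  (let ((needs-grad (or (toy-tensor-requires-grad a)
                        (toy-tensor-requires-grad b))))
    (make-toy (* (toy-tensor-data a) (toy-tensor-data b))
              :requires-grad needs-grad
              ;; Multiplication needs its inputs for backward, so they are
              ;; saved, but only when some input requires a gradient.
              :saved (when needs-grad
                       (list (toy-tensor-data a) (toy-tensor-data b))))))
```

With two constants nothing is saved, while a single gradient-requiring input is enough to make the result keep its inputs and propagate the flag.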
To Store Lazy-Evaluated Object
When getting started with cl-waffe, you may notice that some operators, like !transpose, create a lazy-evaluated tensor.
REPL:
CL-WAFFE> (!transpose (!randn `(3 1)))
#Const(<Transposed Tensor> :shape (1 3) :backward <Node: TRANSPOSETENSOR{W906}>)
They behave as if they were normal tensors (in fact, !shape, !dims, etc. work as usual), but they aren't evaluated until (value tensor) is called.
REPL:
CL-WAFFE> (setq transpose (!transpose (!randn `(3 1))))
#Const(<Transposed Tensor> :shape (1 3) :backward <Node: TRANSPOSETENSOR{W907}>)
CL-WAFFE> (value transpose)
#<MAT 1x3 B #2A((-2.362661 -1.4510747 -0.88706297))>
CL-WAFFE> transpose
#Const(((-2.36... -1.45... -0.88...)) :dtype :float :shape (1 3) :backward <Node: TRANSPOSETENSOR{W907}>)
This property helps to reduce the cost of calling !transpose before !matmul.
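The general technique of deferring a transpose can be sketched in plain Common Lisp. This is only an illustration of lazy evaluation with caching; lazy-transpose, lazy-shape and lazy-value are hypothetical names and do not reflect how cl-waffe stores its transposed tensors.

```lisp
;; Illustration only: a transpose that is computed on first use.
(defstruct (lazy-transpose (:constructor make-lazy-transpose (source)))
  source          ; the original 2D array, untouched until forced
  (cached nil))   ; the materialized transpose, filled in on first use

(defun lazy-shape (lz)
  "The shape is known without materializing anything."
  (reverse (array-dimensions (lazy-transpose-source lz))))

(defun lazy-value (lz)
  "Materialize the transpose once and reuse it afterwards."
  (or (lazy-transpose-cached lz)
      (setf (lazy-transpose-cached lz)
            (let* ((src (lazy-transpose-source lz))
                   (rows (array-dimension src 0))
                   (cols (array-dimension src 1))
                   (out (make-array (list cols rows))))
              (dotimes (i rows out)
                (dotimes (j cols)
                  (setf (aref out j i) (aref src i j))))))))
```

A consumer that can read its operand in transposed order never has to call lazy-value at all, which is the kind of saving the paragraph above refers to.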
Parameter and Const
There are two types of WaffeTensor: parameter and constant. A parameter gets a gradient when (backward out) is called, whereas a constant doesn't.
Initialize Constants
cl-waffe provides various ways to initialize constants. For example, `!randn` initializes a new tensor of the given dims by sampling the standard normal distribution (mean=0.0, stdev=1.0). !beta samples the beta distribution with the given alpha and beta.
REPL:
CL-WAFFE> (!randn `(10 10))
#Const(((-1.20... 0.160... ~ -0.68... 1.776...)
...
(0.137... 0.582... ~ 1.254... 0.590...)) :dtype :float :shape (10 10) :backward NIL)
CL-WAFFE> (!beta `(10 10) 2.0 1.0)
#Const(((0.787... 0.993... ~ 0.601... 0.962...)
...
(0.980... 0.505... ~ 0.553... 0.657...)) :dtype :float :shape (10 10) :backward NIL)
WaffeTensors obtained from the standard initializing methods are constants. In general, cl-waffe provides the constructor (const value). The given value is coerced to the proper type. In this example, we obtain an mgl-mat from a simple-array.
REPL:
CL-WAFFE> (const (make-array `(3 3)))
#Const(((0.0 0.0 0.0)
...
(0.0 0.0 0.0)) :dtype :float :shape (3 3) :backward NIL)
Initialize Parameter
Parameters are initialized via the macro (parameter tensor), which turns the given tensor into a parameter.
REPL:
CL-WAFFE> (parameter (!randn `(10 10)))
#Parameter{((-0.41... 0.890... ~ 1.851... -0.73...)
...
(-1.29... -1.27... ~ -1.20... -2.28...)) :dtype :float :shape (10 10) :backward NIL}
Parameter vs Constant
Their expected usage is:
- Constant
- Datasets, temporary results of calculations, and parameters that do not need to be optimized.
- Parameter
- Trainable variables, to be optimized by optimizers defined with defoptimizer.
defnode and call
defnode
(name initializer-arguments &key parameters (disassemble-forward nil) forward-declaim forward (disassemble-backward nil) backward-declaim backward (document An node, defined by cl-waffe.))
Defines computation nodes in a format that cl-waffe can handle.
Note: the data structures that can be used as arguments and return values must be one of the following:
- WaffeTensor
- A 1D list in which each element is a WaffeTensor
Be aware that you can't use (values x y ...).
- name
- The node's name. The constructor and the structure are named after this argument.
- initializer-arguments
- The arguments the constructor takes.
- parameters
- The parameters this node has, initialized with initializer-arguments.
- disassemble-forward
- When t, the disassembly of the forward slot is displayed when this node is compiled.
- forward-declaim
- Describes the declaim for the forward function. Note that the first argument is the structure, and the :forward keyword in this declaim is replaced by the forward function's name.
- forward
- The definition of forward.
- disassemble-backward
- When t, the disassembly of the backward slot is displayed when this node is compiled.
- backward-declaim
- Describes the declaim for the backward function. Note that the first argument is the structure, and the :backward keyword in this declaim is replaced by the backward function's name.
- backward
- The definition of backward.
The macros defnode and call serve as key components of cl-waffe. When designing deep learning models, object-oriented programming can lead to more concise descriptions. Common Lisp has a powerful framework for this, CLOS and Closer-MOP, but I think its computational speed depends strongly on which Common Lisp implementation is used (e.g. SBCL, Clozure CL). Thus, by using only defstruct and defun to define the computation nodes and wrapping them with the macros (defnode) and (call), I have reduced the overhead associated with this process. This example shows how to define a ScalarAdd node.
Input
CL-WAFFE>
(defnode ScalarAdd ()
:disassemble-forward t
:forward-declaim (declaim (ftype (function (ScalarAdd waffetensor waffetensor) waffetensor) :forward))
:forward ((x y)
(let ((x (data x))
(y (data y)))
(declare (type single-float x y))
(const (+ x y))))
:disassemble-backward t
:backward-declaim (declaim (ftype (function (ScalarAdd waffetensor) list) :backward))
:backward ((dy)(list dy dy)))
Output
NIL
Through this macro, these structures and functions are defined:
- The structure, ScalarAdd
- The constructor function, (ScalarAdd)
- The function, (call-scalaradd-forward-mgl self x y) where self is a structure ScalarAdd
- The function, (call-scalaradd-backward-mgl self dy) where self is a structure ScalarAdd.
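To make the list above more concrete, here is a minimal sketch of how a macro can emit a structure, a constructor with the same name, and an inline forward function from a single form. It is a toy stand-in written for this tutorial: define-mini-node and the generated name call-mini-scalar-add-forward are hypothetical, and the real defnode does considerably more (backward, parameters, backends).

```lisp
;; Toy stand-in for the idea behind defnode; not cl-waffe's implementation.
(defmacro define-mini-node (name (&rest args) &body body)
  (let ((forward (intern (format nil "CALL-~A-FORWARD" (symbol-name name)))))
    `(progn
       ;; The structure, plus a zero-argument constructor sharing its name.
       (defstruct (,name (:constructor ,name ())))
       ;; A plain function, declaimed inline so the compiler may expand it
       ;; at the call site instead of emitting a full call.
       (declaim (inline ,forward))
       (defun ,forward (self ,@args)
         (declare (ignorable self))
         ,@body))))

;; Defines the structure MINI-SCALAR-ADD, the constructor (mini-scalar-add)
;; and the function (call-mini-scalar-add-forward self x y).
(define-mini-node mini-scalar-add (x y)
  (+ x y))

(call-mini-scalar-add-forward (mini-scalar-add) 1.0 2.0) ; => 3.0
```

Because only defstruct and defun are generated, no generic-function dispatch is involved when the forward function is called.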
Setting :disassemble-forward or :disassemble-backward to t prints the disassembly of :forward or :backward respectively (only the essential parts). Judging from the result below, it seems to be optimized well enough.
; disassembly for #:|nodedebug9718|
; Size: 148 bytes. Origin: #x540A110F ; #:|nodedebug9718|
; 0F: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 13: 488945F8 MOV [RBP-8], RAX
; 17: 4883EC10 SUB RSP, 16
; 1B: 488B55F0 MOV RDX, [RBP-16]
; 1F: B902000000 MOV ECX, 2
; 24: 48892C24 MOV [RSP], RBP
; 28: 488BEC MOV RBP, RSP
; 2B: B802AC3650 MOV EAX, #x5036AC02 ; #<FDEFN DATA>
; 30: FFD0 CALL RAX
; 32: 480F42E3 CMOVB RSP, RBX
; 36: 4C8BC2 MOV R8, RDX
; 39: 4C8945E0 MOV [RBP-32], R8
; 3D: 4883EC10 SUB RSP, 16
; 41: 488B55E8 MOV RDX, [RBP-24]
; 45: B902000000 MOV ECX, 2
; 4A: 48892C24 MOV [RSP], RBP
; 4E: 488BEC MOV RBP, RSP
; 51: B802AC3650 MOV EAX, #x5036AC02 ; #<FDEFN DATA>
; 56: FFD0 CALL RAX
; 58: 480F42E3 CMOVB RSP, RBX
; 5C: 4C8B45E0 MOV R8, [RBP-32]
; 60: 4180F819 CMP R8B, 25
; 64: 7538 JNE L1
; 66: 66490F6ED0 MOVQ XMM2, R8
; 6B: 0FC6D2FD SHUFPS XMM2, XMM2, #4r3331
; 6F: 80FA19 CMP DL, 25
; 72: 7403 JEQ L0
; 74: CC51 INT3 81 ; OBJECT-NOT-SINGLE-FLOAT-ERROR
; 76: 08 BYTE #X08 ; RDX(d)
; 77: L0: 66480F6ECA MOVQ XMM1, RDX
; 7C: 0FC6C9FD SHUFPS XMM1, XMM1, #4r3331
; 80: F30F58CA ADDSS XMM1, XMM2
; 84: 660F7ECA MOVD EDX, XMM1
; 88: 48C1E220 SHL RDX, 32
; 8C: 80CA19 OR DL, 25
; 8F: B902000000 MOV ECX, 2
; 94: FF7508 PUSH QWORD PTR [RBP+8]
; 97: B802DD3650 MOV EAX, #x5036DD02 ; #<FDEFN CONST>
; 9C: FFE0 JMP RAX
; 9E: L1: CC51 INT3 81 ; OBJECT-NOT-SINGLE-FLOAT-ERROR
; A0: 20 BYTE #X20 ; R8(d)
; A1: CC10 INT3 16 ; Invalid argument count trap
; disassembly for #:|nodedebug9739|
; Size: 84 bytes. Origin: #x541BA04C ; #:|nodedebug9739|
; 4C: 498B4510 MOV RAX, [R13+16] ; thread.binding-stack-pointer
; 50: 488945F8 MOV [RBP-8], RAX
; 54: 4D896D28 MOV [R13+40], R13 ; thread.pseudo-atomic-bits
; 58: 498B5558 MOV RDX, [R13+88] ; thread.cons-tlab
; 5C: 488D4220 LEA RAX, [RDX+32]
; 60: 493B4560 CMP RAX, [R13+96]
; 64: 772E JNBE L2
; 66: 49894558 MOV [R13+88], RAX ; thread.cons-tlab
; 6A: L0: 48893A MOV [RDX], RDI
; 6D: 48897A10 MOV [RDX+16], RDI
; 71: 48C7421817010050 MOV QWORD PTR [RDX+24], #x50000117 ; NIL
; 79: 488D4217 LEA RAX, [RDX+23]
; 7D: 48894208 MOV [RDX+8], RAX
; 81: 80CA07 OR DL, 7
; 84: 4D316D28 XOR [R13+40], R13 ; thread.pseudo-atomic-bits
; 88: 7402 JEQ L1
; 8A: CC09 INT3 9 ; pending interrupt trap
; 8C: L1: 488BE5 MOV RSP, RBP
; 8F: F8 CLC
; 90: 5D POP RBP
; 91: C3 RET
; 92: CC10 INT3 16 ; Invalid argument count trap
; 94: L2: 6A20 PUSH 32
; 96: FF142528050050 CALL [#x50000528] ; #x52A005B0: LIST-ALLOC-TRAMP
; 9D: 5A POP RDX
; 9E: EBCA JMP L0
call
(model &rest inputs &aux (features (model-inlineable-p model)))
Nodes defined by this macro work as if they were CLOS classes, and they can have :parameters. What makes defnode distinct from them, however, is the following:
REPL:
CL-WAFFE> (time (call (ScalarAdd)(const 1.0)(const 1.0)))
#Const(2.0 :dtype SINGLE-FLOAT :backward <Node: SCALARADD{W924}>)
Evaluation took:
0.000 seconds of real time
0.000005 seconds of total run time (0.000005 user, 0.000000 system)
100.00% CPU
11,084 processor cycles
0 bytes consed
CL-WAFFE> (time (+ 1.0 1.0))
2.0
Evaluation took:
0.000 seconds of real time
0.000001 seconds of total run time (0.000000 user, 0.000001 system)
100.00% CPU
422 processor cycles
0 bytes consed
Nodes called by the macro (call) are fully inlined (like CL's inline-generic-function, static-dispatch). Considering that ScalarAdd builds a computation node in addition to summing the arguments, this overhead is small enough. Here's how I achieve this behaviour:
REPL:
CL-WAFFE> (macroexpand `(call (ScalarAdd)(const 1.0)(const 1.0)))
(LOCALLY
(DECLARE (OPTIMIZE (SPEED 3)(SAFETY 1))
(INLINE call-scalaradd-forward-mgl))
(call-scalaradd-forward-mgl (SCALARADD)(CONST 1.0)(CONST 1.0)))
The function call-scalaradd-forward-mgl is declared inline. This is possible because (call) can detect the type of the node at compile time. This leads to one of cl-waffe's key properties: it is easy to optimise. Functions written with defnode and call are optimized like this:
Input
CL-WAFFE>
(defun sadd (x y)
(declare (optimize (speed 3)(safety 0))
(type single-float x y))
(call (ScalarAdd)(const x)(const y)))
Output
SADD
(disassemble #'sadd)
; disassembly for SADD
; Size: 943 bytes. Origin: #x541AFCAE ; SADD
; AFCAE: 488975F0 MOV [RBP-16], RSI
; AFCB2: 4883EC10 SUB RSP, 16
.
.
(Omitted)
We get a large disassembly, which means that every step, including the part that builds the computation nodes, is inlined. In any case, the optimization of the sadd function is working properly. Note that when the type of the given node can't be determined at compile time, call behaves differently:
Input
CL-WAFFE>
(let ((node (ScalarAdd)))
(macroexpand `(call node (const 1.0)(const 1.0))))
Output
(LET* ((MODEL NODE)(INPUTS (LIST (CONST 1.0)(CONST 1.0))))
(IF (TYPEP MODEL 'MODEL-LIST)
(PROGN
(SETQ MODEL (NTH (DATA (CAR INPUTS))(MODEL-LIST-MLIST MODEL)))
(SETQ INPUTS (CDR INPUTS))
(ASSERT (NOT (TYPEP MODEL 'MODEL-LIST)) NIL
cl-waffe.call: Assertion failed because model-list can't posses model-list as a element.)))
(LOCALLY
(DECLARE (OPTIMIZE (SPEED 3))
(MAYBE-INLINE CALL-INLINED-FORWARD))
(APPLY #'CALL-INLINED-FORWARD MODEL INPUTS)))
The expansion is slightly more complicated. The most important part is (APPLY #'CALL-INLINED-FORWARD MODEL INPUTS). In short, call-inlined-forward looks like:
(defun call-inlined-forward (model &rest inputs)
(typecase model
(addtensor (call-addtensor-forward-mgl ...))
(scalaradd (call-scalaradd-forward-mgl ...))
(T ; ... On the first attempt, redefine call-inlined-forward and try again.
)))
This may be misleading, but it is the simplest example. Of course, the calls are inlined. call-inlined-forward is automatically redefined when:
- A new backend is defined.
- The node you specified doesn't match any known node.
That is, you don't need to pay attention to when they are inlined.
Input
CL-WAFFE>(let ((node (ScalarAdd)))
(time (call node (const 1.0)(const 1.0))))
Output
#Const(2.0 :dtype SINGLE-FLOAT :backward <Node: SCALARADD{W926}>)
Evaluation took:
0.000 seconds of real time
0.000005 seconds of total run time (0.000005 user, 0.000000 system)
100.00% CPU
10,502 processor cycles
0 bytes consed
It works the same as the first example, and the overhead is small enough. (P.S.: I was told that SBCL cannot optimize a CASE of several thousand lines. The assumption is that the more nodes are defined in cl-waffe, the lower the performance becomes. In my own benchmarks, it did well enough on the second call, but if it turns out to be slow, I know how to make it faster.)
By the way, defnode's forward slot can take &rest arguments. However, (call) is a macro, so we can't use apply with it. Is there no way to call it with &rest arguments? There is: get-forward-caller and get-backward-caller are available to get the function object itself. In cl-waffe's implementation, !concatenate takes &rest arguments.
get-forward-caller
(model)
get-backward-caller
(model)
(defun !concatenate (axis &rest tensors)
(declare (optimize (speed 3))
(type fixnum axis))
(let* ((node (ConcatenateTensorNode axis))
(caller (get-forward-caller node)))
(apply caller node tensors)))
Writing Node Extensions
You may notice that the functions generated by defnode have the suffix mgl. This indicates the backend cl-waffe uses (mgl = mgl-mat).
If the existing implementations of nodes aren't suitable for your use, you can replace them. cl-waffe provides an ecosystem to manage these additional implementations, which I call backends. For example, you can replace my broadcasting implementation with another, faster one. Let's create a double-float version of ScalarAdd.
Input
CL-WAFFE>
(define-node-extension ScalarAdd
:backend :double-float
:forward-declaim (declaim (ftype (function (ScalarAdd waffetensor waffetensor) waffetensor) :forward))
:forward ((x y)
(let ((x (data x))
(y (data y)))
(declare (type double-float x y))
(const (+ x y))))
:backward-declaim (declaim (ftype (function (ScalarAdd waffetensor) list) :backward))
:backward ((dy)(list dy dy)))
Output
NIL
You will then see this message:
[INFO] Inlining call-forward... Total Features: 64
To disable this, set cl-waffe:*ignore-inlining-info* t
[INFO] Inlining call-backward... Total Features: 64
To disable this, set cl-waffe:*ignore-inlining-info* t
It's all done. The backends you defined can be switched via (with-backend backend-name &body body) macro. Let's check how call expands it.
with-backend
(backend &body body)
Switches a backend.
See also: define-node-extension
REPL:
CL-WAFFE>
(with-backend :double-float
(macroexpand `(call (ScalarAdd)(const 1.0d0)(const 1.0d0))))
(LOCALLY
(DECLARE (OPTIMIZE (SPEED 3)(SAFETY 1))
(INLINE call-scalaradd-forward-double-float
call-scalaradd-forward-mgl))
(CASE *DEFAULT-BACKEND*
(DOUBLE-FLOAT
(call-scalaradd-forward-double-float (SCALARADD)(CONST 1.0d0)
(CONST 1.0d0)))
(MGL (call-scalaradd-forward-mgl (SCALARADD)(CONST 1.0d0)(CONST 1.0d0)))
(T (call-scalaradd-forward-mgl (SCALARADD)(CONST 1.0d0)(CONST 1.0d0)))))
There's an additional case generated, depending on *default-backend*.
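The shape of that expansion, a CASE on a special variable that a with-style macro rebinds, can be reproduced in a few lines of plain Common Lisp. The sketch below is only an analogy: *toy-backend*, with-toy-backend and the toy-add functions are hypothetical names, and nothing here is taken from cl-waffe's source.

```lisp
;; Analogy only: backend selection through a dynamically bound variable.
(defvar *toy-backend* :mgl
  "The backend used by TOY-ADD unless rebound by WITH-TOY-BACKEND.")

(defmacro with-toy-backend (backend &body body)
  "Run BODY with *TOY-BACKEND* dynamically bound to BACKEND."
  `(let ((*toy-backend* ,backend))
     ,@body))

(defun toy-add-mgl (x y)
  ;; Default implementation: single-float arithmetic.
  (+ (coerce x 'single-float) (coerce y 'single-float)))

(defun toy-add-double-float (x y)
  ;; Alternative implementation: double-float arithmetic.
  (+ (coerce x 'double-float) (coerce y 'double-float)))

(defun toy-add (x y)
  ;; Dispatch at run time; unknown backends fall back to the default.
  (case *toy-backend*
    (:double-float (toy-add-double-float x y))
    (t (toy-add-mgl x y))))
```

As in the expansion above, the default implementation also serves as the fallback branch, so selecting a backend that has no implementation for a given node still produces a result.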
REPL:
CL-WAFFE>
(with-backend :double-float
(time (call (scalarAdd)(const 1.0d0)(const 1.0d0))))
#Const(2.0d0 :dtype DOUBLE-FLOAT :backward <Node: SCALARADD{W931}>)
Evaluation took:
0.000 seconds of real time
0.000005 seconds of total run time (0.000005 user, 0.000000 system)
100.00% CPU
9,814 processor cycles
0 bytes consed
Adding new backends is no pain for cl-waffe!
MNIST Example
Using the features introduced so far, we can train an MLP model on the MNIST dataset. In practice, a few more features are needed, namely defmodel and deftrainer.
Define your model
REPL:
CL-WAFFE>
(defmodel MLP (activation)
:parameters ((layer1 (cl-waffe.nn:denselayer (* 28 28) 512 T activation))
(layer2 (cl-waffe.nn:denselayer 512 256 T activation))
(layer3 (cl-waffe.nn:linearlayer 256 10 T)))
:forward ((x)
(with-calling-layers x
(layer1 x)
(layer2 x)
(layer3 x))))
NIL
CL-WAFFE> (MLP :relu)
<Model: MLP{W937}(
<Model: LAYER1 -> DENSELAYER{W938} ...>
<Model: LAYER2 -> DENSELAYER{W941} ...>
<Model: LAYER3 -> LINEARLAYER{W944} ...>
)>
CL-WAFFE> (with-output-to-string (out)
(print-model (MLP :relu) out))
––– <Model MLP{W945}>
––––––– <MLP's LAYER1 = DENSELAYER{W946}>
|-ACTIVATION-|
|___RELU_____|
––––––––––– <DENSELAYER's LAYER = LINEARLAYER{W947}>
|––slot––|–––shape–––|–trainable–|
WEIGHT -> (784 512) O
BIAS -> (1 512) O
––––––– <MLP's LAYER2 = DENSELAYER{W949}>
|-ACTIVATION-|
|___RELU_____|
––––––––––– <DENSELAYER's LAYER = LINEARLAYER{W950}>
|––slot––|–––shape–––|–trainable–|
WEIGHT -> (512 256) O
BIAS -> (1 256) O
––––––– <MLP's LAYER3 = LINEARLAYER{W952}>
|––slot––|––shape–––|–trainable–|
WEIGHT -> (256 10) O
BIAS -> (1 10) O
-(+) Total Param: 0
Define your trainer
REPL:
CL-WAFFE>
(deftrainer MLPTrainer (activation lr)
:model (MLP activation)
:optimizer cl-waffe.optimizers:Adam
:optimizer-args (:lr lr)
:step-model ((x y)
(zero-grad)
(let ((out (cl-waffe.nn:softmax-cross-entropy (call (model) x) y)))
(backward out)
(update)
out))
:predict ((x)(call (model) x)))
NIL
CL-WAFFE> (setq trainer (MLPTrainer :relu 1e-3))
<Trainer: MLPTRAINER()>
CL-WAFFE> (slot-value trainer 'cl-waffe::optimizer)
<Optimizer: ADAM{W965}
Param: #<GENERAL-HASH-TABLE :TEST EQL :COUNT 6 :WEAKNESS :VALUE {100EAEF273}>
LR : 0.001
Param: #<HASH-TABLE :TEST EQL :COUNT 0 {100EAEF363}>
Param: #<HASH-TABLE :TEST EQL :COUNT 0 {100EAEF403}>
N : 0
EPSILON : 1.0e-7
BETA1 : 0.9
BETA2 : 0.999
[Total Param]: 535818
>
(This section is still in progress. However, here's an MLP model which can achieve 98% valid_accuracy: fnn.lisp) If you have cloned the cl-waffe repository, a Lakefile is available:
$ lake example:install # Install training dataset
$ lake example:mnist # Start training. (batch-size=100)