Committed to daNeuralNet a first working version of a JIT compiler for matrix-vector multiplication that relies on the FMA instruction set (Fused Multiply-Add).
This version generates code that is up to twice as fast as OpenBLAS for matrices that fit within the CPU cache (typically 100×100 to 200×200), and maintains a marginal lead for larger sizes, which are bound by memory bandwidth. The performance profile is similar on both AMD and Intel CPUs.
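For illustration, here is a minimal sketch in plain Delphi of what the generated kernel computes; the names (MatrixVectorMul, TSingleMatrix, etc.) are hypothetical, and the actual JIT emits vectorized FMA instructions (such as vfmadd231ps) over AVX registers rather than this scalar loop:

type
   TSingleVector = array of Single;
   TSingleMatrix = array of TSingleVector;  // row-major: array of rows

// dest must be pre-allocated to Length(m) by the caller
// (illustrative scalar reference, not the JIT-generated code)
procedure MatrixVectorMul(const m : TSingleMatrix; const v : TSingleVector;
                          var dest : TSingleVector);
var
   row, col : Integer;
   acc : Single;
begin
   for row := 0 to High(m) do begin
      acc := 0;
      for col := 0 to High(v) do
         acc := acc + m[row][col]*v[col];  // one fused multiply-add per element
      dest[row] := acc;
   end;
end;

The point of FMA is that the multiply and the accumulate of the inner loop execute as a single instruction with a single rounding step, which is where most of the throughput gain over separate multiply and add instructions comes from.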
A test version of SamplingProfiler 64-bit is available here (3.2 MB).
It has only been tested with 64-bit binaries compiled by Delphi 10.3 with detailed map files. It should work with other Delphi versions (TD32 and other debug information formats have not been tested yet).
There are also known issues with stack traces from DLLs, so it is rough around the edges but should be functional.
Just created a new repository with a “LibCBLAS” unit meant to use the OpenBLAS library, in its Windows 64-bit incarnation, from Delphi 10.3+.
OpenBLAS is an optimized BLAS (Basic Linear Algebra Subprograms) library; the DLL itself can be obtained from the “xianyi” repository, where pre-compiled Windows DLLs are maintained.
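As a rough sketch of what such a unit involves (the actual LibCBLAS declarations may well differ), here is a minimal Delphi import of OpenBLAS's cblas_sgemv, the single-precision matrix-vector multiply; the DLL file name is an assumption and depends on the build you download:

const
   // assumption: the pre-compiled DLL is placed alongside the executable
   OpenBLASDLL = 'libopenblas.dll';

   // CBLAS layout and transpose constants, as defined in cblas.h
   CblasRowMajor = 101;
   CblasColMajor = 102;
   CblasNoTrans  = 111;
   CblasTrans    = 112;

// computes y := alpha*A*x + beta*y in single precision
procedure cblas_sgemv(order, transA, m, n : Integer; alpha : Single;
                      a : PSingle; lda : Integer;
                      x : PSingle; incX : Integer;
                      beta : Single; y : PSingle; incY : Integer);
   cdecl; external OpenBLASDLL;

For an m×n row-major matrix stored in a and vectors x and y, a straight multiplication would then look like cblas_sgemv(CblasRowMajor, CblasNoTrans, m, n, 1.0, @a[0], n, @x[0], 1, 0.0, @y[0], 1).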
I recently dusted off an artificial neural network project, now published at https://bitbucket.org/egrange/daneuralnet/. This is a subject I’ve been dabbling in on and off since the days of 8-bit CPUs.
The goals of the project are twofold: first, to experiment with neural networks that would be practical to run and train on current CPUs, and second, to experiment with JIT compilation of neural-network maths in Delphi.
TensorFlow and Python are cool, but they feel a bit too much like Minecraft, another sandbox of ready-made blocks 😉
Update: it should now be available; BitBucket staff were very responsive, and so was Sophos AV. Two false positives remain, from VBA32 and Cylance, which do not appear to have proper mechanisms for reporting false positives (not very professional for an AV vendor, IMHO).