classifier module v0.3.0

Published: Dec 22, 2025 License: Apache-2.0

README
Neurlang Classifier

Neurlang Classifier is a lightweight ML library for binary and quaternary neural networks that train quickly on CPUs. It models neurons with simple integer-to-boolean filters, enabling networks to be trained purely with integer arithmetic—no backpropagation, no floating-point math, and no GPU required. This makes training fast on multi-core CPUs while keeping dependencies minimal.
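To make the "integer-to-boolean filter" idea concrete, here is a toy sketch of such a neuron. It is illustrative only, not the library's actual hashtron model: the two integer fields stand in for trainable parameters, and the decision uses nothing but multiplication, addition, and modulo.

```go
package main

import "fmt"

// filter is a toy integer-to-boolean neuron: it mixes the input with
// integer arithmetic only and thresholds the result. The fields s and m
// stand in for trainable integer parameters; the real hashtron model is
// more involved.
type filter struct {
	s, m uint32
}

// Forward maps an integer input to a boolean using only integer math.
func (f filter) Forward(x uint32) bool {
	h := (x*f.s + 0x9e3779b9) % f.m
	return h < f.m/2
}

func main() {
	f := filter{s: 2654435761, m: 1000}
	for _, x := range []uint32{1, 2, 3, 4} {
		fmt.Println(x, f.Forward(x))
	}
}
```

Because the forward pass is integer-only, "training" such a filter amounts to searching for parameter values, which parallelizes well across CPU cores.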

The framework has been proven in production, including training large-scale transformers for the goruut phonemizer. Use cases include virus detection, handwritten digit recognition, phoneme modeling, speech command classification, and more.

Features

  • No backpropagation or perceptrons: Uses simple integer-based logic instead of weight gradients, enabling an alternative ML paradigm
  • CPU-optimized, hardware-light: Requires no GPU—training is fast on multi-core CPUs using bitwise and integer operations
  • Quaternary neurons: Implements custom layers (convolution, attention, pooling, parity, etc.) that operate on boolean/integer data
  • Tiny dependencies: Written in pure Go with minimal external libraries, simplifying installation and portability
  • Hash-based models: Resulting models are extremely fast for inference using hash-based feature extraction
  • Proven at scale: Already used in production projects to train large-scale transformers

Getting Started

Prerequisites
  • Go 1.18 or higher
Installation
go get github.com/neurlang/classifier

Usage Examples

Training MNIST Digit Classifier
cd cmd/train_mnist
go run .
Running Inference on MNIST
cd cmd/infer_mnist
go run .
Training Virus Detection Classifier
cd cmd/train_is_virus
go run .
Other Examples

The cmd/ directory contains additional demo programs:

  • train_is_alnum / infer_is_alnum - Alphanumeric character classification
  • train_speak - Speech command recognition
  • train_squareroot / infer_squareroot - Mathematical function learning
  • train_phonemizer_multi / train_phonemizer_ulevel - Grapheme-to-phoneme conversion

Run ./cmd/trainall.sh to train all examples or ./cmd/runall.sh to run all inference demos.

Package Overview

  • cmd - Demo programs with train_* and infer_* commands for various tasks
  • datasets - Core dataset interface and implementations:
    • isalnum - Alphanumeric character dataset
    • isvirus - TLSH file hash signatures for virus detection
    • mnist - Standard MNIST handwritten digits (60k train / 10k test)
    • phonemizer_multi / phonemizer_ulevel - Grapheme-to-phoneme datasets
    • speak - Speech commands dataset
    • squareroot - Synthetic dataset for numeric relations
    • stringhash - String hashing and classification
  • hash - Fast modular hash function implementation used by Neurlang layers
  • hashtron - Core "hashtron" classifier model implementing the neuron logic
  • layer - Abstract interfaces and implementations:
    • conv2d - 2D binary convolutional layer
    • crossattention - Cross-attention layer for transformer-like models
    • full - Fully connected (dense) layer
    • majpool2d - 2D majority pooling layer
    • parity - Parity (XOR-like) layer
    • sochastic - Stochastic/randomly connected layer
    • sum - Element-wise sum layer
  • net - Network architecture definitions:
    • feedforward - Feedforward network architecture
  • parallel - Concurrency utilities (ForEach, LoopUntil) to speed up training
  • trainer - High-level training orchestration managing training loops over datasets
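The hash package's exact function is not reproduced here; as a rough illustration of hash-based feature extraction, the sketch below folds an input and a salt into a bounded integer using only multiplies, shifts, and a modulo. The mixing constants are arbitrary stand-ins, not the library's.

```go
package main

import "fmt"

// modHash is an illustrative modular hash: it mixes an input n with a salt
// and reduces the result into the range [0, max). It is not the library's
// actual hash function, just a stand-in with the same shape
// (integer-only, deterministic, bounded output).
func modHash(n, salt, max uint32) uint32 {
	h := n*0x85ebca6b ^ salt
	h ^= h >> 16
	h *= 0xc2b2ae35
	return (h ^ h>>13) % max
}

func main() {
	// Same inputs always map to the same bucket below max.
	fmt.Println(modHash(42, 7, 256))
}
```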

Implementing a Dataset

To implement a dataset, define a slice of samples where each sample has these methods:

type Sample interface {
    Feature(int) uint32  // Returns the feature at the specified index
    Parity() uint16      // Returns parity for dataset balancing (0 if balanced)
    Output() uint16      // Returns the output label/prediction
}

Implementing a Network

Example network with majority pooling layers:

import "github.com/neurlang/classifier/net/feedforward"
import "github.com/neurlang/classifier/layer/majpool2d"

const fanout1 = 3
const fanout2 = 5
const fanout3 = 3
const fanout4 = 5

var net feedforward.FeedforwardNetwork
net.NewLayerP(fanout1*fanout2*fanout3*fanout4, 0, 1<<fanout4)
net.NewCombiner(majpool2d.MustNew(fanout1*fanout2*fanout4, 1, fanout3, 1, fanout4, 1, 1))
net.NewLayerP(fanout1*fanout2, 0, 1<<fanout2)
net.NewCombiner(majpool2d.MustNew(fanout2, 1, fanout1, 1, fanout2, 1, 1))
net.NewLayer(1, 0)
  • fanout1 and fanout3 define majority pooling dimensions
  • fanout2 and fanout4 define the number of hashtrons
  • The final layer contains a single hashtron for predictions (a bit count of 0 or 1 means one output bit is predicted; up to 16 output bits are supported)
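Majority pooling reduces a group of boolean outputs to a single bit by majority vote. A minimal one-dimensional sketch of the idea (not the library's majpool2d combiner, which pools over 2D windows using the fanout parameters shown above):

```go
package main

import "fmt"

// majorityPool reduces each consecutive window of `size` booleans to one
// bit by majority vote. It is a 1D simplification of what a 2D majority
// pooling layer does per window.
func majorityPool(bits []bool, size int) []bool {
	var out []bool
	for i := 0; i+size <= len(bits); i += size {
		votes := 0
		for _, b := range bits[i : i+size] {
			if b {
				votes++
			}
		}
		// True when strictly more than half the window votes true.
		out = append(out, votes*2 > size)
	}
	return out
}

func main() {
	in := []bool{true, true, false, false, false, true}
	fmt.Println(majorityPool(in, 3)) // [T T F] -> true, [F F T] -> false
}
```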
Training and Inference

Training uses the trainer package with custom evaluation and training functions:

import "github.com/neurlang/classifier/datasets"
import "github.com/neurlang/classifier/parallel"
import "github.com/neurlang/classifier/trainer"

// Define training function
trainWorst := trainer.NewTrainWorstFunc(net, nil, nil, nil,
    func(worst []int, tally datasets.AnyTally) {
        parallel.ForEach(len(dataslice), 1000, func(i int) {
            var sample = dataslice[i]
            net.AnyTally(&sample, worst, tally, customErrorFunc)
        })
    })

// Define evaluation function
evaluate := trainer.NewEvaluateFunc(net, len(dataslice), 99, &improved_success_rate, dstmodel,
    func(length int, h trainer.EvaluateFuncHasher) int {
        // Evaluate accuracy on dataset
        return successRate
    })

// Run training loop
trainer.NewLoopFunc(net, &improved_success_rate, 100, evaluate, trainWorst)()
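The loop set up above repeatedly retrains on the worst-classified samples and continues only while evaluation improves. Stripped of the library types, the control flow is roughly the following generic sketch (not the trainer package's implementation):

```go
package main

import "fmt"

// loopUntil runs train/evaluate rounds, tracking the best success rate,
// and stops after maxRounds or when a round fails to improve. It mirrors
// the shape of a train-worst loop, not the library's exact logic.
func loopUntil(maxRounds int, train func(), evaluate func() int) int {
	best := 0
	for round := 0; round < maxRounds; round++ {
		train()
		if rate := evaluate(); rate > best {
			best = rate // improvement: keep going
		} else {
			break // no improvement: stop
		}
	}
	return best
}

func main() {
	// Simulated success rates returned by successive evaluations.
	rates := []int{50, 70, 90, 90}
	next := 0
	best := loopUntil(100,
		func() {}, // training step (a no-op in this sketch)
		func() int { r := rates[next]; next++; return r },
	)
	fmt.Println(best)
}
```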

Inference is straightforward:

predicted := net.Infer2(&sample)  // Returns predicted output

Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

License

Neurlang Classifier is licensed under Apache 2.0 or Public Domain, at your option.

Directories

Path Synopsis
cmd
Package cmd contains various demo programs
infer_is_alnum command
Package main provides a demo program for running inference with a trained alphanumeric character classifier.
infer_is_virus command
Package main provides a demo program for running inference with a trained virus detection classifier.
infer_mnist command
Package main provides a demo program for running inference with a trained MNIST digit classifier.
infer_squareroot command
Package main provides a demo program for running inference with a trained square root approximation network.
train_is_alnum command
Package main provides a demo program for training an alphanumeric character classifier.
train_is_virus command
Package main provides a demo program for training a virus detection classifier using TLSH file hashes.
train_mnist command
Package main provides a demo program for training a handwritten digit classifier on the MNIST dataset.
train_phonemizer_multi command
Package main provides a demo program for training a multi-language grapheme-to-phoneme (G2P) converter.
train_phonemizer_ulevel command
Package main provides a demo program for training an utterance-level grapheme-to-phoneme (G2P) converter.
train_speak command
Package main provides a demo program for training a speech commands classifier.
train_squareroot command
Package main provides a demo program for training a square root approximation network.
datasets
Package datasets implements the Neurlang dataset type
isalnum
Package isalnum implements the IsAlnum Dataset
isvirus
Package isvirus contains a dataset of TLSH hashes of viruses and clean files for machine learning (without the leading "T1" characters)
mnist
Package mnist is the 60000 + 10000 handwritten digits dataset
phonemizer_multi
Package phonemizer_multi provides datasets for grapheme-to-phoneme (G2P) conversion across multiple languages.
phonemizer_ulevel
Package phonemizer_ulevel provides utterance-level datasets for grapheme-to-phoneme (G2P) conversion.
speak
Package speak provides a speech commands dataset for audio and speech classification tasks.
squareroot
Package squareroot provides a synthetic dataset for learning to compute square roots and other numeric relations.
stringhash
Package stringhash provides a dataset for tasks involving string hashing and classification by hash values.
hash
Package hash implements the fast modular hash used by the Neurlang classifier
hashtron
Package hashtron implements a hashtron (classifier)
layer
Package layer defines a custom combiner and layer interface
conv2d
Package conv2d implements a 2D bit-convolution layer and combiner
crossattention
Package crossattention implements a cross-attention connected layer and combiner
full
Package full implements a fully connected layer and combiner
majpool2d
Package majpool2d implements a 2D majority pooling layer and combiner
parity
Package parity implements a parity layer and combiner
sochastic
Package sochastic implements a stochastically connected layer and combiner
sum
Package sum implements a sum layer and combiner
net
Package net implements various hashtron network types
feedforward
Package feedforward implements a feedforward network type
parallel
Package parallel contains parallel LoopUntil() and parallel ForEach() plus other concurrency primitives.
trainer
Package trainer provides high-level training orchestration for Neurlang networks.
