AI Models

An AI model in Koios runs machine learning inference on live data from your devices. Models read input values from tags, feed them through a pre-trained neural network (ONNX or TensorFlow Lite), and write predictions back to output tags, all in real time, at a configurable scan rate.

Koios runs models; it does not train them

Koios is an inference engine. You train models externally using your own tools and data, then upload the exported file to Koios for deployment. See Training a Model for details on this workflow.

How Models Work

On every scan, a model follows this cycle:

Collect: read current and recent historical values from all input tags
Calibrate: apply per-binding gain and bias for sensor drift or unit corrections (no-op at defaults)
Normalize: scale each input to the range the model was trained on
Infer: run the model file to produce predictions
Denormalize: scale predictions back to real-world units
Write: apply inverse calibration and push predictions to output tags

This repeats at the model's scan rate (e.g. every 1s, 5s, or 30s). Models can also run in on-demand mode for synchronized device reads/writes, or be grouped in a scan group for batched execution.
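The six steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not Koios code: the `Binding` class, the `run_cycle` function, the calibration form `gain * v + bias`, and the use of min-max scaling for every binding are all assumptions made for the example, and the `infer` callable stands in for the uploaded model file.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Binding:
    """Hypothetical per-binding settings (names are illustrative, not Koios fields)."""
    gain: float = 1.0   # calibration gain, identity by default
    bias: float = 0.0   # calibration bias, identity by default
    lo: float = 0.0     # normalization minimum
    hi: float = 1.0     # normalization maximum


def run_cycle(
    windows: Sequence[Sequence[float]],
    inputs: Sequence[Binding],
    outputs: Sequence[Binding],
    infer: Callable[[List[List[float]]], Sequence[float]],
) -> List[float]:
    """One scan: calibrate, normalize, infer, denormalize, inverse-calibrate."""
    tensor = []
    for window, b in zip(windows, inputs):  # Collect: one window per input tag
        calibrated = [b.gain * v + b.bias for v in window]  # Calibrate (assumed form)
        tensor.append([(v - b.lo) / (b.hi - b.lo) for v in calibrated])  # Normalize (min-max)
    raw = infer(tensor)  # Infer: stand-in for the model file
    results = []
    for y, b in zip(raw, outputs):
        value = y * (b.hi - b.lo) + b.lo  # Denormalize
        results.append((value - b.bias) / b.gain)  # Inverse calibration before the write
    return results  # Write: the caller would push these to output tags


# Toy "model" that echoes the newest normalized sample of the first input.
print(run_cycle([[50.0, 100.0]], [Binding(hi=100.0)], [Binding(hi=50.0)],
                lambda t: [t[0][-1]]))  # [50.0]
```

In the toy run, the newest input (100.0 on a 0 to 100 range) normalizes to 1.0, and the output binding scales it back to 50.0 on its own 0 to 50 range.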

Key Components

1. Model File

The trained model file (ONNX or TFLite) contains the neural network weights and structure. You can upload multiple versions and switch between them without reconfiguring bindings.

Format	Extension	Description
ONNX	`.onnx`	Open Neural Network Exchange, exported from PyTorch, scikit-learn, etc.
TFLite	`.tflite`	TensorFlow Lite, optimized for edge deployment

2. Bindings

Bindings connect the model's inputs and outputs to tags:

Input bindings read values from tags and pass them to the model
Output bindings receive predictions from the model and optionally write them to tags

Every input binding must be assigned to a tag. Output bindings can be left unassigned.

3. Normalization

Each binding has a normalization setting that controls how values are scaled. It is made up of two separate settings:

Normalization type: the formula applied to each value.

Type	Formula	Output Range
None	Passthrough	Raw values
Min-Max	`(v - min) / (max - min)`	[0, 1]
Symmetric	`2*(v - min)/(max - min) - 1`	[-1, 1]
Z-Score	`(v - mean) / std`	Unbounded

Normalization source: where the formula's parameters come from.

Source	Description
Tag Range	Uses the tag's configured `range_min` / `range_max`
Custom	Uses custom values set directly on the binding

Z-Score always forces Custom source (tags don't have meaningful mean/std values).

Match your training data

The normalization in Koios should match whatever normalization was used during training. If you trained with min-max scaling using the tag's range, use Tag Range. If you used custom bounds or z-score, use Custom and enter the same values.
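To check that your training pipeline and your Koios bindings agree, it can help to write the formulas from the table out by hand. The sketch below implements the four normalization types and their inverses (the inverse is what the Denormalize step needs). The function names and the string keys for each type are made up for this example and are not Koios identifiers.

```python
def normalize(v: float, kind: str, lo: float = 0.0, hi: float = 1.0,
              mean: float = 0.0, std: float = 1.0) -> float:
    """Scale a real-world value into model space."""
    if kind == "none":
        return v
    if kind == "min-max":
        return (v - lo) / (hi - lo)          # -> [0, 1]
    if kind == "symmetric":
        return 2 * (v - lo) / (hi - lo) - 1  # -> [-1, 1]
    if kind == "z-score":
        return (v - mean) / std              # unbounded
    raise ValueError(f"unknown normalization type: {kind}")


def denormalize(n: float, kind: str, lo: float = 0.0, hi: float = 1.0,
                mean: float = 0.0, std: float = 1.0) -> float:
    """Inverse of normalize(): scale a model-space value back to real units."""
    if kind == "none":
        return n
    if kind == "min-max":
        return n * (hi - lo) + lo
    if kind == "symmetric":
        return (n + 1) / 2 * (hi - lo) + lo
    if kind == "z-score":
        return n * std + mean
    raise ValueError(f"unknown normalization type: {kind}")


# A reading of 75 on a 50..100 range sits at the midpoint of both bounded scales.
print(normalize(75.0, "min-max", lo=50.0, hi=100.0))    # 0.5
print(normalize(75.0, "symmetric", lo=50.0, hi=100.0))  # 0.0
```

Running the same values through your training-time scaler and through these formulas is a quick way to spot a mismatch before deployment.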

4. Calibration

Each binding can apply a linear gain-and-bias transform on top of the raw value. This is useful for sensor drift, engineering unit conversion, or fine-tuning a model's response without retraining. Defaults are identity (gain 1.0, bias 0.0), so existing bindings see no change. See Calibration (Gain & Bias) for the full pipeline.
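As a rough illustration of a gain-and-bias transform, the sketch below assumes the forward form is `gain * raw + bias` and that the inverse applied on output is `(value - bias) / gain`. The exact formula and order of operations Koios uses are documented in Calibration (Gain & Bias), so treat the function names and formulas here as assumptions.

```python
import math


def calibrate(raw: float, gain: float = 1.0, bias: float = 0.0) -> float:
    """Assumed forward transform applied to an input value."""
    return gain * raw + bias


def uncalibrate(value: float, gain: float = 1.0, bias: float = 0.0) -> float:
    """Assumed inverse transform applied before an output write."""
    if gain == 0:
        raise ValueError("gain must be non-zero to invert the transform")
    return (value - bias) / gain


# Defaults are the identity, so an unconfigured binding passes values through.
assert calibrate(21.5) == 21.5

# Celsius to Fahrenheit as a unit-conversion example: gain 1.8, bias 32.
fahrenheit = calibrate(100.0, gain=1.8, bias=32.0)
assert math.isclose(fahrenheit, 212.0)
assert math.isclose(uncalibrate(fahrenheit, gain=1.8, bias=32.0), 100.0)
```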

5. Configuration

Setting	Description	Default
Output Application	Absolute writes the prediction directly; Relative adds the predicted delta to the current tag value	Absolute
Output Mode	Continuous maps each output neuron 1:1 to an output binding; Discrete selects from an action map via argmax	Continuous
Scan Rate	How often inference runs (seconds)	1s
Sample Rate	Interval between historical samples in the input tensor (seconds)	1s
On-Demand	Request fresh device reads before inference and write outputs afterward	Off
On-Demand Timeout	Max wait for fresh reads (seconds)	3s
Memory Only	Store history in process memory instead of the time-series database	Off
Scan Group	Assign to a group for synchronized execution	None
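The two output settings can be illustrated with a short sketch. The helper names and the shape of the action map (a dictionary from neuron index to the value written) are hypothetical and chosen only for this example; the real action map is configured in Koios.

```python
from typing import Mapping, Sequence


def select_action(outputs: Sequence[float], action_map: Mapping[int, float]) -> float:
    """Discrete mode: pick the action mapped to the highest-scoring output neuron."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])  # argmax
    return action_map[best]


def apply_output(prediction: float, current: float, application: str = "absolute") -> float:
    """Absolute writes the prediction itself; Relative adds it to the current tag value."""
    if application == "relative":
        return current + prediction
    if application == "absolute":
        return prediction
    raise ValueError(f"unknown output application: {application}")


# Three output neurons mapped to "decrease", "hold", "increase" setpoint deltas.
delta = select_action([0.1, 0.2, 0.7], {0: -1.0, 1: 0.0, 2: 1.0})
print(apply_output(delta, current=40.0, application="relative"))  # 41.0
```

In Continuous mode there is no argmax step: each output neuron's value goes to its own output binding.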

Memory Only Mode

When enabled, the model stores its input history in an in-memory buffer instead of querying the time-series database. This eliminates the database read/write round-trip on every cycle, enabling ultra-low-latency inference for fast control loops.

How it works: The predict engine maintains a rolling buffer per input tag, appending one sample per scan cycle from the live data cache. The model reads from this buffer instead of the database.

Trade-offs:

Aspect	Standard	Memory Only
Input history source	Time-series database	In-memory buffer
Execution metrics	Full (charts, missed scans)	Avg cycle duration only
Data on restart	Persisted	Lost, model warms up from zero
Latency	Database query per cycle	Near-zero (cache + memory)

Requirements:

Requires On-Demand to be enabled (the predict engine must actively pull fresh reads)
Cannot be in a Scan Group

Warmup: After a restart, the buffer is empty. The model shows a "Memory buffer warming up" message until it has collected enough samples to fill the input window (`input_depth * sample_rate` seconds of data). During warmup, inference does not run.
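A rolling buffer with a warmup check can be sketched with a fixed-length deque. This is a simplified illustration, not the predict engine's implementation: the `RollingBuffer` class is hypothetical, and it assumes one sample is kept per sample interval so that the buffer is ready once it holds `input_depth` samples.

```python
from collections import deque
from typing import List


class RollingBuffer:
    """Hypothetical per-tag history buffer that keeps the newest input_depth samples."""

    def __init__(self, input_depth: int) -> None:
        self._samples: deque = deque(maxlen=input_depth)

    def append(self, value: float) -> None:
        # When the deque is full, appending drops the oldest sample automatically.
        self._samples.append(value)

    @property
    def warming_up(self) -> bool:
        return len(self._samples) < self._samples.maxlen

    def window(self) -> List[float]:
        if self.warming_up:
            raise RuntimeError("Memory buffer warming up")
        return list(self._samples)  # oldest first


buf = RollingBuffer(input_depth=3)
for reading in (20.1, 20.4, 20.9, 21.3):
    buf.append(reading)
print(buf.window())  # [20.4, 20.9, 21.3]
```

Because the buffer lives in process memory, a restart recreates it empty, which is why the model has to warm up again.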

When to use memory only

Use this for fast control loops (10ms–500ms scan rates) where database latency is the bottleneck. For most models with scan rates above 1 second, standard mode is fine.

Consider your network latency first

Memory only eliminates the database round-trip, but the on-demand device read still goes over the network. If your device reads take more than 50–100ms (common with remote OPC-UA servers or Modbus devices over WAN), the network round-trip will dominate your cycle time regardless. In that case, standard mode with on-demand is likely sufficient, because the database overhead is negligible compared to the device read. Memory only shines when device reads are fast (local PLCs, sub-10ms response) and the database query is the actual bottleneck.

Scan Groups

A scan group runs multiple models together on a shared schedule. When on-demand is enabled, all member models' reads and writes are combined into a single network request per device, reducing I/O on slow networks.

See Scan Groups for details.
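The idea of combining reads can be shown with a small grouping function. This is a sketch under stated assumptions: the `batch_reads` name and the `(device, tag)` tuples are invented for the example, and the real request format depends on the device protocol.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple


def batch_reads(
    model_inputs: Mapping[str, Sequence[Tuple[str, str]]]
) -> Dict[str, List[str]]:
    """Merge every member model's (device, tag) reads into one tag list per device."""
    per_device = defaultdict(set)
    for tags in model_inputs.values():
        for device, tag in tags:
            per_device[device].add(tag)  # a tag shared by two models is read once
    return {device: sorted(tags) for device, tags in per_device.items()}


group = {
    "pressure_model": [("plc1", "temp"), ("plc1", "flow")],
    "level_model": [("plc1", "temp"), ("plc2", "level")],
}
print(batch_reads(group))  # {'plc1': ['flow', 'temp'], 'plc2': ['level']}
```

Two models that would have issued four reads separately end up with one request per device, and the shared `temp` tag is read only once.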

Input Depth and Historical Data

Models typically need a window of historical data, defined by:

Input depth: number of historical samples (read from the model file)
Sample rate: time interval between each sample

For example, `input_depth=10` at `sample_rate=0.5s` needs the last 5 seconds of data (10 × 0.5 s).
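The window arithmetic is simple enough to check in code. The helper names below are hypothetical; they only restate the rule that the required history is the input depth multiplied by the sample rate.

```python
def required_history_seconds(input_depth: int, sample_rate_s: float) -> float:
    """Length of history, in seconds, that one inference needs."""
    return input_depth * sample_rate_s


def has_enough_history(available_s: float, input_depth: int, sample_rate_s: float) -> bool:
    """True when the stored history covers the whole input window."""
    return available_s >= required_history_seconds(input_depth, sample_rate_s)


print(required_history_seconds(10, 0.5))  # 5.0
print(has_enough_history(4.0, 10, 0.5))   # False
```

A model whose input tags have only 4 seconds of stored data would therefore not be able to run with these settings until more history accumulates or is backfilled.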

Initialize history for new models

When you first enable a model, use the Initialize History action on the overview page to backfill the required data so the model can start immediately.

Model Status

Status	Meaning
Running	Actively making predictions at its scan rate
Stopped	Disabled or not started
Failed	Error during inference; check error code and message

Each binding also has its own status. A binding can fail if the bound tag is disabled, there isn't enough historical data, or the value is outside the normalization range.

Model Lifecycle

Create: name, output application, output mode, scan rate
Upload a model file: ONNX or TFLite on the Files tab
Assign bindings: map inputs and outputs to tags on the Bindings tab
Initialize history: backfill data if input tags are new
Enable: activate real-time inference

Input tags must be running

Make sure input tags and their parent devices are enabled and running before enabling the model.

What You See on a Model

Model List

Table with status, name, input/output counts, and timestamps. Supports filtering, search, and bulk actions (enable, disable, delete, export).

Model Detail

Tab	Content
Overview	Live status, last prediction, scan progress, active file info, tensor chart, recent events
Files	Upload, manage, and switch model file versions
Bindings	Configure input/output bindings, normalization, failure detection
Configuration	Name, description, output settings, scan rate, sample rate, advanced settings
Execution	Cycle timing charts and performance metrics
Logs	Real-time predict engine log viewer
Parameters	Read-only table of all model fields
Cross References	Tags, components, and other entities referencing this model

What's Next

Training a Model: what to prepare before deploying a model
Model Inference Requirements: tensor shapes and data preparation
Creating a Model: step-by-step guide
Managing Model Files: uploading and versioning
Assigning Bindings: mapping inputs and outputs to tags
Configuring a Model: all configuration settings explained
On-Demand Inference: synchronize inference with fresh device data
Scan Groups: grouped execution with shared on-demand
Monitoring a Model: live values, diagnostics, and execution performance
