---
title: "AI Models"
description: "Overview of AI models and real-time inference in Koios"
source_url: https://ai-ops.com/docs/models/introduction
---

# AI Models

An **AI model** in Koios runs machine learning inference on live data from your devices. Models read input values from tags, feed them through a pre-trained neural network (ONNX or TensorFlow Lite), and write predictions back to output tags — all in real time, at a configurable scan rate.

> [!NOTE] Koios runs models — it does not train them
> Koios is an inference engine. You train models externally using your own tools and data, then upload the exported file to Koios for deployment. See [Training a Model](https://ai-ops.com/docs/models/training-a-model.md) for details on this workflow.

## How Models Work

On every scan, a model follows this cycle:

1. **Collect** — read current and recent historical values from all input tags
2. **Calibrate** — apply per-binding gain and bias for sensor drift or unit corrections (no-op at defaults)
3. **Normalize** — scale each input to the range the model was trained on
4. **Infer** — run the model file to produce predictions
5. **Denormalize** — scale predictions back to real-world units
6. **Write** — apply inverse calibration and push predictions to output tags

This repeats at the model's **scan rate** (e.g. every 1s, 5s, or 30s). Models can also run in [on-demand mode](https://ai-ops.com/docs/models/on-demand-inference.md) for synchronized device reads/writes, or be grouped in a [scan group](https://ai-ops.com/docs/models/scan-groups.md) for batched execution.
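
The cycle above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Koios implementation: the `Binding` class, its field names, and `run_cycle` are hypothetical, min-max scaling is assumed for every binding, the inverse calibration is assumed to be `(v - bias) / gain`, and the Collect step is reduced to a list of current values.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Binding:
    """Hypothetical binding: calibration plus min-max bounds."""
    gain: float = 1.0   # identity calibration by default
    bias: float = 0.0
    lo: float = 0.0     # lower bound used for min-max scaling
    hi: float = 1.0     # upper bound used for min-max scaling


def run_cycle(
    raw_inputs: Sequence[float],
    in_bindings: Sequence[Binding],
    out_bindings: Sequence[Binding],
    infer: Callable[[list[float]], list[float]],
) -> list[float]:
    """One scan: calibrate, normalize, infer, denormalize, inverse-calibrate."""
    # Calibrate: linear gain and bias per input binding.
    calibrated = [b.gain * v + b.bias for v, b in zip(raw_inputs, in_bindings)]
    # Normalize: min-max scale each input to [0, 1].
    normalized = [(v - b.lo) / (b.hi - b.lo) for v, b in zip(calibrated, in_bindings)]
    # Infer: stand-in for the ONNX / TFLite model call.
    predictions = infer(normalized)
    # Denormalize: map predictions back to real-world units.
    denormalized = [p * (b.hi - b.lo) + b.lo for p, b in zip(predictions, out_bindings)]
    # Write: undo the output binding's calibration (assumed inverse form).
    return [(v - b.bias) / b.gain for v, b in zip(denormalized, out_bindings)]


# Usage with a toy "model" that doubles its single input.
result = run_cycle(
    [25.0],
    [Binding(lo=0.0, hi=100.0)],
    [Binding(lo=0.0, hi=50.0)],
    infer=lambda xs: [xs[0] * 2],
)
print(result)  # [25.0]
```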

## Key Components

### 1. Model File

The trained model file (ONNX or TFLite) contains the neural network weights and structure. You can upload multiple versions and switch between them without reconfiguring bindings.

| Format | Extension | Description |
|--------|-----------|-------------|
| **ONNX** | `.onnx` | Open Neural Network Exchange — exported from PyTorch, scikit-learn, etc. |
| **TFLite** | `.tflite` | TensorFlow Lite — optimized for edge deployment |

### 2. Bindings

Bindings connect the model's inputs and outputs to tags:

- **Input bindings** read values from tags and pass them to the model
- **Output bindings** receive predictions from the model and optionally write them to tags

Every input binding must be assigned to a tag. Output bindings can be left unassigned.

### 3. Normalization

Each binding's normalization controls how values are scaled. It is defined by two settings:

**Normalization type** — the mathematical formula:

| Type | Formula | Output Range |
|------|---------|-------------|
| **None** | Passthrough | Raw values |
| **Min-Max** | `(v - min) / (max - min)` | [0, 1] |
| **Symmetric** | `2*(v - min)/(max - min) - 1` | [-1, 1] |
| **Z-Score** | `(v - mean) / std` | Unbounded |

**Normalization source** — where the parameters come from:

| Source | Description |
|--------|-------------|
| **Tag Range** | Uses the tag's configured `range_min` / `range_max` |
| **Custom** | Uses custom values set directly on the binding |

Z-Score always forces the Custom source, because tags have no meaningful mean or standard deviation values to draw from.

> [!TIP] Match your training data
> The normalization in Koios should match whatever normalization was used during training. If you trained with min-max scaling using the tag's range, use Tag Range. If you used custom bounds or z-score, use Custom and enter the same values.
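
The formulas in the normalization type table, together with their inverses (as used when denormalizing predictions), can be written out as follows. This is a minimal sketch for checking values against your training pipeline. The function names, the `kind` strings, and the `lo` / `hi` / `mean` / `std` parameter names are assumptions made for this example, not Koios identifiers.

```python
def normalize(value, kind, *, lo=None, hi=None, mean=None, std=None):
    """Scale a real-world value using one of the four normalization types."""
    if kind == "none":
        return value
    if kind in ("min_max", "symmetric"):
        if hi == lo:
            raise ValueError("max and min must differ")
        unit = (value - lo) / (hi - lo)                   # maps [lo, hi] to [0, 1]
        return unit if kind == "min_max" else 2 * unit - 1  # symmetric: [-1, 1]
    if kind == "z_score":
        if std == 0:
            raise ValueError("std must be non-zero")
        return (value - mean) / std
    raise ValueError(f"unknown normalization type: {kind}")


def denormalize(scaled, kind, *, lo=None, hi=None, mean=None, std=None):
    """Inverse of normalize(): map a model-scale value back to real-world units."""
    if kind == "none":
        return scaled
    if kind == "min_max":
        return scaled * (hi - lo) + lo
    if kind == "symmetric":
        return (scaled + 1) / 2 * (hi - lo) + lo
    if kind == "z_score":
        return scaled * std + mean
    raise ValueError(f"unknown normalization type: {kind}")


print(normalize(75.0, "min_max", lo=50.0, hi=150.0))    # 0.25
print(normalize(75.0, "symmetric", lo=50.0, hi=150.0))  # -0.5
print(normalize(75.0, "z_score", mean=50.0, std=10.0))  # 2.5
```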

### 4. Calibration

Each binding can apply a linear gain-and-bias transform on top of the raw value — useful for sensor drift, engineering unit conversion, or fine-tuning a model's response without retraining. Defaults are identity (gain `1.0`, bias `0.0`), so existing bindings see no change. See [Calibration (Gain & Bias)](https://ai-ops.com/docs/models/assigning-bindings.md#calibration-gain--bias) for the full pipeline.

### 5. Configuration

| Setting | Description | Default |
|---------|-------------|---------|
| **Output Application** | **Absolute** writes the prediction directly; **Relative** adds the predicted delta to the current tag value | Absolute |
| **Output Mode** | **Continuous** maps each output neuron 1:1 to an output binding; **Discrete** selects from an action map via argmax | Continuous |
| **Scan Rate** | How often inference runs (seconds) | 1s |
| **Sample Rate** | Interval between historical samples in the input tensor (seconds) | 1s |
| **On-Demand** | Requests fresh device reads before inference and writes predictions to devices afterward | Off |
| **On-Demand Timeout** | Max wait for fresh reads (seconds) | 3s |
| **Memory Only** | Store history in process memory instead of the time-series database | Off |
| **Scan Group** | Assign to a group for synchronized execution | None |
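
To make the two output settings concrete, the sketch below shows one plausible reading of them. The function names are hypothetical, and the action map is assumed to be a simple list with one candidate value per output neuron. The real action map structure may differ.

```python
from typing import Sequence


def apply_output(prediction: float, current_value: float, application: str = "absolute") -> float:
    """Output Application: decide what value is written to the output tag."""
    if application == "absolute":
        return prediction                  # write the prediction directly
    if application == "relative":
        return current_value + prediction  # treat the prediction as a delta
    raise ValueError(f"unknown output application: {application}")


def select_action(outputs: Sequence[float], action_map: Sequence[float]) -> float:
    """Discrete Output Mode: pick the action at the index of the largest output."""
    if len(outputs) != len(action_map):
        raise ValueError("one action is expected per output neuron")
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])  # argmax
    return action_map[best_index]


print(apply_output(2.0, 10.0, "absolute"))                # 2.0
print(apply_output(2.0, 10.0, "relative"))                # 12.0
print(select_action([0.1, 0.7, 0.2], [0.0, 50.0, 100.0]))  # 50.0
```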

---

## Memory Only Mode

When enabled, the model stores its input history in an **in-memory buffer** instead of querying the time-series database. This eliminates the database read/write round-trip on every cycle, enabling ultra-low-latency inference for fast control loops.

**How it works:** The predict engine maintains a rolling buffer per input tag, appending one sample per scan cycle from the live data cache. The model reads from this buffer instead of the database.

**Trade-offs:**

| Aspect | Standard | Memory Only |
|--------|----------|-------------|
| Input history source | Time-series database | In-memory buffer |
| Execution metrics | Full (charts, missed scans) | Avg cycle duration only |
| Data on restart | Persisted | Lost — model warms up from zero |
| Latency | Database query per cycle | Near-zero (cache + memory) |

**Requirements:**
- **On-Demand** must be enabled (the predict engine must actively pull fresh reads)
- Cannot be in a **Scan Group**

**Warmup:** After a restart, the buffer is empty. The model shows a "Memory buffer warming up" message until the buffer holds a full input window, which spans `input_depth * sample_rate` seconds. During warmup, inference does not run.
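
The rolling buffer and its warmup behavior can be pictured with a fixed-length `deque`. This is a simplified sketch, not the predict engine's code: the `InputBuffer` class is hypothetical, and it assumes exactly one retained sample per sample interval, so the buffer is ready once it holds `input_depth` values.

```python
from collections import deque


class InputBuffer:
    """Hypothetical per-tag rolling buffer holding the most recent samples."""

    def __init__(self, input_depth: int):
        self.input_depth = input_depth
        # maxlen makes the deque drop the oldest sample automatically.
        self._samples = deque(maxlen=input_depth)

    def append(self, value: float) -> None:
        """Store the newest sample from the live data cache."""
        self._samples.append(value)

    @property
    def ready(self) -> bool:
        """True once the buffer holds a full input window."""
        return len(self._samples) == self.input_depth

    def window(self) -> list:
        """Return samples oldest-first, or refuse while still warming up."""
        if not self.ready:
            raise RuntimeError("Memory buffer warming up")
        return list(self._samples)


buf = InputBuffer(input_depth=3)
for sample in (1.0, 2.0):
    buf.append(sample)
print(buf.ready)     # False
buf.append(3.0)
buf.append(4.0)      # oldest sample (1.0) is discarded
print(buf.window())  # [2.0, 3.0, 4.0]
```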

> [!NOTE] When to use memory only
> Use this for fast control loops (10ms–500ms scan rates) where database latency is the bottleneck. For most models with scan rates of 1 second or longer, standard mode is fine.

> [!WARNING] Consider your network latency first
> Memory only eliminates the database round-trip, but the on-demand device read still goes over the network. If your device reads take more than 50–100ms (common with remote OPC-UA servers or Modbus devices over WAN), the network round-trip will dominate your cycle time regardless. In that case, standard mode with on-demand is likely sufficient — the database overhead is negligible compared to the device read. Memory only shines when device reads are fast (local PLCs, sub-10ms response) and the database query is the actual bottleneck.

---

## Scan Groups

A **scan group** runs multiple models together on a shared schedule. When on-demand is enabled, all member models' reads and writes are combined into a single network request per device — reducing I/O on slow networks.

See [Scan Groups](https://ai-ops.com/docs/models/scan-groups.md) for details.
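
The idea of combining reads into a single request per device can be illustrated as below. The data layout (model name mapped to `(device, tag)` pairs) and the function name are assumptions made for this sketch. It shows only the grouping step, not how Koios issues the request.

```python
from collections import defaultdict


def batch_reads_by_device(model_inputs: dict[str, list[tuple[str, str]]]) -> dict[str, list[str]]:
    """Merge every member model's input tags into one read list per device."""
    per_device = defaultdict(set)
    for bindings in model_inputs.values():
        for device, tag in bindings:
            per_device[device].add(tag)  # a set removes tags shared by models
    return {device: sorted(tags) for device, tags in per_device.items()}


group = {
    "model_a": [("plc1", "temp"), ("plc1", "flow")],
    "model_b": [("plc1", "temp"), ("plc2", "level")],
}
print(batch_reads_by_device(group))
# {'plc1': ['flow', 'temp'], 'plc2': ['level']}
```

In this example, four bindings across two models collapse into two device requests, and the shared `temp` tag is read only once.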

---

## Input Depth and Historical Data

Models typically need a **window of historical data**, defined by:

- **Input depth** — number of historical samples (read from the model file)
- **Sample rate** — time interval between each sample

For example, `input_depth=10` at `sample_rate=0.5s` needs the last 5 seconds of data.
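
The arithmetic behind that example is shown below as a small sketch. The helper names are hypothetical, and the assumption that the newest sample sits at the current scan time (offset `0.0`) is for illustration only.

```python
def history_window_seconds(input_depth: int, sample_rate: float) -> float:
    """Length of history the model needs, in seconds."""
    return input_depth * sample_rate


def sample_offsets(input_depth: int, sample_rate: float) -> list:
    """Offsets in seconds relative to now, oldest sample first (assumed layout)."""
    return [(i - (input_depth - 1)) * sample_rate for i in range(input_depth)]


print(history_window_seconds(10, 0.5))  # 5.0
print(sample_offsets(3, 0.5))           # [-1.0, -0.5, 0.0]
```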

> [!NOTE] Initialize history for new models
> When you first enable a model, use the **Initialize History** action on the overview page to backfill the required data so the model can start immediately.

---

## Model Status

| Status | Meaning |
|--------|---------|
| **Running** | Actively making predictions at its scan rate |
| **Stopped** | Disabled or not started |
| **Failed** | Error during inference — check error code and message |

Each **binding** also has its own status. A binding can fail if the bound tag is disabled, there isn't enough historical data, or the value is outside the normalization range.
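
The three binding failure conditions can be expressed as a simple check. This is a hedged sketch: the function, its parameters, and the returned messages are hypothetical and do not reflect the actual Koios error codes.

```python
from typing import Optional, Sequence


def binding_failure(
    tag_enabled: bool,
    history: Sequence[float],
    input_depth: int,
    lo: float,
    hi: float,
) -> Optional[str]:
    """Return a failure reason for an input binding, or None if it is healthy."""
    if not tag_enabled:
        return "tag disabled"
    if len(history) < input_depth:
        return "insufficient history"
    if any(v < lo or v > hi for v in history):
        return "value outside normalization range"
    return None


print(binding_failure(True, [10.0, 20.0, 30.0], 3, 0.0, 100.0))  # None
print(binding_failure(True, [10.0, 20.0], 3, 0.0, 100.0))        # insufficient history
```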

---

## Model Lifecycle

1. **Create** — name, output application, output mode, scan rate
2. **Upload a model file** — ONNX or TFLite on the Files tab
3. **Assign bindings** — map inputs and outputs to tags on the Bindings tab
4. **Initialize history** — backfill data if input tags are new
5. **Enable** — activate real-time inference

> [!TIP] Input tags must be running
> Make sure input tags and their parent devices are enabled and running before enabling the model.

---

## What You See on a Model

### Model List

Table with status, name, input/output counts, and timestamps. Supports filtering, search, and bulk actions (enable, disable, delete, export).

### Model Detail

| Tab | Content |
|-----|---------|
| **Overview** | Live status, last prediction, scan progress, active file info, tensor chart, recent events |
| **Files** | Upload, manage, and switch model file versions |
| **Bindings** | Configure input/output bindings, normalization, failure detection |
| **Configuration** | Name, description, output settings, scan rate, sample rate, advanced settings |
| **Execution** | Cycle timing charts and performance metrics |
| **Parameters** | Read-only table of all model fields |
| **Logs** | Real-time predict engine log viewer |
| **Cross References** | Tags, components, and other entities referencing this model |

---

## What's Next

- [Training a Model](https://ai-ops.com/docs/models/training-a-model.md) — what to prepare before deploying a model
- [Model Inference Requirements](https://ai-ops.com/docs/models/inference-requirements.md) — tensor shapes and data preparation
- [Creating a Model](https://ai-ops.com/docs/models/creating-a-model.md) — step-by-step guide
- [Managing Model Files](https://ai-ops.com/docs/models/model-files.md) — uploading and versioning
- [Assigning Bindings](https://ai-ops.com/docs/models/assigning-bindings.md) — mapping inputs and outputs to tags
- [Configuring a Model](https://ai-ops.com/docs/models/configuring-a-model.md) — all configuration settings explained
- [On-Demand Inference](https://ai-ops.com/docs/models/on-demand-inference.md) — synchronize inference with fresh device data
- [Scan Groups](https://ai-ops.com/docs/models/scan-groups.md) — grouped execution with shared on-demand
- [Monitoring a Model](https://ai-ops.com/docs/models/enabling-a-model.md) — live values, diagnostics, and execution performance
