---
title: "On-Demand Inference"
description: "Synchronize model inference with fresh device data using on-demand reads and writes"
source_url: https://ai-ops.com/docs/models/on-demand-inference
---

# On-Demand Inference

By default, devices and models run on **independent scan cycles**. The model reads whatever value is currently in the cache, which can be anywhere from 0 to the device's `scan_rate` seconds old.

**On-demand inference** synchronizes the model's cycle with its devices. Before running inference, Koios requests fresh reads from all input devices and waits for the data. After inference, it triggers immediate writes to all output devices.

---

## The Timing Problem

When device and model scan rates are similar (e.g. both 30s), the model may consistently predict on stale data depending on when each service started:

```text
Device reads:   ┃           ┃           ┃
                t=0         t=30        t=60

Model infers:      ┃           ┃
                   t=12        t=42
                   ▲           ▲
                12s old     12s old
```

The offset is unpredictable and can be anywhere from 0 to a full scan cycle.
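
The arithmetic behind the diagram can be reproduced in a few lines. This is a minimal sketch, assuming the device reads on a fixed period starting at `device_offset` and that each read completes instantly. `cache_age` and its parameters are illustrative names, not part of Koios.

```python
def cache_age(model_time: float, device_scan_rate: float, device_offset: float = 0.0) -> float:
    """Age in seconds of the newest cached value when the model runs at model_time.

    Assumption: the device reads at device_offset, device_offset + scan_rate, ...
    and a read completes instantly.
    """
    return (model_time - device_offset) % device_scan_rate


# Both services on a 30s cycle; the model happened to start 12s after the device.
for t in (12, 42, 72):
    print(f"t={t}s: data is {cache_age(t, 30):g}s old")
```

Because both cycles share the same period, the 12-second offset repeats on every inference and never corrects itself.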

With on-demand enabled, the model **triggers** a fresh read before each inference and writes outputs immediately after:

```text
Model wakes → Request read → Device polled → Infer on fresh data → Write outputs
```

---

## The On-Demand Cycle

1. **Model wakes up** at its configured scan rate
2. **Request fresh reads** from all devices with bound input tags
3. **Wait** until all input values are newer than the start of this cycle, or until the timeout expires
4. **Inference** — read fresh values, run model, produce predictions
5. **Request writes** — push prediction values to output devices immediately

If the timeout expires before fresh data arrives, the model **fails the scan** rather than predicting on stale data.
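
The five steps above can be sketched as a single function. This is an illustration of the control flow only, not the Koios implementation: `FakeDevice`, `StaleDataError`, and `on_demand_scan` are hypothetical names, and the device is simulated in memory.

```python
import time
from typing import Callable, Dict, List, Optional


class StaleDataError(RuntimeError):
    """Raised when fresh data does not arrive before the timeout (hypothetical)."""


class FakeDevice:
    """In-memory stand-in for a device; not the Koios device API."""

    def __init__(self, tags: Dict[str, float], read_delay: float = 0.0) -> None:
        self.tags = tags
        self.read_delay = read_delay           # simulated network/poll latency
        self.last_read = float("-inf")         # monotonic time of last completed read
        self.written: Dict[str, float] = {}
        self._pending_until: Optional[float] = None

    def request_read(self) -> None:
        self._pending_until = time.monotonic() + self.read_delay

    def poll(self) -> None:
        # Complete the pending read once the simulated delay has elapsed.
        if self._pending_until is not None and time.monotonic() >= self._pending_until:
            self.last_read = time.monotonic()
            self._pending_until = None

    def write(self, values: Dict[str, float]) -> None:
        self.written.update(values)


def on_demand_scan(
    inputs: List[FakeDevice],
    outputs: List[FakeDevice],
    model: Callable[[Dict[str, float]], Dict[str, float]],
    timeout: float = 3.0,
) -> Dict[str, float]:
    cycle_start = time.monotonic()             # step 1: model wakes up
    for device in inputs:                      # step 2: request fresh reads
        device.request_read()

    deadline = cycle_start + timeout           # step 3: wait for fresh data
    while True:
        for device in inputs:
            device.poll()
        # >= tolerates coarse clocks where a read lands on the same tick.
        if all(d.last_read >= cycle_start for d in inputs):
            break
        if time.monotonic() >= deadline:
            raise StaleDataError("fresh data did not arrive before the timeout")
        time.sleep(0.01)

    values: Dict[str, float] = {}              # step 4: inference on fresh values
    for device in inputs:
        values.update(device.tags)
    predictions = model(values)

    for device in outputs:                     # step 5: write outputs immediately
        device.write(predictions)
    return predictions
```

Raising on timeout, instead of falling back to the cached value, mirrors the rule that a scan fails rather than predicting on stale data.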

---

## When to Use On-Demand

| Use On-Demand | Skip On-Demand |
|---------------|----------------|
| Device scan rate similar to model scan rate | Device scans 10x+ faster than model (cache always fresh) |
| Model writes control outputs to PLCs | Model only produces dashboard/alert predictions |
| Freshness directly affects prediction quality | Data changes slowly relative to scan rates |
| Multiple models share a device (see [Scan Groups](https://ai-ops.com/docs/models/scan-groups.md)) | — |

**Rule of thumb:** If your device scans at least 10x faster than your model, you don't need on-demand. If the rates are similar, or if the model writes control outputs, on-demand is strongly recommended.
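
The rule of thumb can be written as a small check. This is a sketch under one simplifying assumption: the guidance only covers the two ends (10x faster, or similar rates), so the function treats everything short of 10x as "recommended". `on_demand_recommended` and its parameters are illustrative names, with both scan rates given as intervals in seconds.

```python
def on_demand_recommended(
    device_scan_s: float,
    model_scan_s: float,
    writes_control_outputs: bool,
) -> bool:
    """Apply the rule of thumb; scan rates are intervals in seconds."""
    if writes_control_outputs:
        return True  # control outputs: recommended regardless of scan rates
    device_is_10x_faster = device_scan_s * 10 <= model_scan_s
    # Assumption: anything short of 10x faster is treated as "similar".
    return not device_is_10x_faster


print(on_demand_recommended(30, 30, writes_control_outputs=False))  # similar rates
print(on_demand_recommended(1, 60, writes_control_outputs=False))   # device 60x faster
print(on_demand_recommended(1, 60, writes_control_outputs=True))    # writes to a PLC
```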

> [!TIP] On-demand lets you slow down device polling
> With on-demand, the device only needs to be polled when a model needs data. You can lengthen the device's scan interval (so it polls less often) to reduce load, because the model triggers reads on its own schedule.

---

## Configuration

On-demand involves settings on **both the model and the device**.

### Model Settings

Found on the model's **Configuration** tab under **Advanced Configuration**.

| Setting | Description | Default | Range |
|---------|-------------|---------|-------|
| **On-Demand** | Enable on-demand inference | Off | — |
| **On-Demand Timeout** | Max wait for a fresh device read before failing | 3s | 0.5–30s |

Start with 3 seconds and increase if you see timeout errors. Devices on slow networks may need 10–15 seconds.
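
If you track these settings outside the UI, for example in a deployment checklist or a test fixture, the defaults and range from the table can be captured as follows. `OnDemandModelSettings` is a hypothetical container for illustration; it is not a Koios class, and the settings themselves are configured on the model's **Configuration** tab.

```python
from dataclasses import dataclass

TIMEOUT_MIN_S = 0.5   # lower bound from the settings table
TIMEOUT_MAX_S = 30.0  # upper bound from the settings table


@dataclass(frozen=True)
class OnDemandModelSettings:
    """Hypothetical mirror of the two model settings above."""

    on_demand: bool = False   # default: Off
    timeout_s: float = 3.0    # default: 3s

    def __post_init__(self) -> None:
        if not TIMEOUT_MIN_S <= self.timeout_s <= TIMEOUT_MAX_S:
            raise ValueError(
                f"timeout_s must be within {TIMEOUT_MIN_S}-{TIMEOUT_MAX_S}s, "
                f"got {self.timeout_s}"
            )


default = OnDemandModelSettings(on_demand=True)                       # 3s timeout
slow_network = OnDemandModelSettings(on_demand=True, timeout_s=12.0)  # within 10-15s
```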

### Device Settings

Found on the device's **Configuration** tab under **Advanced Configuration**.

| Setting | Description | Default |
|---------|-------------|---------|
| **On-Demand Freshness** | Max age of cached data before a fresh read is required | 0s |
| **On-Demand Batch Window** | Time to wait before executing a requested read, so that concurrent requests are batched together | 0s |

For detailed explanations of each setting, see [On-Demand Scanning](https://ai-ops.com/docs/devices/on-demand-scanning.md#settings-reference).
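
One way to read the two device settings is sketched below. This is an interpretation of the one-line descriptions in the table, not the device's actual logic: it assumes cached data is reused only while its age is within the freshness value, and that requests arriving within the batch window of the first pending request share one read. Both function names are illustrative.

```python
from typing import Iterable, Optional


def needs_fresh_read(cache_age_s: float, freshness_s: float = 0.0) -> bool:
    """True when cached data is older than the allowed freshness."""
    return cache_age_s > freshness_s


def count_device_reads(request_times_s: Iterable[float], batch_window_s: float = 0.0) -> int:
    """Count reads executed when requests inside one window are batched."""
    reads = 0
    window_end: Optional[float] = None
    for t in sorted(request_times_s):
        if window_end is None or t > window_end:
            reads += 1                        # this request opens a new window
            window_end = t + batch_window_s
        # otherwise the request joins the read that is already pending
    return reads


requests = [0.0, 0.05, 0.08]  # three models asking within 80 ms
print(count_device_reads(requests, batch_window_s=0.0))
print(count_device_reads(requests, batch_window_s=0.1))
```

Under these assumptions, the 0s defaults mean every on-demand request triggers its own read, and a 100 ms batch window collapses the three requests into one at the cost of up to 100 ms of added latency.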

---

## Scan Groups and On-Demand

When multiple models share devices, individual on-demand requests can multiply network I/O. A **scan group** solves this by running models together and combining all reads into a single request per device.
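
To see how quickly requests multiply, the sketch below counts read requests per cycle with and without a scan group. It assumes every model issues one request per device it reads from and ignores any device-side batch window. `read_requests` and the model and device names are made up for the example.

```python
from typing import Dict, Set, Tuple


def read_requests(model_devices: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (individual, grouped) read-request counts for one cycle."""
    # Individually, each model requests a read from each of its devices.
    individual = sum(len(devices) for devices in model_devices.values())
    # In a scan group, reads are combined into one request per device.
    grouped = len(set().union(*model_devices.values()))
    return individual, grouped


models = {
    "model_a": {"plc1", "plc2"},
    "model_b": {"plc1"},
    "model_c": {"plc1", "plc2"},
}
individual, grouped = read_requests(models)
print(f"individual: {individual} requests, scan group: {grouped} requests")
```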

See [Scan Groups](https://ai-ops.com/docs/models/scan-groups.md) for details.

---

## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| Model fails with timeout | Device slow or offline | Increase timeout; check device status |
| On-demand reads seem slow | High batch window on device | Reduce batch window if only one model uses the device |
| Data still seems stale | On-demand not enabled, or freshness too high | Verify on-demand is on; reduce device freshness setting |