On-Demand Inference

By default, devices and models run on independent scan cycles. The model reads whatever value is currently in the cache, which may be anywhere from 0 seconds to one full device scan interval (scan_rate seconds) old.

On-demand inference synchronizes the model's cycle with its devices. Before running inference, Koios requests fresh reads from all input devices and waits for the data. After inference, it triggers immediate writes to all output devices.

The Timing Problem

When device and model scan rates are similar (e.g. both 30s), the model may consistently predict on stale data depending on when each service started:

Device reads:   ┃           ┃           ┃
                t=0         t=30        t=60

Model infers:      ┃           ┃
                   t=12        t=42
                   ▲           ▲
                12s old     12s old

The offset is unpredictable and can be anywhere from 0 to a full scan cycle.
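
As a rough illustration of the diagram above, the following sketch computes how old the cached value is at each inference. It assumes device reads land exactly at multiples of the device period and that the model starts at a fixed offset; the function name and parameters are illustrative, not part of Koios.

```python
def data_age_at_inference(device_period, model_period, model_offset, cycles=3):
    """Return (inference_time, age_of_cached_value) pairs, in seconds.

    Assumption: the device reads at t = 0, device_period, 2 * device_period, ...
    and the model infers at t = model_offset + k * model_period.
    """
    ages = []
    for k in range(cycles):
        t = model_offset + k * model_period
        # Most recent device read at or before the inference time.
        last_read = (t // device_period) * device_period
        ages.append((t, t - last_read))
    return ages


print(data_age_at_inference(device_period=30, model_period=30, model_offset=12))
# [(12, 12), (42, 12), (72, 12)]
```

Because both periods are equal, the 12-second lag never drifts: every inference sees data of the same age.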

With on-demand enabled, the model triggers a fresh read before each inference and writes outputs immediately after:

Model wakes → Request read → Device polled → Infer on fresh data → Write outputs

The On-Demand Cycle

  1. Wake: the model wakes up at its configured scan rate.
  2. Request reads: Koios requests fresh reads from all devices with bound input tags.
  3. Wait: the model waits until all input values are newer than the start of this cycle, or until the timeout expires.
  4. Infer: the model reads the fresh values, runs, and produces predictions.
  5. Write: Koios pushes the prediction values to output devices immediately.

If the timeout expires before fresh data arrives, the model fails the scan rather than predicting on stale data.
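
The cycle above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not Koios's implementation: `FakeDevice`, `on_demand_cycle`, and the polling loop are hypothetical stand-ins that only mirror the five steps and the fail-on-timeout behavior.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a device; not a Koios class."""
    name: str
    read_delay: float = 0.0            # seconds until a requested read lands
    value: float = 0.0                 # cached value
    last_read: float = float("-inf")   # monotonic timestamp of the cached value
    _due: Optional[float] = None       # when the pending read completes
    written: list = field(default_factory=list)

    def request_read(self):
        self._due = time.monotonic() + self.read_delay

    def poll(self):
        # Complete the pending read once its delay has elapsed.
        if self._due is not None and time.monotonic() >= self._due:
            self.last_read = time.monotonic()
            self._due = None


def on_demand_cycle(inputs, outputs, infer, timeout=3.0, poll_interval=0.01):
    cycle_start = time.monotonic()             # step 1: model wakes
    for dev in inputs:                         # step 2: request fresh reads
        dev.request_read()
    deadline = cycle_start + timeout
    while True:                                # step 3: wait for fresh data
        for dev in inputs:
            dev.poll()
        # ">=" tolerates coarse clocks; the read was requested after cycle_start.
        if all(dev.last_read >= cycle_start for dev in inputs):
            break
        if time.monotonic() >= deadline:
            # Fail the scan instead of predicting on stale data.
            raise TimeoutError("no fresh data before timeout; scan failed")
        time.sleep(poll_interval)
    prediction = infer([dev.value for dev in inputs])   # step 4: infer
    for dev in outputs:                        # step 5: write immediately
        dev.written.append(prediction)
    return prediction
```

In this sketch a device with `read_delay` longer than `timeout` makes the cycle raise `TimeoutError`, and nothing is written to the outputs.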

When to Use On-Demand

| Use On-Demand | Skip On-Demand |
|---|---|
| Device scan rate similar to model scan rate | Device scans 10x+ faster than model (cache always fresh) |
| Model writes control outputs to PLCs | Model only produces dashboard/alert predictions |
| Freshness directly affects prediction quality | Data changes slowly relative to scan rates |
| Multiple models share a device (see Scan Groups) | |

Rule of thumb: If your device scans at least 10x faster than your model, you don't need on-demand. If the rates are similar, or if the model writes control outputs, on-demand is strongly recommended.
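
The rule of thumb can be written as a small check. This is only a sketch of the guidance on this page: the function name and arguments are hypothetical, and both scan rates are assumed to be expressed as intervals in seconds (a smaller number means faster scanning).

```python
def on_demand_recommended(device_scan_s, model_scan_s, writes_control_outputs=False):
    """Apply the rule of thumb; scan arguments are intervals in seconds."""
    if writes_control_outputs:
        # Control outputs call for fresh inputs regardless of scan rates.
        return True
    # "10x faster" means the device interval is at most a tenth of the model's.
    return device_scan_s * 10 > model_scan_s


print(on_demand_recommended(30, 30))        # True: similar rates
print(on_demand_recommended(1, 30))         # False: device is 30x faster
print(on_demand_recommended(1, 30, True))   # True: model writes control outputs
```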

Configuration

On-demand involves settings on both the model and the device.

Model Settings

These settings are on the model's Configuration tab, under Advanced Configuration.

| Setting | Description | Default | Range |
|---|---|---|---|
| On-Demand | Enable on-demand inference | Off | |
| On-Demand Timeout | Max wait for a fresh device read before failing | 3s | 0.5–30s |

Start with the default of 3 seconds and increase it if you see timeout errors. Devices on slow networks may need 10–15 seconds.
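
For scripts that prepare or review model settings outside the UI, the defaults and range above could be captured in a small validated structure. The class and field names below are hypothetical and do not correspond to a documented Koios API; only the default (3s) and range (0.5–30s) come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnDemandModelSettings:
    """Hypothetical container mirroring the model settings table."""
    enabled: bool = False      # On-Demand: off by default
    timeout_s: float = 3.0     # On-Demand Timeout: default 3s

    def __post_init__(self):
        # Documented range for the timeout is 0.5 to 30 seconds.
        if not 0.5 <= self.timeout_s <= 30.0:
            raise ValueError(
                f"timeout_s must be between 0.5 and 30, got {self.timeout_s}"
            )


# A device on a slow network, using a longer timeout.
slow_network = OnDemandModelSettings(enabled=True, timeout_s=15.0)
```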

Device Settings

These settings are on the device's Configuration tab, under Advanced Configuration.

| Setting | Description | Default |
|---|---|---|
| On-Demand Freshness | Max age of cached data before a fresh read is required | 0s |
| On-Demand Batch Window | Time to wait before executing, batching concurrent requests | 0s |

For detailed explanations of each setting, see On-Demand Scanning.
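
One plausible reading of the two device settings is sketched below. The exact semantics are described on the On-Demand Scanning page, so treat this as an assumption: `needs_fresh_read` and `batch_requests` are hypothetical helpers, and the batching model (the first request opens a window, and later requests inside the window share one read) is a guess at the behavior, not a description of Koios internals.

```python
def needs_fresh_read(cache_age_s, freshness_s=0.0):
    """Cached data older than the freshness setting triggers a fresh read."""
    return cache_age_s > freshness_s


def batch_requests(arrival_times, batch_window_s=0.0):
    """Group request arrival times into (execute_time, [arrivals]) batches.

    Assumption: the first request opens a window of batch_window_s seconds,
    and every request arriving before the window closes joins the same read.
    """
    batches = []
    for t in sorted(arrival_times):
        if batches and t <= batches[-1][0]:
            batches[-1][1].append(t)                  # joins the open window
        else:
            batches.append((t + batch_window_s, [t]))  # opens a new window
    return batches


# Three models request reads within 0.1s; a 0.2s window serves them with one read.
batches = batch_requests([0.0, 0.05, 0.1, 1.0], batch_window_s=0.2)
print([len(arrivals) for _, arrivals in batches])
```

Under this reading, a larger batch window reduces device reads when several models share a device, at the cost of delaying every request by up to the window length.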

Scan Groups and On-Demand

When multiple models share devices, individual on-demand requests can multiply network I/O. A scan group solves this by running models together and combining all reads into a single request per device.

See Scan Groups for details.
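
To make the I/O saving concrete, the sketch below merges the input tags of several models into one read per device. The model names, device names, tags, and the `combined_reads` helper are all made up for illustration; they are not Koios identifiers.

```python
from collections import defaultdict


def combined_reads(model_inputs):
    """Map {model: [(device, tag), ...]} to {device: sorted unique tags}."""
    per_device = defaultdict(set)
    for tags in model_inputs.values():
        for device, tag in tags:
            per_device[device].add(tag)
    return {device: sorted(tags) for device, tags in per_device.items()}


# Hypothetical models and tags.
model_inputs = {
    "chiller_model": [("plc1", "temp_in"), ("plc1", "flow")],
    "pump_model": [("plc1", "flow"), ("plc2", "pressure")],
}

# Without a scan group: one request per (model, device) pair.
individual = len({(m, d) for m, tags in model_inputs.items() for d, _ in tags})
grouped = combined_reads(model_inputs)
print(individual, len(grouped))  # 3 2
```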

Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| Model fails with timeout | Device slow or offline | Increase timeout; check device status |
| On-demand reads seem slow | High batch window on device | Reduce batch window if only one model uses the device |
| Data still seems stale | On-demand not enabled, or freshness too high | Verify on-demand is on; reduce device freshness setting |