On-Demand Inference

By default, devices and models run on independent scan cycles. The model reads whatever value is currently in the cache. That value may be anywhere from 0 seconds to one full device scan interval old.

On-demand inference synchronizes the model's cycle with its devices. Before running inference, Koios requests fresh reads from all input devices and waits for the data. After inference, it triggers immediate writes to all output devices.

The Timing Problem

When device and model scan rates are similar (e.g., both 30s), the model can predict on stale data every cycle. How stale depends on when each service started:

Device reads:   ┃           ┃           ┃
                t=0         t=30        t=60

Model infers:      ┃           ┃
                   t=12        t=42
                   ▲           ▲
Data age:          12s         12s

The offset depends on when each service started. It is therefore unpredictable, and it can be anywhere from 0 to a full device scan cycle.
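The short Python sketch below makes the timing problem concrete. It computes how old the cached value is at each inference. It assumes the device refreshes the cache instantly at fixed intervals, which real reads do not, because they take time. The function and its parameters are hypothetical illustrations and are not part of Koios.

```python
import math
from typing import List


def data_age_at_inference(device_period: float, device_start: float,
                          model_period: float, model_start: float,
                          cycles: int) -> List[float]:
    """Age (seconds) of the cached value at each of the first `cycles` inferences.

    Hypothetical model of the timing problem: the device refreshes the cache
    instantly at device_start + k * device_period, and the model reads the
    cache at model_start + n * model_period (assumes model_start >= device_start).
    """
    ages = []
    for n in range(cycles):
        t = model_start + n * model_period
        # Index of the most recent device read at or before time t.
        k = math.floor((t - device_start) / device_period)
        last_read = device_start + k * device_period
        ages.append(t - last_read)
    return ages


# Both services on a 30s cycle, model started 12s after the device:
print(data_age_at_inference(30, 0, 30, 12, 3))  # [12, 12, 12]
# Device polling every 5s instead: staleness is bounded by the device period.
print(data_age_at_inference(5, 0, 30, 12, 3))   # [2, 2, 2]
```

With equal periods the age stays the same on every cycle, matching the diagram. Only the start-up offset sets it.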

With on-demand enabled, the model triggers a fresh read before each inference and writes outputs immediately after:

Model wakes → Request read → Device polled → Infer on fresh data → Write outputs

The On-Demand Cycle

1. Wake: the model wakes up at its configured scan rate.
2. Request reads: Koios requests fresh reads from all devices with bound input tags.
3. Wait: Koios waits until all input values are newer than the start of this cycle, or until the timeout expires.
4. Inference: read the fresh values, run the model, and produce predictions.
5. Request writes: push prediction values to output devices immediately.

If the timeout expires before fresh data arrives, the model fails the scan rather than predicting on stale data.
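The steps above can be sketched in Python as follows. Everything here is a hypothetical stand-in and not the Koios implementation: `FakeDevice` simulates a driver whose reads complete after a fixed latency, and `run_on_demand_cycle` follows the documented order of request reads, wait until every input is newer than the cycle start (or time out), infer, then write.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class OnDemandTimeout(Exception):
    """Raised when fresh input data does not arrive before the timeout."""


@dataclass
class FakeDevice:
    """Hypothetical device stand-in; not a Koios API."""
    read_latency: float                     # seconds a poll takes to complete
    value: float = 0.0
    last_read_at: float = float("-inf")     # monotonic time of last completed read
    pending_until: Optional[float] = None   # when the in-flight read completes
    written: List[float] = field(default_factory=list)

    def request_read(self) -> None:
        # Start a poll that completes read_latency seconds from now.
        self.pending_until = time.monotonic() + self.read_latency

    def poll(self) -> None:
        # Complete the in-flight read once its latency has elapsed.
        if self.pending_until is not None and time.monotonic() >= self.pending_until:
            self.value += 1.0
            self.last_read_at = time.monotonic()
            self.pending_until = None

    def request_write(self, value: float) -> None:
        self.written.append(value)


def run_on_demand_cycle(inputs: List[FakeDevice], outputs: List[FakeDevice],
                        model: Callable[[List[float]], float],
                        timeout: float) -> float:
    cycle_start = time.monotonic()
    # Request fresh reads from every input device.
    for dev in inputs:
        dev.request_read()
    # Wait until every input is newer than cycle_start, or fail the scan.
    deadline = cycle_start + timeout
    while True:
        for dev in inputs:
            dev.poll()
        if all(dev.last_read_at > cycle_start for dev in inputs):
            break
        if time.monotonic() >= deadline:
            raise OnDemandTimeout(f"no fresh data within {timeout}s")
        time.sleep(0.005)
    # Infer on the fresh values only.
    prediction = model([dev.value for dev in inputs])
    # Push the prediction to every output device immediately.
    for dev in outputs:
        dev.request_write(prediction)
    return prediction


out = FakeDevice(read_latency=0.0)
print(run_on_demand_cycle([FakeDevice(0.02), FakeDevice(0.03)], [out], sum, timeout=1.0))
```

Raising an exception on timeout, rather than falling back to cached values, mirrors the documented choice to fail the scan instead of predicting on stale data.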

When to Use On-Demand

Use On-Demand	Skip On-Demand
Device scan rate similar to model scan rate	Device scans 10x+ faster than model (cached data is never more than a tenth of a model cycle old)
Model writes control outputs to PLCs	Model only produces dashboard/alert predictions
Freshness directly affects prediction quality	Data changes slowly relative to scan rates
Multiple models share a device (see Scan Groups)	—

Rule of thumb: if your device scans at least 10x faster than your model, you don't need on-demand. If the rates are similar, or if the model writes control outputs, on-demand is strongly recommended.
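The rule of thumb can be written as a small predicate. This is a hypothetical helper, not a Koios setting. It takes scan rates as seconds between scans, the way this page uses them (e.g., 30s). It treats anything short of a 10x gap as "recommended", which is a conservative reading of the gray zone between "similar" and "10x faster". It also ignores the softer factors in the table above, such as how slowly the data changes.

```python
def on_demand_recommended(device_interval_s: float, model_interval_s: float,
                          writes_control_outputs: bool) -> bool:
    """Hypothetical encoding of the rule of thumb (intervals in seconds)."""
    if writes_control_outputs:
        # Control outputs to PLCs: freshness matters, so recommend on-demand.
        return True
    # A device at least 10x faster keeps the cache within 1/10 of a model cycle.
    return model_interval_s < 10 * device_interval_s


print(on_demand_recommended(30, 30, False))  # True: rates are similar
print(on_demand_recommended(5, 60, False))   # False: device is 12x faster
print(on_demand_recommended(5, 60, True))    # True: model writes control outputs
```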

On-demand lets you slow down device polling

With on-demand, a device only needs to be polled when a model needs data, and the model triggers those reads on its own schedule. You can therefore lengthen the device's scan interval (set a larger scan rate) to reduce polling load.

Configuration

On-demand involves settings on both the model and the device.

Model Settings

Found on the model's Configuration tab under Advanced Configuration.

Setting	Description	Default	Range
On-Demand	Enable on-demand inference	Off	—
On-Demand Timeout	Max wait for a fresh device read before failing	3s	0.5–30s

Start with 3 seconds and increase if you see timeout errors. Devices on slow networks may need 10–15 seconds.
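If you manage these settings in your own tooling, a minimal sketch like the following can mirror the documented defaults and range. The class and field names are hypothetical and do not reflect how Koios stores model configuration. Only the default (Off, 3s) and the 0.5–30s range come from the table above.

```python
from dataclasses import dataclass


@dataclass
class ModelOnDemandSettings:
    """Hypothetical mirror of the model's On-Demand settings (not a Koios API)."""
    on_demand: bool = False   # On-Demand: Off by default
    timeout_s: float = 3.0    # On-Demand Timeout: 3s by default

    def __post_init__(self) -> None:
        # Reject values outside the documented 0.5-30s range.
        if not 0.5 <= self.timeout_s <= 30.0:
            raise ValueError(
                f"On-Demand Timeout must be between 0.5 and 30 s, got {self.timeout_s}"
            )


# A model reading from a device on a slow network:
slow_net = ModelOnDemandSettings(on_demand=True, timeout_s=12.0)
print(slow_net)
```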

Device Settings

Found on the device's Configuration tab under Advanced Configuration.

Setting	Description	Default
On-Demand Freshness	Max age of cached data before a fresh read is required	0s
On-Demand Batch Window	Time to wait before executing a read, so that concurrent requests are batched into one	0s

For detailed explanations of each setting, see On-Demand Scanning.
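As a rough illustration of how the two device settings could interact, the asyncio sketch below uses one reading of the descriptions above. A request is served from cache if the cached data is younger than the freshness setting. Otherwise, the first request opens a batch window, and every request arriving during that window shares a single poll. `OnDemandDevice` and its methods are hypothetical. Koios's actual behavior is described in On-Demand Scanning.

```python
import asyncio
import time
from typing import Optional


class OnDemandDevice:
    """Hypothetical sketch of freshness + batch-window handling (not a Koios API)."""

    def __init__(self, freshness_s: float = 0.0, batch_window_s: float = 0.0):
        self.freshness_s = freshness_s
        self.batch_window_s = batch_window_s
        self.last_read_at = float("-inf")   # monotonic time of last completed poll
        self.reads = 0                      # number of physical polls performed
        self._pending: Optional[asyncio.Task] = None

    async def _poll(self) -> None:
        await asyncio.sleep(0.01)           # stand-in for network I/O
        self.reads += 1
        self.last_read_at = time.monotonic()

    async def _batched_read(self) -> None:
        # Hold the batch open so concurrent requests share this poll.
        await asyncio.sleep(self.batch_window_s)
        self._pending = None                # later requests start a new batch
        await self._poll()

    async def request_read(self) -> None:
        # Freshness: skip the poll if cached data is young enough (0s = always poll).
        if time.monotonic() - self.last_read_at < self.freshness_s:
            return
        if self._pending is None:
            self._pending = asyncio.create_task(self._batched_read())
        await self._pending                 # every request in the batch waits here


async def demo() -> int:
    dev = OnDemandDevice(batch_window_s=0.05)
    # Three models requesting at once result in one physical poll.
    await asyncio.gather(*(dev.request_read() for _ in range(3)))
    return dev.reads


print(asyncio.run(demo()))  # 1
```

A longer batch window lets more requests share one poll, at the cost of delaying every read by that window. This trade-off is why the troubleshooting table suggests reducing it when only one model uses the device.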

Scan Groups and On-Demand

When multiple models share devices, individual on-demand requests can multiply network I/O. A scan group solves this by running models together and combining all reads into a single request per device.

See Scan Groups for details.
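The core idea of combining reads can be sketched as merging every grouped model's (device, tag) bindings into one tag list per device. The function and the input format are hypothetical assumptions made for illustration. They are not the Scan Groups implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def plan_group_reads(
    model_inputs: Dict[str, Iterable[Tuple[str, str]]],
) -> Dict[str, List[str]]:
    """Merge each model's (device, tag) inputs into one read request per device."""
    per_device: Dict[str, Set[str]] = defaultdict(set)
    for bindings in model_inputs.values():
        for device, tag in bindings:
            per_device[device].add(tag)   # shared tags are read only once
    return {device: sorted(tags) for device, tags in per_device.items()}


plan = plan_group_reads({
    "model_a": [("plc1", "temp"), ("plc1", "pressure")],
    "model_b": [("plc1", "temp"), ("plc2", "flow")],
})
print(plan)  # {'plc1': ['pressure', 'temp'], 'plc2': ['flow']}
```

Without grouping, these two models would issue three on-demand requests (model_a to plc1, model_b to plc1, model_b to plc2). With grouping they issue two, and the shared `temp` tag is read once.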

Troubleshooting

Problem	Cause	Solution
Model fails with timeout	Device slow or offline	Increase timeout; check device status
On-demand reads seem slow	Long batch window on device	Reduce the batch window if only one model uses the device
Data still seems stale	On-demand not enabled, or device freshness set too high	Verify on-demand is enabled; reduce the device's On-Demand Freshness setting
