---
title: "System Health"
description: "Monitor CPU, memory, network, and disk usage with configurable alarms"
source_url: https://ai-ops.com/docs/system/performance
---

# System Health

Navigate to **System > Health** to monitor hardware resource usage in real time. The page has four tabs — **CPU**, **Memory**, **Network**, and **Disk** — each with a live chart, alarm configuration, and display options.

A red dot on a tab indicates that the metric's alarm is currently active.

---

## Common Controls

All four tabs share the same chart controls:

| Control | Description |
|---------|-------------|
| **Time range** | Select from 15 minutes, 1 hour, 6 hours, 24 hours, or 7 days |
| **Auto-scroll** | When on, the chart continuously scrolls to show the latest data with a "NOW" marker. Turning it off freezes the view for manual exploration. |
| **Zoom & pan** | Scroll to zoom the time axis; click and drag to pan. Double-click to reset. |
| **Fullscreen** | Expand the chart to fill the entire screen. Press Escape to exit. |
| **Legend** | Click a dataset label in the legend to show or hide it. Some datasets (like CPU Total) are locked and always visible. |

The polling interval adapts to the selected time range: shorter ranges poll more often (every 5 seconds on the 15-minute range), and longer ranges poll less often (every 2 minutes on the 7-day range).
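
One way to picture the adaptive polling is as a lookup from time range to interval. The sketch below is illustrative only: the 15-minute and 7-day values come from the paragraph above, while the three intermediate intervals and the names `POLL_INTERVAL_SECONDS` and `poll_interval_for` are assumptions, not documented behavior.

```python
# Hypothetical lookup from the selected time range to a polling interval
# in seconds. Only the first and last entries are documented; the middle
# three are assumed placeholders that increase with the range.
POLL_INTERVAL_SECONDS = {
    "15m": 5,    # documented: every 5 seconds
    "1h": 15,    # assumption
    "6h": 30,    # assumption
    "24h": 60,   # assumption
    "7d": 120,   # documented: every 2 minutes
}


def poll_interval_for(time_range: str) -> int:
    """Return the polling interval in seconds for a time range key."""
    try:
        return POLL_INTERVAL_SECONDS[time_range]
    except KeyError:
        raise ValueError(f"unknown time range: {time_range!r}") from None


print(poll_interval_for("15m"))  # 5
print(poll_interval_for("7d"))   # 120
```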

---

## CPU

Shows overall CPU usage as a percentage (0–100%).

### Additional Datasets

- **CPU Cores** — individual usage per logical core. Toggle with the **Show CPU Cores** setting.
- **Service Processes** — per-service CPU breakdown (Web App, Data Collector, Predict Engine, etc.). Toggle individual services in the chart legend.

### Settings

| Setting | Description |
|---------|-------------|
| **CPU Alarm Limit** | Percentage threshold that triggers the CPU alarm |
| **Show Setpoint** | Display the alarm threshold as a dashed red line on the chart |
| **Show CPU Cores** | Overlay per-core usage lines |

---

## Memory

Shows memory usage as a percentage of total system RAM.

### Additional Datasets

- **Service Processes** — per-service memory usage. Toggle individual services in the chart legend.

### Settings

| Setting | Description |
|---------|-------------|
| **RAM Alarm Limit** | Percentage threshold that triggers the memory alarm |
| **Show Setpoint** | Display the alarm threshold as a dashed red line on the chart |
| **Show in Bytes** | Switch the Y-axis from percentage to absolute values (GB/MB) |
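
As a rough illustration of what **Show in Bytes** changes on the Y-axis, the sketch below converts a usage percentage into an absolute value. The function names and the use of binary units (1 GB = 1024³ bytes) are assumptions; the product may round or scale differently.

```python
def percent_to_bytes(percent: float, total_bytes: int) -> float:
    """Convert a usage percentage of total RAM into bytes."""
    return total_bytes * percent / 100.0


def format_bytes(num_bytes: float) -> str:
    """Format a byte count as GB or MB (binary units, an assumption)."""
    gib = 1024 ** 3
    mib = 1024 ** 2
    if num_bytes >= gib:
        return f"{num_bytes / gib:.1f} GB"
    return f"{num_bytes / mib:.1f} MB"


total_ram = 16 * 1024 ** 3  # example system with 16 GB of RAM
print(format_bytes(percent_to_bytes(25.0, total_ram)))  # 4.0 GB
```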

---

## Network

Shows upload and download throughput in MB/s for a selected network interface.

### Interface Selector

A dropdown lets you choose which network interface to monitor. Each option shows the interface name, status indicator (teal for up, gray for down), IP address, and link speed.

> [!NOTE] Select an interface first
> The network chart requires a selected interface. If none is selected, the chart shows a prompt to choose one from the dropdown.

### Settings

| Setting | Description |
|---------|-------------|
| **Network Alarm Limit** | Throughput threshold (MB/s) that triggers the network alarm |
| **Show Setpoint** | Display the alarm threshold as a dashed red line on the chart |
| **Alarm on Packet Drops** | When enabled, triggers the alarm if incoming or outgoing packet drops are detected |
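
The way these settings might combine can be sketched as follows. This is a hypothetical model, not the product's implementation: the `NetworkSample` type and `network_alarm_active` function are invented for illustration, and it assumes the limit is compared against each direction separately, which the settings table does not specify.

```python
from dataclasses import dataclass


@dataclass
class NetworkSample:
    """One reading for the selected interface (hypothetical shape)."""
    upload_mb_s: float
    download_mb_s: float
    dropped_in: int = 0
    dropped_out: int = 0


def network_alarm_active(
    sample: NetworkSample,
    limit_mb_s: float,
    alarm_on_packet_drops: bool,
) -> bool:
    """Return True if throughput exceeds the limit or drops are detected."""
    # Assumption: either direction exceeding the limit triggers the alarm.
    over_limit = max(sample.upload_mb_s, sample.download_mb_s) > limit_mb_s
    # Drops only count when the "Alarm on Packet Drops" setting is enabled.
    has_drops = sample.dropped_in > 0 or sample.dropped_out > 0
    return over_limit or (alarm_on_packet_drops and has_drops)


quiet_with_drops = NetworkSample(upload_mb_s=2.0, download_mb_s=10.0, dropped_in=3)
print(network_alarm_active(quiet_with_drops, 50.0, alarm_on_packet_drops=True))   # True
print(network_alarm_active(quiet_with_drops, 50.0, alarm_on_packet_drops=False))  # False
```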

---

## Disk

Shows disk usage as a percentage of total capacity.

The live info display includes the current usage percentage, free space remaining, daily change rate, and a projected time-to-full estimate based on recent trends.
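
A time-to-full projection of this kind is often a simple linear extrapolation. The sketch below assumes exactly that: free space divided by the daily change rate. The function name `days_until_full` is hypothetical, and the actual estimate may use a different trend window or model.

```python
from typing import Optional


def days_until_full(free_bytes: float, daily_change_bytes: float) -> Optional[float]:
    """Linearly project how many days remain until the disk is full.

    Returns None when usage is flat or shrinking, because no fill date
    can be projected in that case.
    """
    if daily_change_bytes <= 0:
        return None
    return free_bytes / daily_change_bytes


# 200 GB free, growing by 5 GB per day
print(days_until_full(200e9, 5e9))   # 40.0
# Usage is shrinking, so there is no projection
print(days_until_full(200e9, -1e9))  # None
```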

### Settings

| Setting | Description |
|---------|-------------|
| **Storage Alarm Limit** | Percentage threshold that triggers the disk alarm |
| **Show Setpoint** | Display the alarm threshold as a dashed red line on the chart |

---

## Alarm Behavior

Each metric has an independently configurable alarm threshold. The alarm state is checked every 10 seconds. While the current value exceeds the threshold:

- A **red dot** appears on the corresponding tab
- The **setpoint line** on the chart shows where the threshold sits relative to the data (when **Show Setpoint** is enabled)

Alarm thresholds save automatically when changed — no save button is needed.
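
The threshold rule can be sketched as a strict comparison evaluated for each metric. The function names and the example readings below are hypothetical; the only behavior taken from this page is that an alarm is active when the current value exceeds its threshold.

```python
from typing import Dict, List


def alarm_active(current_value: float, threshold: float) -> bool:
    """An alarm is active only when the value exceeds the threshold."""
    return current_value > threshold


def tabs_with_red_dot(
    readings: Dict[str, float], thresholds: Dict[str, float]
) -> List[str]:
    """Return the tabs whose metric currently exceeds its own threshold."""
    return [
        name
        for name, value in readings.items()
        if alarm_active(value, thresholds[name])
    ]


readings = {"CPU": 93.0, "Memory": 61.5, "Disk": 85.0}
thresholds = {"CPU": 90.0, "Memory": 80.0, "Disk": 85.0}
# Disk sits exactly at its threshold, so it does not count as exceeding it.
print(tabs_with_red_dot(readings, thresholds))  # ['CPU']
```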

> [!TIP] Set meaningful thresholds
> A CPU alarm at 90% catches sustained overload without false positives from brief spikes. A storage alarm at 85% gives you time to act before the disk fills. Adjust based on your deployment's normal operating range.

---

## What's Next

- [Services](https://ai-ops.com/docs/system/services.md) — view individual service status, restart services
- [Network Diagnostics](https://ai-ops.com/docs/system/network.md) — ping, traceroute, and port checks
- [Data Retention](https://ai-ops.com/docs/system/retention.md) — manage storage growth with retention policies
