promtop

command module

v0.0.2 Latest Latest Go to latest Published: Jan 25, 2026 License: MIT Imports: 7 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/jondoveston/promtop

Links

Open Source Insights

README ¶

promtop

A WIP terminal dashboard app that reads system metrics from Prometheus

Core usage calculation

The Core Concept

The node_cpu_seconds_total{mode="idle"} metric is a cumulative counter that tracks how many seconds a CPU core has spent in idle state since boot.

How Rate of Idle Time Works

Key insight: In real time, time passes at exactly 1 second per second. A CPU core can either spend that time doing work OR being idle.

If a core is completely idle:

For every 1 second of real time, the idle counter increases by 1 second
Rate of idle time = 1.0 seconds/second (100% of time is idle)
CPU usage = 100 - (1.0 * 100) = 0%

If a core is completely busy:

For every 1 second of real time, the idle counter increases by 0 seconds (it's doing work)
Rate of idle time = 0.0 seconds/second (0% of time is idle)
CPU usage = 100 - (0.0 * 100) = 100%

If a core is 50% busy:

For every 1 second of real time, the core spends 0.5s working and 0.5s idle
The idle counter increases by 0.5 seconds
Rate of idle time = 0.5 seconds/second (50% of time is idle)
CPU usage = 100 - (0.5 * 100) = 50%

In the Code

Prometheus version:

100 - (avg by (instance,cpu) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

rate(...[1m]) calculates: (idle_now - idle_1min_ago) / 60 seconds
This gives seconds of idle per second of real time
Multiply by 100 to get percentage idle
Subtract from 100 to get percentage busy

node_exporter version:

rates[cpuName] = 100 - 100*(last-first)/interval

(last-first) = total idle seconds accumulated over the measurement window
interval = total real-time seconds in the measurement window
(last-first)/interval = fraction of time spent idle
100 - 100*fraction_idle = percentage of time spent NOT idle (i.e., busy)

Example Calculation

Over a 60-second window:

Scenario: Core is 75% busy
Idle counter increases from 1000.0s to 1015.0s (gained 15 seconds of idle time)
Real time elapsed: 60 seconds
Rate: 15 / 60 = 0.25 seconds idle per second
CPU usage: 100 - (0.25 * 100) = 75% busy

The calculation works because idle time + busy time must equal real time. If we know the idle fraction, we can deduce the busy fraction.

Sampling interval

Compromises of Using Long Intervals (60 seconds)

1. Slower Response to Changes

Problem: CPU spikes or drops take up to 60 seconds to fully reflect in the display
Example: If a process suddenly uses 100% CPU, you won't see the full impact for a full minute
Impact: Makes the dashboard less useful for real-time monitoring of sudden events

2. Averaging Out Short Bursts

Problem: Brief high-CPU events get diluted across the 60-second window
Example:
- A core runs at 100% for 6 seconds, then 0% for 54 seconds
- 60-second window shows: 10% average usage
- 1-second window would show: 100%, 100%, 100%, 100%, 100%, 100%, 0%, 0%, ...
Impact: You completely miss short-duration CPU spikes that might be important

3. Delayed Initial Reading

Problem: Need to wait the full window duration before getting any data
Prometheus: Needs 60 seconds of data history before rate(...[1m]) returns meaningful results
node_exporter code: Currently requires 2 readings, but rate accuracy improves as it approaches 60 readings
Impact: 60+ second delay before dashboard shows anything useful after startup

4. Poor for Bursty Workloads

Problem: Many workloads are bursty (e.g., web servers, batch jobs, cron tasks)
Example: A batch job runs every 5 minutes for 30 seconds
- Short window: Clearly shows 30-second bursts
- 60-second window: Smears it out, making patterns harder to see
Impact: Harder to identify patterns or correlate with application behavior

5. Latency in Troubleshooting

Problem: When investigating a performance issue, 60-second lag makes it hard to correlate cause and effect
Example: You run a command and want to see its CPU impact immediately
Impact: By the time you see the CPU change, you've forgotten what you did

The Trade-off

Longer windows (60s):

✅ Smoother, more stable graphs
✅ Less noise from brief fluctuations
✅ Better for capacity planning and trend analysis
❌ Hides short-term spikes
❌ Slow to respond to changes
❌ Poor for real-time troubleshooting

Shorter windows (1-5s):

✅ Immediate response to changes
✅ Shows all spikes and bursts
✅ Better for real-time monitoring
❌ Noisier data
❌ Harder to see long-term trends
❌ More susceptible to measurement artifacts

Common Practice

Most monitoring tools offer configurable windows or use different windows for different purposes:

htop/top: ~1-3 seconds (real-time monitoring)
Grafana dashboards: Often 1-5 minutes (trend analysis)
Alerting: 5-15 minutes (avoid alert fatigue from brief spikes)

Your current 60-second window is quite long for an interactive TUI. A 5-15 second window might be a better balance for promtop.

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

main.go

Directories ¶

Path	Synopsis
internal

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL