Low-latency game streaming when the game is maxing out your GPU
The hardest case for any streaming host is the one that matters most: a demanding game already pinning the GPU. Here's why the encoder ends up waiting — and the two levers that took our encode time from ~30 ms to ~4 ms.
Run a streaming benchmark on an idle desktop and almost anything looks fast. The case that actually matters is the hard one: you're streaming a demanding game that's already pushing your GPU to 100%. That's where most setups fall apart — frame rate collapses and latency spikes, even though the GPU has a dedicated video encoder sitting right there.
The encoder isn't the bottleneck — the queue in front of it is
It's tempting to blame the encoder. But on a modern GPU, NVENC (or the AMD or Intel equivalent) is a separate hardware block that's barely busy. The real problem is scheduling: our capture-and-encode work has to take a turn on the GPU, and when a game is saturating it, that turn keeps getting pushed back. The encoder sits idle, waiting for a context switch that the game keeps winning. Measured on our RTX 4090 test host, encode time under a heavy game ballooned to roughly 30 ms, even though the encoder block itself was doing almost nothing.
Lever 1: ask for real-time GPU priority
The first fix is to stop being a polite background task. punktfunk requests real-time GPU scheduling priority for its capture-and-encode context, so it preempts the game's work instead of queuing behind it, and the encoder gets its turn promptly. On the same host, that alone cut encode time from about 30 ms to about 15 ms, with no visible cost to the game.
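To make the queuing effect concrete, here is a toy model in Python. It is not punktfunk's implementation and it does not touch any real GPU API. The function name, the packet durations, and the assumption that a running packet cannot be interrupted are all ours, chosen only to illustrate why jumping the queue matters more than encoding faster.

```python
import heapq
from itertools import count

def capture_finish_ms(game_packets_ms, capture_ms, realtime):
    """Return the time (ms) at which the capture/encode packet completes.

    Toy model, illustrative numbers only. game_packets_ms must be non-empty:
    its first entry is the packet already running, which we assume cannot
    be interrupted; the rest are queued behind it.
    """
    seq = count()  # tie-breaker so equal priorities stay in FIFO order
    running, *pending = game_packets_ms
    heap = [(1, next(seq), "game", ms) for ms in pending]
    # Lower number = scheduled first. Real-time work jumps the queue.
    heap.append((0 if realtime else 1, next(seq), "capture", capture_ms))
    heapq.heapify(heap)

    clock = running  # wait for the in-flight game packet to finish
    while heap:
        _, _, kind, ms = heapq.heappop(heap)
        clock += ms
        if kind == "capture":
            return clock

game_load = [2.0] * 10  # ten hypothetical 2 ms game packets
fifo_ms = capture_finish_ms(game_load, 1.0, realtime=False)
realtime_ms = capture_finish_ms(game_load, 1.0, realtime=True)
```

In this model the capture packet itself always costs 1 ms. Everything else is time spent waiting behind the game, which is the part real-time priority removes.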
Lever 2: keep the frame queue one deep
The second fix is about freshness. If the virtual display buffers several frames before the encoder picks one up, every one of those frames is already stale by the time it reaches you. punktfunk keeps that queue just one frame deep: always grab the newest frame, never a backlog. On its own this does little while the encoder is stuck waiting — but once real-time priority shortens that wait, a shallow queue is a big latency win. In Doom: The Dark Ages it took our end-to-end encode path from about 17.5 ms to about 4.4 ms, and lifted unique (non-repeated) frames from 85 to 129 per second.
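A one-deep queue can be sketched as a single-slot mailbox in which a newer frame overwrites an unread one. This is a minimal Python illustration of the idea, not punktfunk's code. The class name `LatestFrameSlot` and its methods are hypothetical, and a real host would hand over GPU surfaces rather than Python objects.

```python
import threading

class LatestFrameSlot:
    """Single-slot mailbox: publishing overwrites any frame not yet taken."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None   # None means "empty"; frames must not be None
        self.dropped = 0     # frames replaced before the encoder saw them

    def publish(self, frame):
        """Called by the capture side for every new frame."""
        with self._cond:
            if self._frame is not None:
                self.dropped += 1  # the older frame is stale, discard it
            self._frame = frame
            self._cond.notify()

    def take(self, timeout=None):
        """Called by the encoder: return the newest frame, or None on timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._frame is not None, timeout):
                return None
            frame, self._frame = self._frame, None
            return frame

slot = LatestFrameSlot()
for n in range(3):
    slot.publish(f"frame-{n}")   # capture runs ahead of the encoder
newest = slot.take(timeout=0)    # encoder gets only the freshest frame
```

The trade-off is deliberate: dropping a frame that was never encoded costs nothing the viewer can see, whereas encoding it would add a full frame of delay to everything behind it.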
The combined result
Stacked together on the RTX 4090 host, under a genuinely hostile load — Counter-Strike 2 and Doom: The Dark Ages both hammering the GPU — the same stream went from roughly 30 ms and 40-50 fps to about 4 ms of encode time and 130-148 unique frames per second. The game still runs; the stream just stops losing the fight for the GPU.
One more detail: feed the encoder the format it wants
There's a related trap: handing the encoder full RGB frames forces a color-space conversion onto the very GPU cores the game is saturating. Feeding the video engine NV12 (or P010 for HDR) instead keeps that work off the shaders. It isn't the headline latency lever, but on a contended GPU every bit of avoided shader work helps.
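To show what that conversion involves, here is a small pure-Python sketch of RGB to NV12. It assumes BT.709 limited-range coefficients and even frame dimensions. The source does not say which matrix or range the host uses, so treat those as our assumptions. On a real host this per-pixel math would run as a shader pass, which is exactly the work worth avoiding on a saturated GPU.

```python
def rgb_to_nv12(pixels, width, height):
    """Convert rows of 8-bit (R, G, B) tuples into an NV12 byte buffer.

    Assumes BT.709 limited range and even width/height (illustrative only).
    Layout: a full-resolution Y plane, then one interleaved U,V pair per
    2x2 pixel block, for width * height * 3 / 2 bytes in total.
    """
    def luma(r, g, b):
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    y_plane = bytearray()
    for row in pixels:
        for r, g, b in row:
            y_plane.append(round(16 + 219 * luma(r, g, b) / 255))

    uv_plane = bytearray()
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            # Average the 2x2 block, then compute one chroma pair for it.
            block = [pixels[y + dy][x + dx] for dy in (0, 1) for dx in (0, 1)]
            r = sum(p[0] for p in block) / 4
            g = sum(p[1] for p in block) / 4
            b = sum(p[2] for p in block) / 4
            yl = luma(r, g, b)
            cb = 128 + 224 * (b - yl) / 1.8556 / 255
            cr = 128 + 224 * (r - yl) / 1.5748 / 255
            uv_plane += bytes((round(cb), round(cr)))  # U first, then V
    return bytes(y_plane) + bytes(uv_plane)

white = [[(255, 255, 255)] * 2 for _ in range(2)]
nv12_white = rgb_to_nv12(white, 2, 2)
```

Even this tiny version does a matrix multiply per pixel plus a 2x2 average per chroma sample. At full resolution and frame rate that is a steady shader cost, paid on the same cores the game is already saturating.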
Why this matters
The whole point of game streaming is to play demanding games, which means the host is always under load. A streaming host has to be a good citizen on a busy GPU and still hold its place — and getting that right is the difference between "fine on the desktop" and "still 4 ms under a maxed-out game." For the bigger picture, see how punktfunk compares to Sunshine + Moonlight.